Disaster Recovery Planning Phases

This section outlines the disaster recovery planning phases. Note that the plan itself is not written until step six. The scope and objectives of the plan must first be defined, and then risks and possible solutions must be analyzed. Once this research is complete, the necessary decisions can be made and the plan can be written based on the available resources.

1. Create a contingency planning policy statement

This phase of disaster recovery planning creates the project definition. Defining what you are planning keeps the effort on track because the objectives are stated and understood, and it gives a basis for maintaining the plan after it is completed and implemented. One decision to make here is which kinds of loss the plan should cover; for example, should it cover loss due to theft of confidential data? Such a loss may be severe, depending on the type of business and the value of the data at risk. The contingency planning policy statement should:

  1. Define the objectives of the plan.
  2. Define the roles and responsibilities of involved staff.
  3. Identify the functions of the organization covered by the contingency plan.
  4. Identify the systems covered by the contingency plan.
  5. Identify the resources required to develop and maintain the plan.
  6. Define testing schedules for the plan.
  7. Define a maintenance schedule for the plan.
  8. Identify training requirements.
  9. Determine backup frequency and the locations where backup media will be stored, in conjunction with the Backup Policy.
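Item 9 ties the policy to backup frequency. The interval between backups bounds how much data can be lost: in the worst case, everything written since the last good backup. A minimal Python sketch, assuming every listed backup completed successfully; the function name `worst_case_data_loss` and the schedule are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def worst_case_data_loss(backup_times: Sequence[datetime]) -> timedelta:
    """Return the longest gap between consecutive backups.

    A disaster just before a backup runs loses everything written since the
    previous backup, so the longest gap is the worst-case loss window.
    Assumes every listed backup completed successfully.
    """
    if len(backup_times) < 2:
        raise ValueError("need at least two backup times")
    ordered = sorted(backup_times)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical schedule: nightly backups at 01:00, with the night of Jan 3 missed.
schedule = [datetime(2024, 1, day, 1, 0) for day in (1, 2, 4, 5)]
print(worst_case_data_loss(schedule))  # 2 days, 0:00:00
```

A single missed night doubles the exposure, which is one reason the testing and maintenance schedules above should include checking that backups actually ran.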

2. Perform a business system analysis

Conduct a business impact analysis (BIA) to characterize system requirements, processes, and interdependencies:

  1. Identify the business processes, especially the critical ones. These may include sales, marketing, customer service, finance, human resources, facilities, research, and others.
  2. Identify what each business process depends on, including critical IT resources and other resources. Consider:
    1. Power requirements
    2. Communications connections
    3. Environmental controls
    4. Servers and network equipment such as routers and switches
    5. Facilities and building space
  3. Determine how the cost of business disruption and the cost of recovery each change with the length of an outage. The results of the organization's data assessment should help identify these costs.
  4. Determine how soon each system must be recovered to balance loss against cost, and plan measures that meet that requirement.
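Steps 3 and 4 can be illustrated with a small calculation: disruption cost grows with the length of the outage, while faster recovery capabilities usually cost more, so the target recovery time is the option with the lowest combined cost. This is a sketch assuming a constant hourly loss and a short list of hypothetical recovery options; `best_recovery_time` and all figures are illustrative, not from the source.

```python
def best_recovery_time(hourly_loss: float, options: dict[float, float]) -> tuple[float, float]:
    """Pick the recovery time that minimizes disruption cost plus recovery cost.

    hourly_loss: assumed constant loss per hour of outage.
    options: maps a recovery time in hours to the cost of a recovery
    capability that achieves it (faster recovery usually costs more).
    Returns (recovery_hours, total_cost).
    """
    best = min(options, key=lambda hours: hourly_loss * hours + options[hours])
    return best, hourly_loss * best + options[best]


# Hypothetical figures for one system: 2,000 lost per hour of outage.
options = {4: 120_000, 24: 40_000, 72: 15_000}
print(best_recovery_time(2_000, options))  # (24, 88000)
```

Running the same comparison per system gives each one its own recovery target, since a system with a higher hourly loss justifies a faster, more expensive option.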

3. Perform a risk assessment and analysis

List threats and analyze the likelihood of each threat materializing, then calculate the loss that would occur if it does. Analyze threats to all business processes, including the processes that the primary processes depend on. These include threats to IT infrastructure, communications, building services, and others. Analyze each threat against both the processes and the supporting systems.

Risk management: steps 1 and 2 below make up the risk assessment. Based on likelihood and impact, assign each risk a level of low, medium, or high.

  1. Identify threats and vulnerabilities
  2. Identify current controls and additional controls to prevent incidents, reduce the likelihood of incidents, or reduce damage
  3. Identify risks requiring contingency plans
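The low/medium/high assignment can be expressed as a simple risk matrix. The scoring below (ratings mapped to 1 to 3 and multiplied, with thresholds at 3 and 6) is an assumed convention, not one the text prescribes; adjust the thresholds to the organization's own risk criteria.

```python
# Assumed numeric values for each rating; not prescribed by the plan.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_level(likelihood: str, impact: str) -> str:
    """Combine likelihood and impact ratings into a low/medium/high risk level.

    Assumed scoring: the product of the two ratings decides the level
    (1-2 low, 3-4 medium, 6-9 high).
    """
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


print(risk_level("high", "medium"))  # high
print(risk_level("low", "high"))     # medium
```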

List the risks that threaten each system and determine the likelihood of each risk materializing:

  1. Rate the probability that each threat will materialize and the degree of its impact, each as low, medium, or high. If you can calculate the potential loss if the threat materializes, or a cost range, it will help when establishing a budget.
  2. Consider earthquakes, floods, tornadoes, and power outages. Also check the possible threats listed on the page titled "Organizational Threats".
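Where a loss figure can be estimated, one widely used quantitative method (not named in the text above) is annualized loss expectancy: the single loss expectancy (asset value times the fraction lost per incident) multiplied by the expected number of incidents per year. A sketch with hypothetical figures; `annualized_loss_expectancy` is an illustrative name.

```python
def annualized_loss_expectancy(asset_value: float, exposure_factor: float,
                               annual_rate: float) -> float:
    """ALE = SLE * ARO, where SLE = asset value * exposure factor.

    exposure_factor: fraction of the asset's value lost per incident (0 to 1).
    annual_rate: expected incidents per year (0.05 means once in 20 years).
    """
    if not 0 <= exposure_factor <= 1:
        raise ValueError("exposure_factor must be between 0 and 1")
    single_loss = asset_value * exposure_factor
    return single_loss * annual_rate


# Hypothetical: a server room worth 500,000, 40% damaged by a flood
# expected about once in 20 years.
print(annualized_loss_expectancy(500_000, 0.4, 0.05))
```

With these hypothetical inputs the flood works out to roughly 10,000 per year, a rough upper bound on what is worth spending each year to mitigate that one threat.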

4. Develop recovery strategies

  1. List possible solutions. Each solution should do one or both of the following:
    • Reduce the probability of a threat materializing
    • Reduce the impact if it does
  2. Identify preventative controls. These controls can eliminate or reduce threats to business operations. They include appropriately sized uninterruptible power supplies (UPS), generators, fire suppression systems, air conditioning, reliable backups with offsite storage of backup media, water sensors, and physical and policy-based security controls.
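Sizing a UPS "appropriately" usually means converting the protected load from watts to volt-amperes and leaving headroom. A sketch assuming a power factor of 0.9 and a maximum 80% load on the UPS; both defaults are assumptions to replace with the equipment and vendor specifications, and the load figures are hypothetical. It does not estimate battery runtime.

```python
import math


def required_ups_va(load_watts: list[float], power_factor: float = 0.9,
                    max_load_fraction: float = 0.8) -> int:
    """Estimate the minimum UPS rating in VA for a set of loads.

    power_factor and max_load_fraction are assumed defaults; check the
    actual equipment and UPS vendor specifications.
    """
    total_watts = sum(load_watts)
    apparent_power_va = total_watts / power_factor
    # Round first so floating-point noise (e.g. 1375.0000000002)
    # does not push the result up by one VA.
    return math.ceil(round(apparent_power_va / max_load_fraction, 6))


# Hypothetical rack: two servers at 450 W, a switch at 60 W, a router at 30 W.
print(required_ups_va([450, 450, 60, 30]))  # 1375
```

In practice the result is rounded up to the next standard UPS rating offered by the vendor.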

When developing disaster recovery strategies, consider cost, allowable outage times, and the incidents that could cause outages. Strategies may use mirrored systems, RAID, equipment vendor service level agreements (SLAs), and alternate sites: mirrored, hot, warm, cold, or mobile. The backup method should be defined in the Backup Policy, and the Backup Policy should be consistent with the Disaster Recovery Plan. Site types:

  • Mirrored site - Redundant site with a mirrored copy of data in close to real time.
  • Hot site - A site with the space, hardware, equipment, and personnel to replace current systems on short notice. Contains complete telecommunications and hardware equipment.
  • Warm site - Space that is partly equipped with power, communications capability, and some computer equipment, and that can temporarily accommodate relocated systems. The site is operationally ready to receive or operate computer equipment. Contains partial hardware equipment and partial to full telecommunications equipment.
  • Cold site - Usually has space, power, environmental controls, and limited communications capability. It can support IT equipment once it is installed, but it is not kept ready to receive computer equipment. It may have little to no telecommunications equipment and very little hardware equipment.
  • Mobile site - Usually provided by a vendor, a mobile site is typically a tractor trailer fitted with communications and power equipment to meet IT system requirements. Arrangements for the site should be made with the vendor before a disaster occurs.
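Choosing among these site types comes down to allowable outage time versus cost. The sketch below picks the cheapest site type whose recovery time fits within the allowable outage; the `SiteOption` type, the recovery times, and the costs are hypothetical placeholders, not typical industry figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteOption:
    name: str
    recovery_hours: float  # assumed time to resume operations at this site
    annual_cost: float     # assumed yearly cost of keeping the site available


def cheapest_site(options: list[SiteOption], allowable_outage_hours: float) -> SiteOption:
    """Return the least expensive site type that recovers within the allowable outage."""
    feasible = [o for o in options if o.recovery_hours <= allowable_outage_hours]
    if not feasible:
        raise ValueError("no site type meets the allowable outage time")
    return min(feasible, key=lambda o: o.annual_cost)


# Placeholder figures only; real values come from vendor quotes and agreements.
SITES = [
    SiteOption("mirrored", 0.25, 400_000),
    SiteOption("hot", 4, 250_000),
    SiteOption("warm", 48, 90_000),
    SiteOption("mobile", 72, 60_000),
    SiteOption("cold", 168, 30_000),
]
print(cheapest_site(SITES, allowable_outage_hours=24).name)  # hot
```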

5. Establish budget

Consider:

  1. Losses due to downtime, especially lost sales and lost productivity.
  2. Loss of sensitive data.
  3. Loss of equipment and the cost of setting up replacement equipment.
  4. The cost and length of time to recover systems from a complete loss, weighed against business needs.
  5. Potential losses weighed against the funds available for disaster recovery and mitigation.
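Weighing potential losses against available funds can be sketched as a selection problem: fund only measures that reduce expected annual loss by more than they cost, best net benefit first, until the budget is spent. This greedy approach is a simple heuristic, not an optimal allocation, and `select_mitigations`, the measure names, and the figures are hypothetical.

```python
def select_mitigations(measures: dict[str, tuple[float, float]], budget: float) -> list[str]:
    """Greedy sketch: fund measures with the best net annual benefit first.

    measures: maps a name to (annual_cost, annual_loss_reduction).
    A measure is considered only if it reduces expected loss by more than
    it costs. Greedy selection is a heuristic, not an optimal choice.
    """
    worthwhile = [
        (reduction - cost, name, cost)
        for name, (cost, reduction) in measures.items()
        if reduction > cost
    ]
    chosen, remaining = [], budget
    for _net_benefit, name, cost in sorted(worthwhile, reverse=True):
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen


# Hypothetical annual figures: (cost, expected loss reduction).
measures = {
    "generator": (12_000, 30_000),
    "offsite backup service": (6_000, 25_000),
    "water sensors": (500, 4_000),
    "second data center": (300_000, 80_000),
}
print(select_mitigations(measures, budget=15_000))
# ['offsite backup service', 'water sensors']
```

The generator is worthwhile but does not fit in what remains of this budget, which is exactly the kind of trade-off the budget discussion should surface.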

6. Develop the plan

  1. Establish recovery procedures
  2. Create a recovery team
  3. Develop an IT contingency plan. The plan must define the teams involved and the roles and responsibilities of their staff.

7. Test the plan

Have the test team set up a replacement network and new systems without access to any information or facilities in the main building, as though it had been destroyed. The team must supply all the documentation it needs, including:

  1. Procedures for building all servers
  2. Contact information for vendors and internal staff who must be contacted during an emergency. Include contact information for ISPs.
  3. Network drawings and documentation, including IP addresses, netmasks, ISP router information, and ISP DNS information.
  4. IT policies and procedures, including security policies, so that they can be re-established if the primary facility is destroyed.
  5. All software and licenses needed to rebuild all systems.
  6. Backed up data that can be restored.
  7. Documentation of every system's configuration, including how it is secured, the services it runs, and the latest patches applied.
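Item 3 calls for IP addresses and netmasks. Deriving these values from CIDR notation with Python's standard `ipaddress` module avoids transcription errors in the printed documentation. A sketch limited to IPv4; `describe_subnet` and the subnet shown are hypothetical.

```python
import ipaddress


def describe_subnet(cidr: str) -> dict:
    """Summarize an IPv4 subnet (/30 or larger) for the rebuild documentation."""
    net = ipaddress.IPv4Network(cidr)  # raises ValueError if host bits are set
    if net.prefixlen > 30:
        raise ValueError("this sketch only handles /30 or larger subnets")
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "usable_hosts": net.num_addresses - 2,
    }


# Hypothetical office LAN.
for key, value in describe_subnet("192.168.10.0/24").items():
    print(f"{key:12} {value}")
```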

Plan testing, training, and exercises

  1. Test system recovery on a different computer from backup media.
  2. Test internal and external connectivity
  3. Test the performance of systems on the alternate equipment.
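Restoring from backup media is only useful if the restored data matches what was backed up. One way to check, assuming a manifest of SHA-256 digests was recorded when the backup was taken (the manifest format and `verify_restore` are hypothetical), is to re-hash each restored file and compare:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large restored files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restore_root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or differ from the manifest.

    manifest: maps each backed-up file's relative path to the SHA-256 hex
    digest recorded at backup time (an assumed format).
    """
    problems = []
    for rel_path, expected in manifest.items():
        restored = restore_root / rel_path
        if not restored.is_file() or sha256_of(restored) != expected:
            problems.append(rel_path)
    return problems
```

An empty result means every file in the manifest came back intact. It does not show that the restored systems run correctly, which is what the connectivity and performance tests are for.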

8. Plan maintenance

The plan must be reviewed and updated regularly so that it stays current as business requirements and technology change. Consider:

  1. Changes to vendor names and contact information.
  2. Changes to team member names and contact information.
  3. Changes to available and required hardware and software.
  4. Changes to technical procedures caused by business requirements changes or technology changes.
  5. Changes to security requirements.
  6. Changes to operational requirements.
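Part of regular review is catching entries that have not been confirmed recently, such as vendor and team contacts. A sketch, assuming each entry records the date it was last verified; the 90-day interval, `stale_entries`, and the entries are hypothetical.

```python
from datetime import date, timedelta


def stale_entries(last_verified: dict[str, date], today: date,
                  max_age: timedelta = timedelta(days=90)) -> list[str]:
    """List plan entries not verified within max_age (90 days is an assumed interval)."""
    return sorted(name for name, checked in last_verified.items()
                  if today - checked > max_age)


# Hypothetical entries from the plan's contact lists.
contacts = {
    "ISP support line": date(2024, 1, 10),
    "Backup vendor account manager": date(2024, 5, 2),
    "Recovery team lead": date(2023, 11, 20),
}
print(stale_entries(contacts, today=date(2024, 6, 1)))
# ['ISP support line', 'Recovery team lead']
```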