Disaster Recovery Plan Example

I would base my risk assessment on the total cost of destruction in several categories. For example, assume I have a business that produces $10 million per year in sales, of which $5 million is the cost of product. Operating costs are $2 million per year: $1 million for employee salaries and $1 million to operate the facilities. The business therefore makes $3 million per year in profit. I will assume the business has one office building, which includes the sales floor, office areas, and IT server area. Half of the $10 million in sales comes through the internet.

You have determined that your primary business functions include:

  1. Sales
  2. Information Technology
  3. Marketing
  4. Customer service
  5. Finance, including accounts payable, accounts receivable, payroll, and accounting
  6. Purchasing

You have determined that all your primary business functions rely heavily on your IT department. Your systems are well automated, with all sales, marketing, finance, customer service, and purchasing information stored on your servers, and half your sales depend on your website being available. Your sales floor is on the first level of your building.

Categories of destruction include:

Loss / Cause                 Probability per year   Replacement time   Plant cost
Total Building Loss          0.9%                   1 year             $2 million
     Tornado                 0.3%
     Fire                    0.5%
     Flood                   0.1%
Loss of Sales Floor          1.5%                   1 month            $300,000
     Flood                   0.3%
     Riot                    0.8%
     Fire                    0.4%
Loss of IT Server Area       0.8%                   2 months           $400,000
     Fire                    0.8%
Total Loss of Web server     15%                    2 weeks            $4000
     System Fail             10%
     Security Inc*           5%
Total Loss of File server    15%                    2 weeks            $4000
     System Fail             10%
     Security Inc*           5%
Total Loss of Mail server    15%                    2 weeks            $4000
     System Fail             10%
     Security Inc*           5%

* Security Inc = Security incursion resulting in system failure.

Some of these threats may destroy only small, specific areas, so the probabilities of total destruction and partial destruction have been calculated separately. For the loss of a server, only total system failures and security incidents severe enough to disable the system have been considered. Repairing or replacing the system would take two days to two weeks, depending on the situation. Keeping some additional computer equipment on hand can reduce that recovery time, and doing so may be included in the disaster recovery plan.

This example does not consider historical web server downtime or its cost to sales.

Cost of Loss:

Loss                         Plant loss    Sales loss    Productivity loss   Total
Total Building Loss          $2 million    $10 million   $1 million          $13 million
Loss of Sales Floor          $300,000      $416,666      $5000               $721,666
Loss of IT Server Area       $400,000      $833,333      $133,333            $1,366,666
Total Loss of Web server     $4000         $192,307      $0                  $196,307
Total Loss of File server    $4000         $0            $33,333             $37,333
Total Loss of Mail server    $4000         $0            $16,000             $20,000
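
The sales loss column can be reproduced by prorating annual sales over the replacement time. The following is a minimal sketch in Python, assuming sales are spread evenly over the year and that the half of sales not made through the internet comes from the sales floor. The function and variable names are made up for this illustration, and the productivity figures are simply copied from the table.

```python
ANNUAL_SALES = 10_000_000                    # from the example business
INTERNET_SALES = ANNUAL_SALES / 2            # half of sales are through the internet
FLOOR_SALES = ANNUAL_SALES - INTERNET_SALES  # assumption: the rest is sales-floor revenue


def sales_loss(annual_sales_affected, fraction_of_year_down):
    """Sales lost during an outage, assuming sales are even across the year."""
    return annual_sales_affected * fraction_of_year_down


def total_loss(plant, sales, productivity):
    """Total cost of one event: plant loss + lost sales + lost productivity."""
    return plant + sales + productivity


# name: (plant cost, annual sales affected, fraction of year down, productivity loss)
scenarios = {
    "Total Building Loss": (2_000_000, ANNUAL_SALES, 1.0, 1_000_000),
    "Loss of Sales Floor": (300_000, FLOOR_SALES, 1 / 12, 5_000),
    "Loss of IT Server Area": (400_000, INTERNET_SALES, 2 / 12, 133_333),
    "Total Loss of Web server": (4_000, INTERNET_SALES, 2 / 52, 0),
}

totals = {}
for name, (plant, affected, fraction, productivity) in scenarios.items():
    lost_sales = sales_loss(affected, fraction)
    totals[name] = total_loss(plant, lost_sales, productivity)
    # int() truncates, which matches how the table drops the cents
    print(f"{name:26} sales ${int(lost_sales):>10,}  total ${int(totals[name]):>10,}")
```

The file and mail servers are left out because their sales loss is $0, so their totals are just the plant cost plus the productivity loss.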

The total loss is clearly much greater than the loss of property alone. Even if the organization has insurance that covers the plant loss, the remaining losses could still be:

Loss                         Total          Probability   Risk per year
Total Building Loss          $11 million    0.9%          $99,000
Loss of Sales Floor          $421,666       1.5%          $6325
Loss of IT Server Area       $966,666       0.8%          $7733
Total Loss of Web server     $192,307       15%           $28,846
Total Loss of File server    $33,333        15%           $5000
Total Loss of Mail server    $16,000        15%           $2400
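
Each risk per year figure is the uninsured loss per event multiplied by the probability of the event in a year. The sketch below, with illustrative names that are not part of the original example, reproduces the column and adds up the total yearly risk.

```python
# name: (uninsured loss per event, probability per year), copied from the table
uninsured = {
    "Total Building Loss": (11_000_000, 0.009),
    "Loss of Sales Floor": (421_666, 0.015),
    "Loss of IT Server Area": (966_666, 0.008),
    "Total Loss of Web server": (192_307, 0.15),
    "Total Loss of File server": (33_333, 0.15),
    "Total Loss of Mail server": (16_000, 0.15),
}


def risk_per_year(loss, probability):
    """Expected yearly cost of a threat: loss per event times yearly probability."""
    return loss * probability


risks = {name: risk_per_year(loss, p) for name, (loss, p) in uninsured.items()}
total_risk = sum(risks.values())

for name, risk in risks.items():
    print(f"{name:26} ${round(risk):>7,}")
print(f"{'Total':26} ${round(total_risk):>7,}")
```

The total comes out a little under $150,000 per year, with the total building loss and the web server accounting for most of it.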

At this point, the business has a risk assessment and knows the possible loss should any of these threats be realized. This example is not detailed enough to apply to most organizations, but it should give a general idea of how to quantify the risks. It leaves out many threats, such as theft and sabotage, so a real assessment would need to be more detailed.

Recovery Strategy

For the failure scenarios analyzed above, possible solutions include:

  1. Total Building Loss
    • Contract another company to supply floor space for a limited period of time
    • Create a separate facility in another location
  2. Loss of Sales Floor
    • Contract another company to supply floor space for a limited period of time
    • Create a separate sales floor in another part of town
  3. Loss of IT Server area
    • Contract another company to supply floor space for a limited period of time
    • Have duplicate servers at an alternate location
  4. Loss of Web Server
    • Have an additional server available. Recovery time: 4 hours. Cost: $2289 sales loss + $4000 server cost
  5. Loss of File Server
    • Have an additional server available. Recovery time: 4 hours. Cost: $397 productivity loss + $4000 server cost
  6. Loss of Mail Server
    • Have an additional server available. Recovery time: 4 hours. Cost: $190 productivity loss + $4000 server cost

I have not provided costs for items 1 through 3, but some of those options have definite advantages. A second sales floor in another part of town or in a nearby town, for example, could expand the business and also serve as a temporary alternate location if needed. In item 4, the potential loss of sales was $192,307 per event, for a risk of $28,846 per year based on the chance of the loss being realized. Spending $4000 to get the rebuild time down to 4 hours is therefore clearly worthwhile. The same hardware can also cover items 5 and 6 at no additional cost, which saves even more. These are the kind of immediate disaster recovery solutions that should be implemented quickly.

If more redundancy proves cost effective, those systems should be put in place as well. Depending on cost, common solutions include redundant servers (which I recommend for this company's web server), RAID, UPS for power, generators, and redundant server power supplies. Some of these simply prevent minor or even significant outages. For example, RAID can prevent several hours of service outage because the server continues to operate after the loss of one hard drive. Without RAID, losing one hard drive means installing a new drive and restoring the data from backup, which could take many hours.
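
The four-hour loss figures in the list of solutions can be reproduced by scaling each two-week loss down to four hours. The sketch below is one way to do that, assuming losses accrue evenly around the clock over the two-week replacement time (14 × 24 hours). That assumption and the names used are illustrative, chosen because they match the figures in the list.

```python
FULL_OUTAGE_HOURS = 14 * 24  # two-week replacement time, counted around the clock
RECOVERY_HOURS = 4           # recovery time with a spare server on hand


def prorated_loss(two_week_loss, recovery_hours=RECOVERY_HOURS):
    """Scale a two-week loss down to a shorter outage, assuming an even loss rate."""
    return two_week_loss * recovery_hours / FULL_OUTAGE_HOURS


# sales or productivity loss for a full two-week outage, from the cost of loss table
two_week_losses = {
    "Web server": 192_307,
    "File server": 33_333,
    "Mail server": 16_000,
}

four_hour_losses = {name: prorated_loss(loss) for name, loss in two_week_losses.items()}

for name, loss in four_hour_losses.items():
    print(f"{name:12} loss with 4-hour recovery: ${round(loss):,}")
```

Under this assumption a spare server cuts the web server's loss per event from $192,307 to about $2289, which is why the $4000 purchase pays for itself so quickly.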

In each of these solutions the following should be considered:

  1. Cost of the solution
  2. Time to implement the solution after an incident, which affects the cost of lost productivity and lost income.
  3. Annual value of the potential loss, which is the threat risk per year. Several years' worth of this money could be spent to reduce the damage if the threat is realized, or to reduce the chance of the threat materializing.

Disaster Recovery Budgeting

When attempting to establish a budget, it may not be obvious how much money should be spent. This is mainly because the threat may never materialize, or it could happen tomorrow. It is somewhat of a gamble. However, in a previous step we calculated the real value in dollars per year of some threats to our example business.

In our example, the total risk cost per year was just under $150,000. Let's say we could spend a certain dollar amount to reduce this number, as follows:

Item   Amount Spent   Threat Risk per Year   How Much Less Threat Risk Annually
1      $0             $150,000               $0
2      $10,000        $90,000                $60,000
3      $20,000        $85,000                $65,000
4      $60,000        $65,000                $85,000
5      $100,000       $40,000                $110,000
6      $150,000       $30,000                $120,000

Spending the amounts for items 2, 3, and 4 is very worthwhile, since each additional amount spent would statistically be recovered through lower threat risk in two years or less. Spending the additional $40,000 to go from item 4 to item 5, which saves an additional $25,000 in threat risk per year, would also be recovered in less than two years and is worthwhile. Spending the additional $50,000 to go from item 5 to item 6 saves only another $10,000 per year, so it would take five years to recover and may be only marginally worth the extra money. Depending on the circumstances, such as whether it helps expand the business or reduces some other costs, it may or may not be worthwhile.
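
The payback reasoning above compares each item with the one before it: the extra money spent divided by the extra risk removed per year. Here is a minimal sketch of that calculation, with names invented for the illustration and the figures taken from the budget table.

```python
# (item, amount spent, threat risk per year) from the budget table
options = [
    (1, 0, 150_000),
    (2, 10_000, 90_000),
    (3, 20_000, 85_000),
    (4, 60_000, 65_000),
    (5, 100_000, 40_000),
    (6, 150_000, 30_000),
]


def incremental_payback(options):
    """For each step up, return (item, extra spend, extra yearly saving, payback years)."""
    results = []
    for (_, prev_spent, prev_risk), (item, spent, risk) in zip(options, options[1:]):
        extra_spend = spent - prev_spent
        extra_saving = prev_risk - risk
        results.append((item, extra_spend, extra_saving, extra_spend / extra_saving))
    return results


paybacks = incremental_payback(options)
for item, extra_spend, extra_saving, years in paybacks:
    print(f"item {item}: extra ${extra_spend:,} saves ${extra_saving:,}/year, "
          f"payback {years:.1f} years")
```

Looking at each step separately matters here: measured against spending nothing, item 6 looks attractive, but the last $50,000 alone takes five years to earn back.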

At this point, you can decide how much money you want to spend. Don't forget to consider the maintenance costs associated with the solution you choose. Many of these factors are left out of this document for simplicity. Once a decision is made, management will approve the budget and you will begin to develop the disaster recovery plan. We have decided to use plan 5 (item 5 in the table above).

Developing the Plan

At this point, we have a budget and a list of threats we want to protect against or mitigate. Let's say plan 5 uses a hot site with a redundant web server, file server, and mail server. The hot site contains mirrored servers that are copies of the servers in the main IT server room, with the capability to copy data to the alternate servers in close to real time. The hot site is connected to the internet by a T1 line costing $700 per month, and rent for the alternate facility is an additional $2000 per month. A point-to-point VPN provides a private connection to the new facility, and a content switch load balances the web servers between the hot site and the main facility. There is therefore one set of servers at the main facility and another at the hot site, with data actively being stored on the backup set. How much load balancing is done may depend on the speed and reliability of the cross-town connection.
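
The monthly charges for the hot site are recurring costs of the kind mentioned earlier, so it helps to put them on a yearly basis next to the yearly risk reduction of plan 5. The sketch below does only that simple arithmetic. It is an illustration under stated assumptions: it counts just the two monthly charges given in the text and ignores the one-time $100,000 as well as hardware, VPN, and content switch costs, which are not priced here.

```python
T1_PER_MONTH = 700                # internet connection for the hot site
RENT_PER_MONTH = 2_000            # additional rent for the alternate facility
RISK_REDUCTION_PER_YEAR = 110_000  # plan 5, from the budget table

# Yearly recurring cost of the two monthly charges named in the text
annual_recurring = (T1_PER_MONTH + RENT_PER_MONTH) * 12

# Assumption: treat the recurring cost as a direct offset to the yearly risk reduction
net_reduction_per_year = RISK_REDUCTION_PER_YEAR - annual_recurring

print(f"Recurring cost per year:        ${annual_recurring:,}")
print(f"Risk reduction net of that cost: ${net_reduction_per_year:,}")
```

On these figures the recurring charges come to $32,400 per year, which is large enough to change the payback estimate for the plan and should be part of the budget decision.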