Business Continuity & Disaster Recovery | CompTIA Network+ N10-007 | 3.2b

In this video you will learn about business continuity & disaster recovery concepts such as: cold sites, warm sites, hot sites, various types of backups, snapshots, MTTR, MTBF, and SLA requirements.

Recovery

Disaster recovery sites are sites where IT functions can be set up when a disaster prevents the use of the original location. These fall into three categories:

  • Cold Site:  A cold site has power, HVAC, and network connections, but would need equipment and data before it could be used for IT functions.  This is the least expensive to maintain before a disaster but takes the longest time to set up during a disaster.
  • Warm Site:  A warm site has power, HVAC, network connections, and hardware suitable for IT functions.  Systems at the warm site might need operating systems, apps, and data restored, or operating systems and apps could already be installed to save time.  A warm site costs more than a cold site and requires ongoing maintenance of hardware and possibly software, but it can be made ready in hours rather than the days a cold site takes.
  • Hot Site:  A hot site is, in IT terms, a duplicate of your primary IT functions, with hardware, apps, and data ready to run in minutes or less in the event of a disaster.  This is the most expensive of the three types of disaster recovery site, but for an organization that can afford no downtime, it might be the only one worth considering.
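
The trade-off between cost and setup time in the list above can be sketched in a few lines of Python.  This is only an illustration: the readiness times in `SITES` are made-up placeholder values (the text only says "days", "hours", and "minutes or less"), and the function name `cheapest_site` is hypothetical.

```python
# Illustrative readiness times in hours. These numbers are assumptions
# chosen for the example, not figures from the text. The list is ordered
# from least to most expensive to maintain.
SITES = [("cold", 72.0), ("warm", 8.0), ("hot", 0.1)]


def cheapest_site(max_downtime_hours):
    """Return the least expensive site type that can be ready in time."""
    for name, ready_hours in SITES:
        if ready_hours <= max_downtime_hours:
            return name
    # Even the assumed hot site readiness time is too slow.
    return None


choice_for_four_days = cheapest_site(96)    # picks "cold"
choice_for_half_day = cheapest_site(12)     # picks "warm"
choice_for_half_hour = cheapest_site(0.5)   # picks "hot"
```

The point of the sketch is that the tolerable downtime drives the choice: the shorter the outage an organization can accept, the further down the (more expensive) list it has to go.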

Backups

When hardware fails, you can purchase replacements, but when storage devices fail, the data is lost unless you have backup copies.  A backup is a copy of information stored on a computing device (laptop, desktop, server, or mobile device) that can be restored if the original is lost or corrupted.  There are many backup methods designed for different requirements.  The following list describes the most common methods.

  • Full backup: backs up all files, whether or not they were backed up previously. When a file is backed up, a file attribute known as the archive bit is cleared to indicate that the file has been backed up.
  • Differential backup: backs up all files changed since the last full backup.
  • Incremental backup: backs up only the files changed since the last full or incremental backup.

Differential and incremental backups are two different methods of backing up only changed files. If incremental backups are used between full backups and a full restoration is needed, the last full backup and every incremental backup made since then must be restored, in order. However, if differential backups are used between full backups, only the last full backup and the last differential backup must be restored.
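
The restore rule above can be sketched as a small Python function.  This is a minimal sketch, assuming a schedule that uses only one kind of backup between full backups; the function name `restore_chain` and the `(label, kind)` tuples are hypothetical and chosen for the example.

```python
def restore_chain(backups):
    """Return the backups needed for a full restoration, oldest first.

    `backups` is a chronological list of (label, kind) tuples, where kind
    is "full", "incremental", or "differential" (hypothetical labels).
    Assumes incrementals and differentials are not mixed in one cycle.
    """
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")

    # Everything taken before the most recent full backup is not needed.
    last_full = full_positions[-1]
    chain = [backups[last_full]]
    later = backups[last_full + 1:]

    differentials = [b for b in later if b[1] == "differential"]
    incrementals = [b for b in later if b[1] == "incremental"]

    if differentials:
        # Each differential holds all changes since the last full backup,
        # so only the newest one is required.
        chain.append(differentials[-1])
    else:
        # Each incremental holds only changes since the previous backup,
        # so every one of them is required, in order.
        chain.extend(incrementals)
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

incremental_restore = restore_chain(incremental_week)    # four backups
differential_restore = restore_chain(differential_week)  # two backups
```

With the incremental schedule all four backups are needed; with the differential schedule only Sunday's full backup and Wednesday's differential are.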

Snapshots

A snapshot is a record, composed of metadata, that indicates the state of blocks and files in a unit of storage.  Often, snapshots come as a feature of NAS or SAN storage and are created and held on that storage.  They allow the user to roll back to previously existing versions of a volume, drive, file system, database, etc.  Snapshots are like a point-in-time copy or table of contents that shows which blocks and/or files existed and where.  In the case of rolling back, the volume or unit of storage in question would be changed to a state that reflected the snapshot, by removal and movement of blocks, etc.  Also, snapshots are not backups because they are not copies.  They don’t take up much space individually, but their total volume can grow, especially if there are lots of deleted blocks/files, and so suppliers usually limit the number of snapshots that can be retained.[1]
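
To make the "metadata, not a copy" idea concrete, here is a toy Python model.  It is a deliberately simplified sketch that uses a redirect-on-write scheme (new data always goes to a new block); real NAS or SAN products may implement snapshots and rollback differently.  The `Volume` class and all of its method names are hypothetical.

```python
class Volume:
    """Toy redirect-on-write volume. All names here are hypothetical."""

    def __init__(self):
        self.store = {}      # physical block id -> data
        self.block_map = {}  # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> saved copy of block_map
        self._next_id = 0

    def write(self, logical, data):
        # Never overwrite in place: put the data in a new physical block
        # so blocks referenced by older snapshots stay intact.
        self.store[self._next_id] = data
        self.block_map[logical] = self._next_id
        self._next_id += 1

    def read(self, logical):
        return self.store[self.block_map[logical]]

    def snapshot(self, name):
        # Only the metadata (the block map) is saved, not the data blocks.
        self.snapshots[name] = dict(self.block_map)

    def rollback(self, name):
        # Point the volume back at the blocks the snapshot recorded.
        self.block_map = dict(self.snapshots[name])


vol = Volume()
vol.write(0, "config v1")
vol.snapshot("before-change")
vol.write(0, "config v2")
after_write = vol.read(0)       # "config v2"
vol.rollback("before-change")
after_rollback = vol.read(0)    # "config v1"
```

Note that the snapshot only points at blocks held in `store`.  If the underlying storage is lost, the snapshot is lost with it, which is why a snapshot is not a substitute for a backup.  The old blocks kept alive by snapshots are also why their total space can grow over time.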

MTTR (Mean Time to Repair)

MTTR is a basic measure of the maintainability of repairable items.  It represents the average time required to repair a failed component or device.  Expressed mathematically, it is the total corrective maintenance time for failures divided by the total number of corrective maintenance actions for failures during a given period of time.[2]
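
As a worked example of that formula, the short Python sketch below divides total corrective maintenance time by the number of repairs.  The function name and the repair times are invented for illustration.

```python
def mean_time_to_repair(repair_hours):
    """Total corrective maintenance time divided by the number of repairs."""
    if not repair_hours:
        raise ValueError("at least one repair is needed to compute MTTR")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical repair times, in hours, for three failures in one quarter.
repairs = [2.0, 4.0, 3.0]
mttr = mean_time_to_repair(repairs)  # 9 hours over 3 repairs = 3.0 hours
```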

MTBF (Mean Time Between Failures)

MTBF is the predicted elapsed time between inherent failures of a mechanical or electronic system, during normal system operation.  MTBF can be calculated as the average time between failures of a system.  The term is used for repairable systems, while MTTF (mean time to failure) denotes the expected time to failure for a non-repairable system.[3]  The definition of MTBF depends on the definition of what is considered a failure.  For complex, repairable systems, failures are considered to be those out of design conditions which place the system out of service and into a state for repair.  Failures which occur that can be left or maintained in an unrepaired condition, and do not place the system out of service, are not considered failures under this definition.[4]  In addition, units that are taken down for routine scheduled maintenance or inventory control are not considered within the definition of failure.[5]  The higher the MTBF, the longer a system is likely to work before failing.
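
The definition of failure matters when MTBF is calculated.  The sketch below assumes MTBF is computed as total operating time divided by the number of qualifying failures, and it ignores scheduled maintenance as described above.  The event labels, the function name, and the numbers are all hypothetical.

```python
def mean_time_between_failures(operating_hours, events):
    """Operating time divided by the number of events that count as failures.

    `events` is a list of labels. Only "failure" counts; labels such as
    "scheduled_maintenance" are ignored, matching the definition above.
    """
    failures = [event for event in events if event == "failure"]
    if not failures:
        raise ValueError("no failures recorded, so MTBF cannot be computed")
    return operating_hours / len(failures)


# Hypothetical log: 3000 operating hours, three failures, one planned outage.
events = ["failure", "scheduled_maintenance", "failure", "failure"]
mtbf = mean_time_between_failures(3000.0, events)  # 3000 / 3 = 1000.0 hours
```

If the planned outage had been counted as a failure, the result would have dropped to 750 hours, which shows how much the figure depends on what is classed as a failure.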

SLA (Service-Level Agreement) Requirements

An SLA is a commitment between a service provider and a client.  Particular aspects of the service (quality, availability, responsibilities) are agreed between the service provider and the service user.[6]  The most common component of an SLA is that the services should be provided to the customer as agreed upon in the contract.  As an example, ISPs will commonly include SLAs within the terms of their contracts with customers to define the level(s) of service being sold in plain language terms.  In this case, the SLA will typically have a technical definition in terms of mean time between failures (MTBF) and mean time to repair or mean time to recovery (MTTR); which party is responsible for reporting faults or paying fees; responsibility for various data rates; throughput; jitter; or similar measurable details.
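
One common way to tie MTBF and MTTR to an availability commitment is the steady-state availability formula, MTBF / (MTBF + MTTR).  The Python sketch below applies it; the function names, the measured values, and the availability targets are assumptions made for the example and are not taken from any particular SLA.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def meets_sla(mtbf_hours, mttr_hours, target):
    """Check measured availability against an agreed target (e.g. 0.999)."""
    return availability(mtbf_hours, mttr_hours) >= target


# Hypothetical measurements: 1000 hours between failures, 3 hours to repair.
measured = availability(1000.0, 3.0)            # about 0.997
meets_three_nines = meets_sla(1000.0, 3.0, 0.999)
meets_relaxed_target = meets_sla(1000.0, 3.0, 0.995)
```

With these example numbers the service would miss a 99.9% target but meet a 99.5% one, so either the MTBF has to rise or the MTTR has to fall to satisfy the stricter agreement.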

References

[1] Adshead, A. (2020, Apr 23). Storage 101: Snapshots vs backup. ComputerWeekly.
[2] Mean time to repair. Wikipedia.
[3] Lienig, J., Bruemmer, H. (2017). Fundamentals of Electronic Systems Design.
[4] Mean time between failures. Wikipedia.
[5] Stephen, (2011, July 6). Defining Failure:  What Is MTTR, MTTF, and MTBF?
[6] Kearney, K.T.; Torelli, F. (2011). Service Level Agreements for Cloud Computing.