Business Continuity & Disaster Recovery | CompTIA Network+ N10-007 | 3.2a

In this video you will learn about business continuity & disaster recovery concepts such as fault tolerance, high availability, load balancing, NIC teaming, port aggregation, clustering, power management, battery backups/UPS, dual power supplies, & redundant circuits.

Availability Concepts

Fault Tolerance

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components.  If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown.  Fault tolerance is particularly sought after in high-availability or life-critical systems.  A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails.[1]  The term is most commonly used to describe computer systems designed to remain more or less fully operational, perhaps with a reduction in throughput or an increase in response time, in the event of some partial failure.  That is, the system as a whole is not stopped due to problems in either the hardware or the software.[2]
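
To make the contrast between a naively designed system and a fault-tolerant one concrete, here is a minimal Python sketch. The `Component` class, its capacity figures, and the two throughput functions are hypothetical illustrations rather than any real product's behavior: the naive version stops entirely if any component is unhealthy, while the fault-tolerant version keeps serving with whatever capacity survives.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    capacity: float  # requests per second this component can serve (illustrative)
    healthy: bool = True


def naive_throughput(components: list[Component]) -> float:
    """Naive design: every component is required, so any single failure stops everything."""
    if all(c.healthy for c in components):
        return sum(c.capacity for c in components)
    return 0.0


def fault_tolerant_throughput(components: list[Component]) -> float:
    """Fault-tolerant design: surviving components keep working at reduced capacity."""
    return sum(c.capacity for c in components if c.healthy)


if __name__ == "__main__":
    pool = [Component(f"worker{i}", 100.0) for i in range(4)]
    pool[2].healthy = False  # simulate one failed component
    print(naive_throughput(pool))           # 0.0: total breakdown
    print(fault_tolerant_throughput(pool))  # 300.0: 3 of 4 workers still serving
```

The loss in the fault-tolerant case (one quarter of capacity) matches the share of components that failed, which is the "proportional to the severity of the failure" behavior described above.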

High Availability

High availability means that an IT system, component, or application can operate at a high level, continuously, without intervention, for a given time period.  High-availability infrastructure is configured to deliver quality performance and handle different loads and failures with minimal or zero downtime.[3]  There are three principles of systems design in reliability engineering which can help achieve high availability:

  1. Elimination of a single point of failure.  This means adding or building redundancy into the system so that failure of a component does not mean failure of the entire system.
  2. Reliable crossover.  In redundant systems, the crossover point itself tends to become a single point of failure.  Reliable systems must provide for reliable crossover.
  3. Detection of failures as they occur.  If the two principles above are observed, a user may never see a failure, but the maintenance activity must.
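
The first two principles can be quantified with basic availability arithmetic. The minimal Python sketch below assumes components fail independently and uses made-up availability figures (99% per server, 99.99% for the crossover device); `parallel`, `series`, and `downtime_minutes_per_year` are hypothetical helper names. Redundant components combine in parallel (the group fails only if all members fail), while the crossover device sits in series with the redundant pair, which is why it can itself become a single point of failure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def parallel(*availabilities: float) -> float:
    """Redundant components: the group is down only when every member is down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def series(*availabilities: float) -> float:
    """Required components: the chain is up only when every member is up."""
    all_up = 1.0
    for a in availabilities:
        all_up *= a
    return all_up


def downtime_minutes_per_year(availability: float) -> float:
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    server = 0.99        # illustrative availability of one server
    crossover = 0.9999   # illustrative availability of the failover device
    pair = parallel(server, server)
    pair_with_crossover = series(crossover, pair)
    for label, a in [("single server", server),
                     ("redundant pair", pair),
                     ("pair + crossover", pair_with_crossover)]:
        print(f"{label}: {a:.6f} (~{downtime_minutes_per_year(a):.0f} min/yr down)")
```

Under these assumptions, the redundant pair cuts expected downtime from roughly 5,256 to roughly 53 minutes a year, but putting the crossover device in series roughly doubles that again, which is why reliable crossover is listed as its own principle.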

Load Balancing

Load balancing refers to the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient.  Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.[4]  A load balancer may be:[5]

  • A physical device, a virtualized instance running on specialized hardware or a software process.
  • Incorporated into application delivery controllers designed to more broadly improve the performance & security of three-tier web and microservices-based applications, regardless of where they’re hosted.
  • Able to leverage many possible load balancing algorithms, including round robin, server response time & the least connection method to distribute traffic in line with current requirements.
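
The round robin and least connection methods named above can be sketched in a few lines of Python. This is a minimal illustration, assuming a fixed list of hypothetical backend names (`web1`, `web2`, ...); real load balancers add health checks, weights, and session persistence on top of the selection logic.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Sends each new connection to the server with the fewest active connections."""

    def __init__(self, servers: list[str]):
        self.active = {s: 0 for s in servers}

    def pick(self) -> str:
        # min() returns the first server with the lowest count, so ties go
        # to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to `server` closes."""
        self.active[server] -= 1


if __name__ == "__main__":
    rr = RoundRobin(["web1", "web2", "web3"])
    print([rr.pick() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
    lc = LeastConnections(["web1", "web2"])
    a = lc.pick()          # web1 (both idle, first listed wins)
    b = lc.pick()          # web2
    lc.release(a)          # web1's connection closes
    print(lc.pick())       # web1, now the least busy
```

Round robin ignores how busy each server is, so it suits backends of similar capacity handling short, uniform requests; least connections adapts when some connections last much longer than others.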

NIC Teaming

Network Interface Card (NIC) teaming allows you to combine multiple physical network adapters into a virtual NIC, which is then presented to the operating system (OS) as a single NIC.  All traffic sent from the OS passes through the virtual NIC and is load-balanced across the assigned physical network connections.  The benefits of NIC teaming stem from its load-balancing capabilities.  Efficiently breaking up network traffic across multiple connections better utilizes network resources, increases bandwidth, and improves server availability, because traffic can continue over the remaining adapters if one fails.  It also helps simplify network configuration.[6]
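
A minimal Python sketch of the idea: the OS sends every frame to one `TeamedNIC` object, which spreads frames across its physical members and skips any member whose link is down. The class names are hypothetical, and rotating per frame is a simplification; real teaming drivers offer several modes (for example, active/standby or hash-based distribution) that vary by operating system and vendor.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalNIC:
    name: str
    link_up: bool = True
    bytes_sent: int = 0


@dataclass
class TeamedNIC:
    """The single virtual interface the OS sees, backed by several physical adapters."""
    members: list[PhysicalNIC]
    _next: int = field(default=0, init=False)  # rotation pointer

    def send(self, frame: bytes) -> str:
        # Try each member at most once, in rotation, skipping adapters whose
        # link is down; this is what keeps traffic flowing after a failure.
        for _ in range(len(self.members)):
            nic = self.members[self._next % len(self.members)]
            self._next += 1
            if nic.link_up:
                nic.bytes_sent += len(frame)
                return nic.name
        raise ConnectionError("all team members are down")


if __name__ == "__main__":
    eth0, eth1 = PhysicalNIC("eth0"), PhysicalNIC("eth1")
    team = TeamedNIC([eth0, eth1])
    print([team.send(b"frame") for _ in range(4)])  # ['eth0', 'eth1', 'eth0', 'eth1']
    eth1.link_up = False  # simulate a cable pull on eth1
    print([team.send(b"frame") for _ in range(2)])  # ['eth0', 'eth0']
```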

Port Aggregation

Port aggregation (or link aggregation) is the combining (aggregating) of multiple network connections in parallel by any of several methods, in order to increase throughput beyond what a single connection could sustain, to provide redundancy in case one of the links should fail, or both.  A link aggregation group (LAG) is the combined collection of physical ports.  Other umbrella terms used to describe the concept include trunking, bundling, channeling, and teaming.  Essentially, port/link aggregation increases the bandwidth & resilience of Ethernet connections.[7]
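
Many link aggregation implementations pick a member link per flow by hashing header fields, which keeps a flow's frames in order but means a single flow cannot exceed the speed of one member link. The minimal Python sketch below assumes hashing on source and destination IP addresses with CRC-32 as a stand-in; real switches use their own vendor-specific hash inputs and functions. `pick_member`, `lag_capacity_gbps`, and the port names are hypothetical.

```python
import zlib


def pick_member(up_links: list[str], src_ip: str, dst_ip: str) -> str:
    """Map a flow to one member link by hashing its endpoints.

    Every frame of the same flow hashes to the same link, which keeps frames in order.
    """
    if not up_links:
        raise ConnectionError("no member links are up")
    flow_key = f"{src_ip}->{dst_ip}".encode()
    return up_links[zlib.crc32(flow_key) % len(up_links)]


def lag_capacity_gbps(link_speeds_gbps: list[float]) -> tuple[float, float]:
    """Return (aggregate capacity across all flows, ceiling for any single flow)."""
    return sum(link_speeds_gbps), max(link_speeds_gbps)


if __name__ == "__main__":
    links = ["port1", "port2", "port3", "port4"]
    print(pick_member(links, "10.0.0.5", "10.0.1.9"))
    print(lag_capacity_gbps([1.0] * 4))   # (4.0, 1.0)
    # If port3 fails, flows are rehashed over the surviving links.
    survivors = [link for link in links if link != "port3"]
    print(pick_member(survivors, "10.0.0.5", "10.0.1.9"))
    print(lag_capacity_gbps([1.0] * 3))   # (3.0, 1.0)
```

So a 4 x 1 Gbps LAG can carry about 4 Gbps across many flows and keeps working at about 3 Gbps after a link failure, while any single flow still tops out at 1 Gbps.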

Clustering

Clustering refers to the interconnection of servers in a way that makes them appear to the operating environment as a single system.  As such, the cluster draws on the power of all the servers to handle the demanding processing requirements of a broad range of technical applications.  It also takes advantage of parallel processing in program execution.  Shared resources in a cluster may include physical hardware devices such as disk drives and network cards, as well as TCP/IP addresses, entire applications, and databases.  The cluster service is a collection of software on each node that manages all cluster-specific activity, including traffic flow and load balancing.  The nodes are linked together by standard Ethernet, FDDI, ATM, or Fibre Channel connections.  The advantages of clustering are as follows:[8]

  • Performance:  Throughput & response time are improved by using a group of machines at the same time.
  • Availability:  If one node fails, the workload is redistributed among the other nodes for uninterrupted operation.
  • Incremental Growth:  Performance & availability can be enhanced by adding more nodes to the cluster.
  • Scaling:  Theoretically, there is no limit on the number of machines that can belong to the cluster.
  • Price & Performance:  The individual nodes of a cluster typically offer very good performance for their price.  Because clustering does not involve the addition of expensive high-performance processors, buses, or cooling systems, the cluster retains the price/performance advantage of its individual members.
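
The availability advantage above, redistributing a failed node's workload among the other nodes, can be sketched as follows. This minimal Python example assumes the cluster already knows which node failed (real cluster services typically detect failures with heartbeat messages) and simply reassigns each orphaned workload to the least-loaded surviving node. The `redistribute` function and the node and workload names are hypothetical.

```python
def redistribute(assignments: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Move a failed node's workloads onto the surviving nodes, least-loaded first.

    Returns a new mapping; the input mapping is left unchanged.
    """
    survivors = {node: list(jobs) for node, jobs in assignments.items() if node != failed}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the workload")
    for job in assignments.get(failed, []):
        # Ties go to the node listed first.
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(job)
    return survivors


if __name__ == "__main__":
    cluster = {"node1": ["db"], "node2": ["web", "mail"], "node3": ["dns"]}
    print(redistribute(cluster, "node2"))
    # {'node1': ['db', 'web'], 'node3': ['dns', 'mail']}
```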

Power Management

Power management is a feature of some electrical appliances, especially copiers, computers, CPUs, GPUs & computer peripherals such as monitors and printers, that turns off the power or switches the system to a low-power state when inactive.  In computing this is known as PC power management.  Power management for computer systems is desired for many reasons, such as:[9]

  • Reduction in overall energy consumption.
  • Prolonging the battery life for portable & embedded systems.
  • Reduction in cooling requirements.
  • Reduction in noise.
  • Reduction in operating costs for energy & cooling.
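
Inactivity-based power management can be modeled as an idle timer: the longer a device stays inactive, the deeper the power-saving state it enters, and any activity wakes it again. The minimal Python sketch below assumes two hypothetical thresholds (low power after 5 minutes idle, off after 30 minutes); real devices expose their own settings and states.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "active"
    LOW_POWER = "low-power"  # e.g. display off or sleep
    OFF = "off"


class PowerManager:
    """Steps a device down to lower-power states the longer it stays idle."""

    def __init__(self, low_power_after_s: float = 300, off_after_s: float = 1800):
        self.low_power_after_s = low_power_after_s  # assumed threshold
        self.off_after_s = off_after_s              # assumed threshold
        self.idle_s = 0.0
        self.state = PowerState.ACTIVE

    def activity(self) -> None:
        # Any input or job wakes the device and resets the idle timer.
        self.idle_s = 0.0
        self.state = PowerState.ACTIVE

    def tick(self, seconds: float) -> PowerState:
        # Advance the idle timer and pick the state that matches it.
        self.idle_s += seconds
        if self.idle_s >= self.off_after_s:
            self.state = PowerState.OFF
        elif self.idle_s >= self.low_power_after_s:
            self.state = PowerState.LOW_POWER
        return self.state


if __name__ == "__main__":
    pm = PowerManager()
    print(pm.tick(120))   # PowerState.ACTIVE
    print(pm.tick(240))   # PowerState.LOW_POWER (6 minutes idle)
    pm.activity()         # user touches the keyboard
    print(pm.state)       # PowerState.ACTIVE
```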

Battery Backups/UPS (Uninterruptible Power Supply)

An uninterruptible power supply (UPS) is an electrical apparatus that provides emergency power to a load when the input power source or mains power fails. A UPS is typically used to protect hardware such as computers, data centers, telecommunication equipment, or other electrical equipment where an unexpected power disruption could cause injuries, fatalities, serious business disruptions, or data loss.[10] A UPS differs from an auxiliary or emergency power system or standby generator in that it provides near-instantaneous protection from input power interruptions by supplying energy stored in batteries.
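
A common back-of-the-envelope question is how long a UPS can carry a given load. The minimal Python sketch below divides usable battery energy by the load; the 90% inverter efficiency default and the example figures are assumptions, and real runtime is usually shorter at high loads and as batteries age, so the manufacturer's runtime charts should take precedence.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float,
                        inverter_efficiency: float = 0.9) -> float:
    """Rough UPS runtime: usable stored energy divided by the load.

    Assumes the battery delivers its full rated watt-hours at this load and that
    the inverter efficiency (assumed 90% by default) is constant. Treat the
    result as an optimistic upper bound.
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    if not 0 < inverter_efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable_wh = battery_wh * inverter_efficiency
    return usable_wh / load_w * 60  # hours -> minutes


if __name__ == "__main__":
    # Illustrative figures: a 500 Wh battery bank carrying a 300 W switch stack.
    print(f"{ups_runtime_minutes(500, 300):.0f} minutes")  # 90 minutes
```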

Dual Power Supply

A dual power supply is a type of direct current (DC) power supply that provides both a positive and a negative voltage with respect to ground.  It ensures a stable power supply to devices and helps prevent system damage.  Many electronic circuits, whether built from tubes or transistors, require a source of DC power, and a dual power supply can power both electronic and electrical equipment.[11]  In network and server hardware, the term "dual power supplies" usually refers to two redundant power supply units installed in the same device, ideally connected to separate power sources, so the device keeps running if one unit or its feed fails.

Redundant Circuits

Redundant circuits (network redundancy) refers to the practice of installing additional or alternate instances of network devices, equipment, and communication media within the network infrastructure.  It is a method of ensuring network availability if a network device or path fails or becomes unavailable, and as such it provides a means of network failover.  Typically, network redundancy is achieved by adding alternate network paths, implemented through redundant standby routers and switches.  When the primary path is unavailable, traffic can be switched to the alternate path quickly to minimize downtime and maintain continuity of network services.[12]
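
The failover decision itself can be sketched as choosing the most preferred path that is currently up. This minimal Python example uses hypothetical path names and priority numbers; in real networks the decision is made by routing protocols, floating static routes, or first-hop redundancy protocols such as VRRP, and detecting that the primary path is down takes some time.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    priority: int  # lower number = more preferred (illustrative convention)
    up: bool = True


def active_path(paths: list[Path]) -> Path:
    """Use the most preferred path that is up; fall back to alternates otherwise."""
    candidates = [p for p in paths if p.up]
    if not candidates:
        raise ConnectionError("no network path available")
    return min(candidates, key=lambda p: p.priority)


if __name__ == "__main__":
    primary = Path("ISP-A circuit", priority=10)
    backup = Path("ISP-B circuit", priority=20)
    print(active_path([primary, backup]).name)  # ISP-A circuit
    primary.up = False                          # primary circuit fails
    print(active_path([primary, backup]).name)  # ISP-B circuit
```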

References