In this video you will learn about business continuity & disaster recovery concepts such as: fault tolerance, high availability, load balancing, NIC teaming, port aggregation, clustering, power management, battery backups/UPS, dual power supplies, & redundant circuits.
Fault Tolerance
Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, whereas in a naively designed system even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability or life-critical systems. A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely when some part of the system fails.[1] The term is most commonly used to describe computer systems designed to remain more or less fully operational, perhaps with reduced throughput or increased response time, in the event of some partial failure. That is, the system as a whole is not stopped by problems in either the hardware or the software.[2]
High Availability
High availability means that an IT system, component, or application can operate at a high level, continuously, without intervention, for a given time period. High-availability infrastructure is configured to deliver quality performance and handle different loads and failures with minimal or zero downtime.[3] There are three principles of systems design in reliability engineering which can help achieve high availability: elimination of single points of failure, reliable crossover or failover between redundant components, and detection of failures as they occur.
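Availability is commonly expressed as a percentage of uptime over a given period, which translates directly into an allowed downtime budget. A minimal sketch of that arithmetic (the "five nines" figure below is a standard illustration, not from the text):

```python
def downtime_per_year(availability):
    """Minutes of downtime per year implied by an availability fraction."""
    minutes_per_year = 365.25 * 24 * 60  # 525,960 minutes in an average year
    return (1 - availability) * minutes_per_year

# "Five nines" (99.999%) allows only a few minutes of downtime per year.
print(round(downtime_per_year(0.99999), 2))  # ~5.26 minutes
```

The tighter the availability target, the smaller the maintenance and failure window, which is why the redundancy techniques in the following sections matter.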
Load Balancing
Load balancing refers to the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.[4] A load balancer may be implemented in hardware or in software.[5]
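One of the simplest distribution strategies is round-robin, which hands tasks to nodes in rotating order. A minimal Python sketch (node names and the task list are illustrative assumptions):

```python
from itertools import cycle

def round_robin(tasks, nodes):
    """Distribute tasks over nodes in rotating order, so no node sits
    idle while another is overloaded."""
    assignment = {node: [] for node in nodes}
    for task, node in zip(tasks, cycle(nodes)):
        assignment[node].append(task)
    return assignment

# Six requests spread evenly across three compute nodes.
print(round_robin(range(6), ["node-a", "node-b", "node-c"]))
# {'node-a': [0, 3], 'node-b': [1, 4], 'node-c': [2, 5]}
```

Real load balancers typically refine this with weights, health checks, or least-connections counts, but the even-spread goal is the same.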
NIC Teaming
Network Interface Card (NIC) teaming allows you to combine multiple physical network adapters into a virtual NIC, which will then be presented to the operating system (OS) as a single NIC. All of the traffic being sent from the OS will pass through the virtual NIC and be load-balanced across the assigned physical network connections. The benefits of NIC teaming stem from its load-balancing capabilities. Efficiently breaking up network traffic across multiple connections better utilizes network resources, ensures the availability of servers and increases bandwidth. It also helps simplify network configuration.[6]
Port Aggregation
Port aggregation (or link aggregation) is the combining (aggregating) of multiple network connections in parallel by any of several methods, in order to increase throughput beyond what a single connection could sustain, to provide redundancy in case one of the links should fail, or both. A link aggregation group (LAG) is the combined collection of physical ports. Other umbrella terms used to describe the concept include: trunking, bundling, channeling, or teaming. Essentially, port/link aggregation increases the bandwidth & resilience of Ethernet connections.[7]
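A common way a LAG spreads traffic (in both teaming and link aggregation) is to hash each flow's addresses onto one member link, so a single conversation keeps its packet ordering on one physical link while different flows spread across the group. A conceptual Python sketch, assuming illustrative interface names and a CRC32 hash (real implementations use their own hash policies):

```python
import zlib

def select_link(src, dst, links):
    """Hash a (src, dst) flow onto one member link of the LAG."""
    key = f"{src}-{dst}".encode()
    return links[zlib.crc32(key) % len(links)]

lag = ["eth0", "eth1", "eth2", "eth3"]
flow = ("10.0.0.1", "10.0.0.2")

chosen = select_link(*flow, lag)
# If the chosen link fails, rehash over the survivors: the flow moves to
# another link and traffic continues at reduced aggregate bandwidth.
survivors = [link for link in lag if link != chosen]
fallback = select_link(*flow, survivors)
```

Hashing per flow (rather than per packet) is what preserves in-order delivery for each conversation while still using all links.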
Clustering
Clustering refers to the interconnection of servers in a way that makes them appear to the operating environment as a single system. As such, the cluster draws on the power of all the servers to handle the demanding processing requirements of a broad range of technical applications. It also takes advantage of parallel processing in program execution. Shared resources in a cluster may include physical hardware devices such as disk drives and network cards, TCP/IP addresses, entire applications, and databases. The cluster service is a collection of software on each node that manages all cluster-specific activity, including traffic flow and load balancing. The nodes are linked together by standard Ethernet, FDDI, ATM, or Fibre Channel connections. The advantages of clustering include higher availability, scalability, and greater aggregate processing power.[8]
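The "single system" idea can be sketched as a group of nodes behind one entry point: requests go to any healthy node, so losing one node does not stop service. A toy Python model (node names and the selection rule are illustrative assumptions, not a real cluster service):

```python
class Cluster:
    """Toy cluster: many nodes behind one interface; any healthy node
    can serve a request, so a single node failure is absorbed."""

    def __init__(self, nodes):
        self.healthy = set(nodes)

    def fail(self, node):
        """Mark a node as failed; the cluster service routes around it."""
        self.healthy.discard(node)

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError("cluster down: no healthy nodes")
        node = sorted(self.healthy)[0]  # deterministic pick for the sketch
        return f"{node} served {request}"

cluster = Cluster(["node-1", "node-2", "node-3"])
print(cluster.handle("query"))   # node-1 served query
cluster.fail("node-1")
print(cluster.handle("query"))   # node-2 served query
```

A real cluster service adds heartbeats, shared storage, and state migration, but the caller-facing behavior is the same: one logical system that survives node failures.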
Power Management
Power management is a feature of some electrical appliances, especially copiers, computers, CPUs, GPUs & computer peripherals such as monitors and printers, that turns off the power or switches the system to a low-power state when it is inactive. In computing this is known as PC power management. Power management for computer systems is desired for many reasons, such as prolonging battery life in portable devices, reducing cooling requirements and noise, and reducing operating costs for energy and cooling.[9]
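The core policy described above, dropping to a low-power state after a period of inactivity, can be sketched as a simple idle timer (the timeout value and state names are illustrative assumptions):

```python
class Device:
    """Toy power-management policy: enter standby after the device has
    been inactive for idle_timeout seconds."""

    def __init__(self, idle_timeout=300):
        self.idle_timeout = idle_timeout
        self.state = "on"
        self.last_active = 0.0

    def activity(self, now):
        """Any user or I/O activity wakes the device and resets the timer."""
        self.last_active = now
        self.state = "on"

    def tick(self, now):
        """Periodic check: switch to standby once the idle timeout elapses."""
        if now - self.last_active >= self.idle_timeout:
            self.state = "standby"
        return self.state
```

Operating systems implement far richer versions of this (multiple sleep states, per-device timers), but the inactivity-triggered transition is the common core.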
Battery Backups/UPS (Uninterruptible Power Supply)
An uninterruptible power supply (UPS) is an electrical apparatus that provides emergency power to a load when the input power source or mains power fails. A UPS is typically used to protect hardware such as computers, data centers, telecommunication equipment or other electrical equipment where an unexpected power disruption could cause injuries, fatalities, serious business disruptions or data loss.[10] A UPS differs from an auxiliary or emergency power system or standby generator in that it provides near-instantaneous protection from input power interruptions by supplying energy stored in batteries.
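Because a UPS bridges outages from stored battery energy, its runtime is roughly the stored energy divided by the attached load. A back-of-the-envelope sketch (the capacity, load, and efficiency figures are illustrative assumptions, not from the text):

```python
def ups_runtime_minutes(battery_wh, load_w, efficiency=0.9):
    """Rough UPS runtime estimate: stored energy over load, derated by
    an assumed inverter efficiency."""
    return battery_wh * efficiency / load_w * 60

# A hypothetical 600 Wh battery feeding a 300 W load:
print(ups_runtime_minutes(600, 300))  # ~108 minutes
```

In practice runtime also depends on battery age and discharge curves, so vendors publish measured runtime tables rather than relying on this simple ratio.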
Dual Power Supply
A dual power supply is a direct current (DC) power supply that provides both a positive and a negative voltage with respect to ground. Many electronic circuits, particularly those built around tubes or transistors, require a source of DC power; a dual power supply delivers stable power to such electronic and electrical equipment and helps prevent system damage.[11]
Redundant Circuits
Redundant circuits (network redundancy) refers to installing additional or alternate instances of network devices, equipment, and communication mediums within the network infrastructure. It is a method for ensuring network availability in case a network device or path fails or becomes unavailable. As such, it provides a means of network failover. Typically, network redundancy is achieved through the addition of alternate network paths, which are implemented through redundant standby routers and switches. When the primary path is unavailable, the alternate path can be deployed almost instantly to ensure minimal downtime and continuity of network services.[12]
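The failover behavior described above reduces to a simple rule: use the primary path while it is up, and fall over to the next alternate when it is not. A minimal Python sketch (the router names and status map are illustrative assumptions):

```python
def choose_path(paths, is_up):
    """Return the first available path in priority order: the primary if
    it is up, otherwise the next alternate (standby) path."""
    for path in paths:
        if is_up[path]:
            return path
    raise RuntimeError("no network path available")

paths = ["primary-router", "standby-router"]
status = {"primary-router": True, "standby-router": True}

print(choose_path(paths, status))   # primary-router
status["primary-router"] = False    # primary path fails
print(choose_path(paths, status))   # standby-router
```

Protocols such as first-hop redundancy protocols automate exactly this priority-ordered selection, using health checks or hello timers to maintain the status map.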
References