Purpose of a Database | CompTIA IT Fundamentals FC0-U61 | 5.1

In this video you will learn about the purpose of a database and along with database concepts such as:  usage of a database, flat file versus database, records, and storage.

Usage of Database

A database, also known as a databank or datastore and sometimes abbreviated as DB, is a large quantity of indexed digital information.  It can be searched, referenced, compared, changed or otherwise manipulated with optimal speed and minimal processing expense. Some of the most popular database management system (DBMS) apps are:

The following discusses four major usage features of databases: create, import/input, query, & reports.

Create

The first step in using a database is to create its structure.  During this process, developers identify the tables needed for the database, how the files will be related to each other, and the data that will be stored in each file.  For example, a company might create a branch table and a company section table. The tables can be related to each other by using common fields such as employee names and the section each employee works in. A table is a data structure that contains rows and columns.  A database can be made up of one or more tables.

Example of a database table

Import/Input

To populate the tables with data, you can use import/input.  Import brings data into the database from an existing file, such as a spreadsheet, a comma-separated values (CSV) file, or other delimited text file. A delimited text file has a header row that is used to name fields, rows of data that will become records, and characters such as commas, semicolons, or tabs between the fields.

CSV file example showing comma-separated values

Some options for importing data include using it to create a new table, appending the records to an existing table (suitable if the structure of the data you want to import matches the data already in the table), and creating a link to the data source. The other method is input, in which the user enters each data record.  Input can be used to populate an entire database, or it can be used to add additional records after data has been imported.  Most database apps enable the user to create a data entry form to make entering data easier.

Query

A query is a request for data or information from a database table or combination of tables.  This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, i.e., trend analysis from data-mining tools.  Queries can be performed in a variety of ways.

Reports

A database report is the formatted result of database queries and contains useful data for decision-making and analysis. A report can be built from a query, so it can provide a printed result of the query.  A report does not need to be printed, but it is print-ready. A report can group and sort data, hide details, and perform operations such as total, count, and average.  A report can also be exported as a data file.

Flat File vs. Database

A database management app can work with a single table or with two or more tables.  What’s the difference?

A flat-file database is a database stored in a file called a flat file. Records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. A flat file database stores all information in a single table. This is suitable if all of the information you need to track is closely related. Flat file managers (also known as NoSQL apps) are also excellent choices for searching and querying non-structured and semi-structured data, such as web pages.

However, if you need to relate different types of information to each other, such as customers, purchases, suppliers, and shippers, using a relational database is a better choice.

A relational database is a collection of data items with pre-defined relationships between them. These items are organized as a set of tables with columns and rows. Tables are used to hold information about the objects to be represented in the database. Each column in a table holds a certain kind of data and a field stores the actual value of an attribute. The rows in the table represent a collection of related values of one object or entity. Each row in a table could be marked with a unique identifier called a primary key, and rows among multiple tables can be made related using foreign keys. This data can be accessed in many different ways without reorganizing the database tables themselves.

Multiple Concurrent Users

Databases that will be used by more than one user at a time (multiple concurrent users) need to have some type of locking mechanism to prevent data corruption.  Locks can happen at the record (row) level or at the table level. The type of locking available depends on the table type (also known as a storage engine type).  A table type that supports saving or completely rolling back a transaction is referred to as transaction-safe. Some table types are optimized for speed and size, but are not transaction-safe and support only table locking.  A table type such as InnoDB (now the standard type in MySQL) allows row locking and also supports transactions.

Scalability

Database scalability is the ability of a database to handle changing demands by adding/removing resources such as increasing numbers of records, users, and transactions.  The ability to handle a greater number of records can vary according to the database engine (table) type used by a database app. A second major consideration is the number of transactions a database can handle.  With the rise of online shopping and social networking, scalability also means the ability to keep up with more users and more transactions.  Scalability can be improved by increasing the performance of the server running a database, by subdividing the database’s tables among multiple physical or virtual servers, and by choosing the right database engine for the task.

Speed

Non-relational databases are generally faster than relational databases when many transactions and lookups are required.  That’s because a non-relational database stores everything in a single table and does not need to connect tables to each other.  However, there are many ways to speed up a relational database, including the following:

  • Creating a database structure that is optimized for speed
  • Limiting the number of results you request in a single query
  • Avoiding looping logic in queries
  • Adding caching support to your server to lessen the load
  • Using data normalization (not mixing different types of data, such as alphanumeric and numeric data, in a single table column)
  • Properly using logical and primary keys

Variety of Data

The phrase “variety of data” is shorthand for the many different types of data that might be stored in a database:  tables, photos, videos, text, etc. Variety is one of the three factors (3Vs) in “big data”, with the others being velocity and volume. Velocity refers to how quickly data can be accessed, updated, and analyzed.  Volume refers to the amount of data stored.

To deal with different types of data, it is often necessary to use different types of database and support apps.  Another consideration for data that is in a relational database is how to make complex relationships work. Tables can be related to other tables in one of three ways:

  • One-to-One:  One record in table A is related to one record in table B.
  • One-to-Many:  One record in table A is related to two or more records in table B.
  • Many-to-Many:  Two or more records in table A are related to two or more records in table B.

Records

A database record is composed of fields, each of which contains one item of information. A set of records constitutes a file. For example, a personnel file might contain records that have three fields: a name field, an address field, and a phone number field. More specifically, a record is a grouping of fields within a table that reference one particular object.  The term record is frequently used synonymously with row. A record is also known as a tuple.

Example of a database record

Storage

To help determine how best to store and protect data, many organizations have a database administrator (DBA).  A database administrator is a specialized computer systems administrator who maintains a successful database environment by directing or performing all related activities to keep the data secure.

There are a variety of ways to store data used in databases.  Conventional servers are suitable for relatively small amounts of data that will be accessed over a local area network.  However, databases that may be many terabytes in size should be accessed through a more flexible and faster method such as a storage area network (SAN).  Unlike a shared drive on a server or a network-attached storage (NAS) device, a SAN is accessed as if it were a local drive by a server but uses fast fiber optic connections.  Cloud storage (a type of cloud computing) is suitable for databases that will be accessed by remote users.  Amazon, Microsoft, and other vendors offer dedicated cloud database solutions. For data that will be accessed only occasionally or is being retained as an archive, tape libraries continue to be a suitable solution.

ACID

An important acronym to help you understand how databases work is ACID (Atomicity, Consistency, Isolation, Durability).

  • Atomicity:  A database follows the all or nothing rule, i.e., the database considers all transaction operations as one whole unit or atom.  Thus, when a database processes a transaction, it is either fully completed or not executed at all.
  • Consistency:  Ensures that only valid data following all rules and constraints is written in the database.  When a transaction results in invaled data, the database reverts to its previous state, which abides by all customary rules and constraints.
  • Isolation:  Ensures that transactions are securely and independently processed at the same time without interference, but it does not ensure the order of transactions.  For example, user A withdraws $100 and user B withdraws $250 from user Z’s account, which has a balance of $1000. Since both A and B draw from Z’s account, one of the users is required to wait until the other user transaction is completed, avoiding inconsistent data.  If B is required to wait, then B must wait until A’s transaction is completed, and Z’s account balance changes to $900. Now, B can withdraw $250 from this $900 balance.
  • Durability:  In the above example, user B may withdraw $100 only after user A’s transaction is completed and is updated in the database.  If the system fails before A’s transaction is logged in the database, A cannot withdraw any money, and Z’s account returns to its previous consistent state.

Data Persistence

Persistent data denotes information that is infrequently accessed and not likely to be modified even after the process that created it ceases or the machine it is running on is powered off.  When an object or state is created and needs to be persistent, it is saved in a non-volatile storage location, like a hard drive, versus a temporary file or volatile random access memory (RAM).  Another example is when a database that has been updated with new transactions is not persistent until it has been stored to disk. The next highest level is found in databases that store the data on disk at all times but allow the data to be updated in place. Databases that run in memory and create periodic saves to disk (snapshots) have data persistence only after the data has been committed to disk.