2 Storage Hardware

Many different storage hardware designs have existed, and new designs enter the market every few years. As of 2024, three designs are common enough to deserve discussion.

2.1 Solid State Drive (SSD)

Solid state drives, like processors and memories, have no physically moving parts, instead conveying information purely through electrical signals. They typically have some circuitry for accessing elements of an array, where those elements are have some type of two-state matter to store data. A low-voltage signal applied to that matter will reveal what state it is in; a high-voltage signal will change its voltage. Thus, reading is significantly more efficient than writing.

Flash, one of the most common SSD technologies, actually has three modes. Erase is a slow operation that sets all bits in a block of the SSD to have the same value: all 1s for NOR flash, all 0s for NAND flash. Write changes an erased bit to the other bit and is much faster than erase. Read checks the value of a bit and is faster than write. Erasing causes wear on the device; after a few thousand erasures the block cannot be fully erased again. Reading also causes a little wear on bits near the bit being read: after a few hundred thousand reads of a bit, neighboring bits will be written. The device is also susceptible to extreme temperatures and X-ray radiation.

Common SSD speeds include

10K–100K operations per second (read or write a block of data).
2GiB–20GiB per second read throughput if reading large contiguous blocks of data.
Writes are a few times slower than reads.
$30–$200 per TiB capacity, or more for smaller form factors like SSID cards and thumb drives
Hardware API that pretends to be a read/write array but actually replaces written-to blocks with fresh copies from a store of erased blocks.

2.2 Hard Disk Drive (HDD)

Hard disk drives consist of rigid platters spinning on a central spindle coated in a magnetized material accessed by a swinging arm with electromagnets at the tip. These either detect the magnetic pole of the part of the disk spinning beneath them (read) or change it by applying a strong magnetic field to them (write). Data access much more rapid if the read arm stays put and reads the stream of data spinning past (sequential read) than if it needs to move the arm and/or wait for the right part of the platter to spin into place (seek time).

Because the disk has moving parts, it can experience various mechanical failures: wind turbulence between the mostly-stationary arm and the rapidly-moving platter, beam stress from sudden accelerations such as being bumped or dropped, gyroscopic forces if twisted sharply while the platter is spinning, and wear on the bearings and motors. In the worst case, these forces can cause the arm to physically collide with and scratch the platter, an event called crashing or head crash which can render the disk unusuable. Most HDD contain a variety of internal sensors that stop the platter if they detect something odd.

HDD speeds are the same for reading and for writing, but vary significantly depending on how the data is arranged on the platter surface. Common HDD speeds include

100MiB–300MiB per second read/write for sequential access.
200–1K random seeks per second.
$10–$40 per TiB capacity
Hardware API that exposes the 2D nature of the platter, making the disk resemble a 2D array.

2.3 Magnetic Tape

Magnetic tape uses the same storage concept as HDD, but instead of limiting the surface to a carefully engineered platter spinning at high speed the magnetic surface is glued to a flexible plastic ribbon and spooled at much lower speeds between two reels with the read/write head in between.

Magnetic tape is much slower than an HDD, has a limited number of times it can be used because of the physical wear on the ribbon and glue, but is typically engineered to have a very long shelf-life if not used too often. Tape is the dominant form of long-term backup storage: data is written to tape, then stored in a closet or locked in a vault and only accessed again if restoration from backup is needed.

I have not found many sources of the speed of magnetic tape, but it tends to run $2–$10 per TiB capacity.

3 File Systems

File systems are a pervasive abstraction around the raw array-like interface provided by storage hardware. They are typically presented as a tree: interior nodes are called folders or directories and leaf nodes are called files.

Interior nodes are managed by the file system and typically contain a set of contained files and folders. Leaf nodes are managed by the user and can contain any bytes you wish. Both are also accompanied by some file-system-managed metadata, such as the node’s name, which user account owns the node, what operations are permitted on the node, and when the node was last modified.

Files are identified by in the file system using a path. Two types of paths are supported. An absolute path lists the names of all nodes from the root of the tree to the file in question. A relative path lists how to reach the file from a given reference node, using the special name .. to mean go to this node’s parent. Windows traditionally uses \ to separate names in the path; all other systems, including the web, use / instead.

On any non-Windows computer, absolute paths begin with a / while relative paths do not.

Example absolute paths:

/ – the root directory
/home/cs340user/games/ – another directory
/home/cs340user/classes/cs340/mp7/app.py – a file
/tmp/deleteme.txt – another file

We can make a relative path from /home/cs340user/games/ to /home/cs340user/classes/cs340/mp7/app.py as

../classes/cs340/mp7/app.py
../../cs340user/classes/cs340/mp7/app.py
../classes/cs340/../cs340/mp7/app.py

Some file systems allow for a directed graph, not just a tree, by allowing the exact same file or folder to appear as a child of multiple folders. This feature is called a hard link and is relatively uncommon in practice. More common is a soft link or shortcut: a special type of file-system-managed file that stores the path of the actual node.

Most file systems offer some limited synchronization options: typically some form of optional reader-writer lock, a system call that only opens a file if it does not yet exist, and an append-only mode that guarantees that concurrent writes to the file never overlap one another. While these tools an be used to achieve many sychronization results, using them effectively requires some care and often programs use files with no synchronization, relying on the user to not request conflicting concurrent operations.

4 Blob/Object stores

A database provides a more-synchronized way of storing data on top of a file system. There are many types of databases, but many of the largest-scale and lightest-weight databases act like a set of blobs or objects.

A blob is the common name for uninterpreted binary data; file systems typically treat files like blobs. An object (also called a document) is data with a known but unconstrained structure, like a JSON or XML.

Stores that are oriented around blobs of objects typically offer just one form of synchronization: each blob or object update is atomic, meaning that if concurrent updates are sent to the store one happens in its entirety before the other begins. They generally do not offer inter-blob synchronization nor part-of-blob edits.

4.1 Key-value store

A key-value store has a dict-like interface: you provide a unique key string and any blob value you wish and that blob is stored associated with that key. You can access or replace the blob by using its key, but most key-value stores do not provide part-of-blob access.

Redis is a popular network-oriented key-value store, though it does support some looking inside values so they aren’t quite treated like blobs. Key-value stores are also pervasive within a single computer, with tools like dbm being part of many applications. The zip archive format is also a type of key-value store, though it is rarely discussed as such.

Some key-value stores add a tree structure to the keys, much like a file system does with its paths, becoming a hierarchical data store. These are commonly used to store preferences and configuration information, such as is done by dconf and the Windows Registry.

4.2 Object stores

Object stores have a set-like interface: you can add objects to the store with no specific key or path required. To retrieve objects, a query where the query provides constraints on which objects should be returned. These queries require defining a query language and often require the implementation to store various fast-lookup indices to speed up query execution.

MongoDB is a popular network-oriented object store; CouchDB, DynamoDB, and the databases built in to various search engine tools like elasticsearch and solr are also object stores.

5 ACID

Many databases support transactions, meaning a several accesses and updates grouped together, and are engineered to provide those transactions with the ACID properties:

Atomicity: Each transaction happens as if they were a single operation with no concurrent operations overlapping with them.
Consistency: The database is always in a consistent state, with no dangling pointers or contradictory data.
Isolation: One transaction cannot reveal anything about the operation of concurrent transactions.
Durability: Data is stored in such a way that power loss, even unexpected power loss mid-transaction, leaves the results of all completed transactions stored in a way that can be fully recovered when power is restored.

These properties are a great boon to developers, allowing them to ignore most challenges related to concurrency and synchronization. To achieve them, database engines impose various restrictions on the data being stored.

By far the most common set of restrictions are those of the the relational databases: data consists of a set of tables, with records as rows and known types in each column; interactions are moderate by a declarative language called Structured Query Language or SQL¹¹ SQL is sometimes pronounced like sequel. SQLite is the most pervasive such system but usually used in small single-computer systems; MySQL and Postgres are the most pervasive for large many-computer systems. Relational databases and SQL are topics large enough deserve (and have) their own courses.

Some of the more mature object store and key-value store databases have started to add ACID transactions, though they are no known for that property and are rarely as optimized for it as relational databases.

6 Others

There is no intrinsic bound on the number of designs of databases one might have. A few examples of other databases include:

Graph databases like Neo4j are somewhat like object stores with pointers; this design permits graph-oriented queries with languages like Gremlin.
Column stores are a variant of relational databases optimized for a different system design.
Append-only databases write data but never read it and are particularly useful for applications requiring transparent audits. Replay logs are a simple form of append-only database that lists every action that occurs in the order in which it occurs, allowing a system to restore any given state by re-running a prefix of the log. Blockchain is class of distributed append-only databases.
Object-oriented databases move object-oriented programming concepts such as objects, classes, pointers, and inheritance from transient memory to durable storage, with various adjustments to better meet the constraints of that interface.
Multi-modal databases like OrientDB and ArangoDB use a single implementation to provide multiple disjoint use-cases, often with several different query languages and various tricks or limitations to keep the different modalities operating together safely.
Successful databases often add features over time, such Redis starting as a simple key-value store but gradually adding support for more structured values.

1 Transient Storage