3 Storage Types: Block vs File vs Object
Just about everyone today uses a computer or smartphone on a daily basis. In the process, we use one or more methods to store data, usually without thinking about how it is being stored. This post briefly explains the three main types of data storage and how they are typically used.
What Does Data Look Like?
For simplicity in this article, the word computer can be taken to mean any digital device that has the capability to store information. This can include PCs, laptops, servers, tablets, smartphones and smartwatches, TVs, modern automobiles, and maybe even your refrigerator. So, what are they doing with that information?
Computers function at the lowest level using electronic or optical signals that represent either a "1" or a "0", which is the source of the term digital. Programs or applications execute code that consists of ones and zeroes, and information is stored in the same manner.
Magnetic storage, such as hard drives, floppy disks, and tape, uses the magnetic orientation of a tiny region of the medium to indicate whether a one or a zero is being stored or read back, each position being called a bit.
Solid-state drives (SSDs) are built from silicon memory chips and use a somewhat different method, storing electrical charge rather than magnetism, but the end result is still a one or a zero. If you could actually see the microscopic insides of these devices, you would see a continuous stream of ones and zeroes.
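As a rough illustration of that stream of ones and zeroes, the short Python sketch below converts a piece of text into its bits and back again. The function names to_bits and from_bits are made up for this example, and the UTF-8 encoding is an assumption; real storage devices do the equivalent work in hardware.

```python
def to_bits(text: str) -> str:
    """Return the bits of `text` (UTF-8 encoded), one 8-bit group per byte."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Rebuild the original text from space-separated 8-bit groups."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("utf-8")


bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
```

Each group of eight bits is one byte, and each byte here stands for one character of the text.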
It is a very ingenious strategy and has served us well for many years. But how does a computer make sense out of all these bits?
#1 Blocks: Why Hard Drives and SSDs Store Data in Blocks
As one can probably imagine, keeping track of where all those bits are and what they mean is a rather large and important task. Methods were developed as early as the 1960s to track where these bits were placed on the types of magnetic media mentioned previously, and to make it simple to retrieve them when needed.
Without getting into the physical track and sector layouts of different types of media, computer operating systems use the concept of a block of bits or bytes (a byte is 8 bits of data) to divide and track chunks of data. The software that controls access to the storage devices handles the translation between the blocks of data the operating system deals with and the physical storage of those blocks. This relieves operating systems of having to know the details of each kind of data storage device.
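The block idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the BlockDevice class, its method names, and the 512-byte block size are all chosen for illustration and do not correspond to any particular operating system or driver interface.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed value for illustration


class BlockDevice:
    """A toy block device backed by an in-memory bytearray (hypothetical)."""

    def __init__(self, num_blocks: int, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * block_size)

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def write_block(self, index: int, payload: bytes) -> None:
        """Store `payload` in block `index`, padding it to a full block."""
        self._check(index)
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        start = index * self.block_size
        # Pad short payloads with zero bytes so every block is full-sized.
        padded = payload.ljust(self.block_size, b"\x00")
        self._data[start:start + self.block_size] = padded

    def read_block(self, index: int) -> bytes:
        """Return the full contents of block `index`."""
        self._check(index)
        start = index * self.block_size
        return bytes(self._data[start:start + self.block_size])


device = BlockDevice(num_blocks=4)
device.write_block(2, b"hello")
print(device.read_block(2)[:5])  # b'hello'
```

Note that the caller only ever refers to a block by its number. Where that block physically lives is the device's concern, which is the abstraction described above.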
Sharing Block Storage Over Networks
Block data methods are in use on most computer operating systems, and as mentioned earlier this includes smartphones, tablets, etc. With servers, access to block storage has evolved to use SAN (Storage Area Network) storage to share expensive storage array capacity among groups of servers.
Using a block storage sharing method, storage presented to a server over a SAN looks very similar to a locally-attached hard drive, and can be utilized in a similar fashion. In the next section, we'll take a closer look at how those blocks are actually put to use.
#2 Files: What Really is a File?
Human computer users are most often interested in pieces of data called files. A file is one or more blocks of data that the operating system (or, more accurately, the file system in use by the operating system) defines with certain parameters that the operating system and applications can use. The name originally came from a comparison with the manila folders used to hold and organize paper documents, which are usually referred to as files in the business world.
Different file systems and operating systems use different methods to define what is in these files, what applications can open them, what users have permission to view or edit them, and so on. The main point here is that a file is a group of one or more blocks of bits/bytes that represent a distinct unit of data.
Files are usually organized into what are called either "directories" or "folders" (another analogy, referring to a type of hanging pocket that can hold one or more of the manila folders mentioned earlier). File systems are designed to manage and keep track of this organization so that data can be easily stored and retrieved.
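To make the idea of a file as a named group of blocks more concrete, here is a minimal Python sketch. The ToyFileSystem class, its table layout, and the tiny 8-byte block size are assumptions made for this illustration; real file systems track far more (permissions, timestamps, directory trees) and are much more careful about allocation.

```python
BLOCK_SIZE = 8  # deliberately tiny so a short file spans several blocks


class ToyFileSystem:
    """Maps file names to lists of block numbers (hypothetical, in memory)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # block numbers not yet in use
        self.table = {}  # file name -> (size in bytes, [block numbers])

    def write_file(self, name: str, data: bytes) -> None:
        if name in self.table:
            raise FileExistsError(name)
        # Cut the data into block-sized chunks.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(self.free):
            raise OSError("not enough free blocks")
        indices = []
        for chunk in chunks:
            index = self.free.pop(0)
            # Pad the last chunk so that every stored block is full-sized.
            self.blocks[index] = chunk.ljust(BLOCK_SIZE, b"\x00")
            indices.append(index)
        self.table[name] = (len(data), indices)

    def read_file(self, name: str) -> bytes:
        size, indices = self.table[name]
        raw = b"".join(self.blocks[i] for i in indices)
        return raw[:size]  # drop the padding added on write


fs = ToyFileSystem(num_blocks=4)
fs.write_file("docs/notes.txt", b"hello, blocks!")
print(fs.table["docs/notes.txt"])    # (14, [0, 1])
print(fs.read_file("docs/notes.txt"))  # b'hello, blocks!'
```

The user asks for "docs/notes.txt" and never sees block numbers 0 and 1. Keeping that bookkeeping out of sight is the file system's job.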
#3 Objects: What is Object Storage?
Object storage is a more recent method of data storage than block or file, with early abstract design work possibly starting in the late 1980s and more concrete development in the mid- to late-1990s. The primary driving forces in this development were scalability in both capacity and performance. The rapidly increasing amounts of data being created and stored were beginning to stress the limits of both the computer systems and the storage devices.
There are a variety of object storage implementations, but the basic premise is to separate, or abstract, the description of the data (file names, tags, and other properties, collectively called metadata, or information about the data) from the actual bits and bytes being stored.
The key benefits of using an object storage method are that the low-level details of formatting block storage into volumes and file systems are hidden from the administrator or end user, and that the work of saving or retrieving a file is moved from the end user's computer or server, typically to a device or appliance called a gateway.
The most prominent use of object storage is found with "Cloud" providers, such as Amazon Web Services, Google Cloud, or Microsoft Azure. These providers still use block storage such as hard drives or SSDs in their data centers, but the customer is given access to that storage through a gateway or object storage method that isolates them from the underlying physical hardware. In other words, a customer using a cloud provider's object storage never knows or needs to know what kind of devices their data is being stored on or where; they only need to know that the provider is giving them the agreed-upon level of capacity, performance, and data protection.
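The separation of metadata from data can be sketched as follows. This is a toy, in-memory model, not the interface of any real object storage product: the ToyObjectStore class, its put, get, head, and find methods, and the example keys and metadata fields are all assumptions chosen for illustration.

```python
class ToyObjectStore:
    """Keeps object data and object metadata in separate maps (hypothetical)."""

    def __init__(self) -> None:
        self._metadata: dict[str, dict[str, str]] = {}  # key -> description
        self._data: dict[str, bytes] = {}               # key -> raw bytes

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        """Store the bytes and, separately, the information about them."""
        self._data[key] = data
        self._metadata[key] = {"size": str(len(data)), **metadata}

    def get(self, key: str) -> bytes:
        """Return only the stored bytes."""
        return self._data[key]

    def head(self, key: str) -> dict[str, str]:
        """Return only the metadata, without touching the data."""
        return dict(self._metadata[key])

    def find(self, **criteria: str) -> list[str]:
        """Return the keys whose metadata matches every given criterion."""
        return sorted(
            key
            for key, meta in self._metadata.items()
            if all(meta.get(name) == value for name, value in criteria.items())
        )


store = ToyObjectStore()
store.put("reports/q1.pdf", b"fake pdf bytes", owner="finance")
store.put("photos/cat.jpg", b"fake jpeg bytes", owner="alice")
print(store.find(owner="finance"))  # ['reports/q1.pdf']
print(store.head("photos/cat.jpg"))  # {'size': '15', 'owner': 'alice'}
```

Because the description lives apart from the bytes, a search such as find never has to read the stored data itself, and the caller never learns how or where the bytes are physically kept.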
Block vs File vs Object: 3 File Sharing Methods
As networks evolved and collaborative use of files became more feasible, methods were developed to share files and whole file systems over networks. There have been, and still are, a variety of computer networking protocols, but the most common today is the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. This article will only briefly discuss several of the file sharing protocols that run on TCP/IP networks.
Computers running Microsoft Windows typically use a protocol called "Server Message Block" (SMB) to share files with other computers. Linux and Unix computers more commonly use "Network File System" (NFS). However, it is possible for these protocols (and others not mentioned here) to be used with these and other operating systems such as macOS.
The primary difference between block and file storage, therefore, is that block storage is basically "raw" chunks of data space, while file storage is a method of storing collections of data in units called "files", typically organized into directories or folders. Examples of locally defined file systems are the "C: drive" in Microsoft Windows and the other "drive letters" assigned to other volumes. The use of the word "drive" is a holdover from the days when the C: drive would have represented an entire physical hard drive. As hard drive sizes have grown, a more common scenario is that a hard drive (again, block storage) is divided or "partitioned" into multiple volumes, on which the operating system has created file systems and to which it has assigned drive letters for addressing.
Network-Attached Storage (NAS) is the term used to describe most methods of sharing file systems over networks. These can be made visible to the end user in a variety of ways, the most common being to map, or "mount", a shared NAS file system to a drive letter in Microsoft Windows. In most cases, these file systems and the files within them look identical to a local file system.
Block vs File vs Object: Which Storage Type Should You Use?
Today, very few computer users or even data center server administrators want to be concerned about block storage considerations, such as whether their data is being stored on a hard drive or SSD, a storage array over a SAN, or maybe even through an object storage gateway in the Cloud. As long as the performance, protection, privacy, availability, and other expectations are being met, the low-level details of data storage are unnecessary minutiae.
The answer to the question posed by the heading for this final section is that in many cases you may not have a choice: you will be using two or three of these types of storage by default. For example, as a laptop or PC user, you are already using block storage "under the hood"; you just don't see it. Your drive/SSD controller electronics, firmware, and software drivers are doing the heavy lifting, and your operating system's file system of choice is managing the files and folders/directories.
However, if you use a cloud storage gateway, or especially if you are managing "hosted" virtual computers in a cloud provider's environment, both the low-level block storage and the abstraction to objects (which are separate from, but pointed to by, their metadata) are hidden from you, yet you are still using them. There are systems available that will enable the use of object storage as part of an on-premises solution, so the decision is really whether the benefits of such systems justify the additional cost. The sales of such systems indicate that they often do.