As our daily dependence on applications grows, so do our expectations of them. We want applications that are always available, bug-free, easy to use, secure, and fast.
It’s a simple relationship: data performance powers application performance, which in turn powers business performance. Just as the number of applications that process data keeps growing, so does the number of ways to store that data. How you store and retrieve that data matters.
To get peak performance, it is important to understand the differences between storage engines, because the data storage algorithm an engine uses affects how your queries perform. This article discusses data storage algorithms and why you should care about how they operate.
Data Storage and Retrieval
First, let’s talk about how we interact with data. There are two primary data actions: storing it and, later, retrieving it. On top of that, we apply some structure to the data, mainly in one of two ways:
- A relational database management system (RDBMS), or "SQL data."
- A non-relational database, or "NoSQL data."
While data can be stored in many different ways, it has to be organized effectively before we can search for it and access it. SQL and NoSQL systems both build special data structures called "indexes" for this purpose, and the data structure chosen often helps determine the performance characteristics of the store and retrieve operations.
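To illustrate why an index helps, here is a minimal Python sketch that compares a full scan over unsorted rows with a lookup through a sorted list of keys. The rows and the `full_scan` and `index_lookup` functions are hypothetical; a real database index is an on-disk structure (such as the B-trees discussed next), not an in-memory Python list.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical "table": (id, name) rows stored in insertion order, unsorted.
rows: List[Tuple[int, str]] = [(42, "carol"), (7, "alice"), (19, "bob"), (3, "dave")]

def full_scan(key: int) -> Optional[str]:
    """Without an index: inspect rows one by one until a match is found (O(n))."""
    for row_id, name in rows:
        if row_id == key:
            return name
    return None

# A toy index: keys kept in sorted order, each paired with the row's position.
index: List[Tuple[int, int]] = sorted((row_id, pos) for pos, (row_id, _) in enumerate(rows))
index_keys: List[int] = [k for k, _ in index]

def index_lookup(key: int) -> Optional[str]:
    """With the index: binary search for the key (O(log n)), then fetch the row."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return rows[index[i][1]][1]
    return None

print(full_scan(19), index_lookup(19))  # bob bob
print(index_lookup(5))                  # None
```

The trade-off is the same one real indexes make: the sorted structure must be maintained on every insert, in exchange for much cheaper lookups.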
B-Trees
One traditional and widely used data structure is the B-tree. B-trees are a standard part of computer science textbooks and are used in most (if not all) RDBMS products.
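To make the structure concrete, here is a minimal in-memory B-tree sketch in Python that follows the textbook (CLRS-style) insert, which splits full nodes on the way down so that all leaves stay at the same depth. The class names and the minimum degree `t` are illustrative choices. Production RDBMS B-trees (often B+-tree variants) store fixed-size pages on disk and handle deletes, concurrency, and recovery, none of which is shown here.

```python
import bisect
from typing import List


class Node:
    """One B-tree node; in a real database this would be a fixed-size disk page."""

    def __init__(self, leaf: bool) -> None:
        self.leaf = leaf
        self.keys: List[int] = []          # kept sorted
        self.children: List["Node"] = []   # len(keys) + 1 children if not a leaf


class BTree:
    """Minimal in-memory B-tree with minimum degree t.
    Every node holds at most 2t - 1 keys; full nodes are split on the way down."""

    def __init__(self, t: int = 2) -> None:
        if t < 2:
            raise ValueError("minimum degree t must be >= 2")
        self.t = t
        self.root = Node(leaf=True)

    def search(self, key: int) -> bool:
        node = self.root
        while True:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]  # descend into the only subtree that could hold key

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Root is full: grow the tree by one level, then split the old root.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[: t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)  # the median key moves up into the parent
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node: Node, key: int) -> None:
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)

    def height(self) -> int:
        levels, node = 1, self.root
        while not node.leaf:
            node, levels = node.children[0], levels + 1
        return levels

    def keys_in_order(self) -> List[int]:
        out: List[int] = []

        def walk(node: Node) -> None:
            if node.leaf:
                out.extend(node.keys)
                return
            for child, key in zip(node.children, node.keys):
                walk(child)
                out.append(key)
            walk(node.children[-1])

        walk(self.root)
        return out


tree = BTree(t=2)
for k in range(1, 101):
    tree.insert(k)
print(tree.search(57), tree.search(500))  # True False
print(tree.height(), tree.keys_in_order()[:5])
```

Because each node holds many keys, the tree stays shallow: a lookup touches only one node per level, which is why B-tree performance depends so heavily on how many of those nodes are already in memory.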
The performance characteristics of B-trees are well understood. In general, all operations perform well as long as the data fits into the available memory (by memory, I mean the amount of RAM accessible to the RDBMS on either a physical or a virtual server). This memory limit is usually a hard restriction. Below is a general chart I like to use to demonstrate B-tree performance characteristics.
This chart clearly illustrates a couple of points:
- As soon as the data size exceeds available memory, performance drops rapidly.
- Choosing flash-based storage helps performance, but only to a certain extent; memory limits still cause performance to suffer.
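One way to see why performance drops so sharply is a rough cost model: a point lookup reads one page per tree level, levels that fit in memory cost a RAM access, and the remaining levels go to storage. The Python sketch below is a back-of-the-envelope model, not a benchmark. It assumes every page holds exactly `fanout` entries, that the cache keeps the upper levels of the tree first, and it uses hypothetical latencies (0.1 microseconds for RAM, 100 for flash, 10,000 for a spinning disk) as rough orders of magnitude rather than measurements. `pages_per_level`, `lookup_cost_us`, and all constants are illustrative names.

```python
from typing import List

# Hypothetical per-page access latencies in microseconds: rough orders of
# magnitude chosen for illustration, not measurements of any real system.
RAM_US = 0.1
FLASH_US = 100.0
DISK_US = 10_000.0

def pages_per_level(num_keys: int, fanout: int) -> List[int]:
    """Page counts per B-tree level, root first, assuming every page holds
    exactly `fanout` entries (a simplification; real pages are partly empty)."""
    counts = [-(-num_keys // fanout)]  # leaf pages, rounded up
    while counts[-1] > 1:
        counts.append(-(-counts[-1] // fanout))
    return counts[::-1]

def lookup_cost_us(num_keys: int, fanout: int, cache_pages: int,
                   mem_us: float, storage_us: float) -> float:
    """Estimated cost of one point lookup: one page read per level.
    Assumes the cache holds the upper levels first, so a level is served from
    memory only if it and every level above it fit in `cache_pages`."""
    cost, used = 0.0, 0
    for pages in pages_per_level(num_keys, fanout):
        used += pages
        cost += mem_us if used <= cache_pages else storage_us
    return cost

FANOUT = 100          # hypothetical entries per page
CACHE_PAGES = 20_000  # hypothetical memory budget, in pages
for n in (10**6, 10**8, 10**10):
    flash = lookup_cost_us(n, FANOUT, CACHE_PAGES, RAM_US, FLASH_US)
    disk = lookup_cost_us(n, FANOUT, CACHE_PAGES, RAM_US, DISK_US)
    print(f"{n:>14,} keys: flash ~{flash:,.1f} us, disk ~{disk:,.1f} us")
```

Under these assumptions, every level that spills out of memory adds a full storage access to each lookup. Flash shrinks that penalty relative to a spinning disk but does not remove it, which is consistent with the two points above.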