SQL for Beginners Part 2

Hi! Welcome to my blog series! If this is your first time here, the general idea is to explore many facets of programming from the beginner's perspective. This week, we're building off what we learned last week, which was a beginner's guide to SQL!

Today we're going to touch on NoSQL database models, a pretty broad class of database management systems. NoSQL is a fairly new movement designed to make it easier to store and access huge quantities of data. Traditional SQL-based systems can struggle at that scale, so we'll see how NoSQL does its thing and the tradeoffs involved.

So first, how does NoSQL differ from the traditional relational models? The old models aimed to meet a set of criteria called "ACID" to guarantee that transactions are processed reliably. Because it's programming, it only makes sense that ACID is an acronym. So what's it stand for?

A – Atomicity

If part of a transaction (an operation that changes the database) fails, then the whole thing should fail. In other words, the "all or nothing" principle. This is important, because if any part of a transaction fails, we should be able to run the whole thing again without ending up with duplicate or partial data.
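
To make "all or nothing" concrete, here's a minimal sketch using Python's built-in sqlite3 module. The accounts table and the transfer function are made up for illustration. The transfer debits one account, then fails halfway through, and the database rolls the debit back.

```python
import sqlite3

def transfer(conn, from_id, to_id, amount):
    """Move money between two accounts as one all-or-nothing transaction.

    Hypothetical helper for illustration only.
    """
    try:
        # "with conn" commits if the block succeeds and rolls back
        # if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (from_id,)
            ).fetchone()
            if row[0] < 0:
                # The debit above already ran; raising undoes it.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 500)  # fails partway through
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(ok, balances)  # False [(100,), (50,)]
```

Neither balance changed, even though the first UPDATE had already run when the failure happened.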

C – Consistency

Transactions that begin with the database in a valid state leave the database in a valid state. Any data written to the database must conform to the database's rules.
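
One common way a database enforces its rules is with constraints. Here's a small sketch, again with Python's sqlite3 module and a made-up accounts table, where the rule is "a balance can never go negative." The table and the rule are assumptions for the example, not something every database has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- the database's rule
    )
""")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

try:
    with conn:
        # This would leave the balance at -400, which breaks the rule.
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(rejected, balance)  # True 100
```

The database started in a valid state, refused the invalid write, and is still in a valid state afterward.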

I – Isolation

If transactions are executed concurrently (in parallel), then the end result should be the same as if they were run in series.
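
Full isolation is hard to show in a few lines, but here's a sketch of one piece of it: a transaction's unfinished work shouldn't be visible to anyone else. It uses Python's sqlite3 module with two connections to the same temporary file. The notes table and the demo_isolation function are made up for the example, and it relies on SQLite's default settings.

```python
import os
import sqlite3
import tempfile

def demo_isolation():
    """Return how many rows a second connection sees before and after a commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE notes (body TEXT)")
            writer.commit()

            # The INSERT opens a transaction on the writer connection
            # that stays open until commit() is called.
            writer.execute("INSERT INTO notes VALUES ('draft')")
            during = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]

            writer.commit()
            after = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]
        finally:
            reader.close()
            writer.close()
        return during, after

print(demo_isolation())  # (0, 1)
```

The reader only sees the new row once the writer's transaction has committed.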

D – Durability

If a transaction finishes successfully, then its results should be stored permanently, even in the case of power loss and other failures/errors.
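
We can't pull the plug in a blog post, but here's a rough sketch of the idea using Python's sqlite3 module, where closing and reopening the database file stands in for a crash and restart. The events table and the demo_durability function are made up for the example.

```python
import os
import sqlite3
import tempfile

def demo_durability():
    """Return the rows that survive closing and reopening the database file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE events (name TEXT)")
        with conn:  # committed when the block ends
            conn.execute("INSERT INTO events VALUES ('committed')")
        # This insert is never committed before the connection goes away.
        conn.execute("INSERT INTO events VALUES ('uncommitted')")
        conn.close()

        # A fresh connection only sees what was committed to disk.
        conn = sqlite3.connect(path)
        rows = conn.execute("SELECT name FROM events").fetchall()
        conn.close()
        return rows

print(demo_durability())  # [('committed',)]
```

The committed row is still there after reopening the file, and the unfinished one is gone.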

As always, Wikipedia has more information, plus a pretty excellent example of a violation of each principle. In short, ACID-compliant databases are rigid and sturdy. If a transaction completes, the database is good to go, and there are backup measures in case the transaction fails. ACID compliance is the primary benefit of SQL-based database management systems, but each component of ACID comes with its own overhead.

NoSQL databases are not always ACID-compliant. The overhead that would go into guaranteeing each ACID property, particularly consistency, is instead spent on making sure the database can handle a large amount of data at a reasonable speed. Many NoSQL systems offer "eventual consistency" instead, which means that at any given moment, different copies of the data may disagree and a read may return a stale value. Given enough time without new writes, the changes made by all the transactions will propagate across the database's replicas and the copies will agree again.
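
Here's a toy model of eventual consistency in plain Python. It isn't how any particular NoSQL database works; the ToyCluster class and its methods are invented for this post. Each replica is just a dictionary, and writes reach the other replicas only when propagate() runs.

```python
from collections import deque

class ToyCluster:
    """A toy model of asynchronous replication, not a real database client."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        # Updates that have not reached their replica yet.
        self.pending = deque()

    def write(self, key, value):
        # The first replica accepts the write right away...
        self.replicas[0][key] = value
        # ...and the others are queued to receive it later.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Deliver every queued update, in the order it was written.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value

cluster = ToyCluster(3)
cluster.write("user:1:name", "Ada")

stale = cluster.read(2, "user:1:name")  # replica 2 hasn't heard about it yet
cluster.propagate()
fresh = cluster.read(2, "user:1:name")
print(stale, fresh)  # None Ada
```

Between the write and the propagation, two readers asking different replicas get different answers. That window is the price of not making every write wait for every replica.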

Replicas? That's right. Most NoSQL systems utilize a distributed architecture for "performance reasons and memory limitations." Data is duplicated across several different computers or servers (or split across them into shards, but that's a complicated topic for another time). Spreading the work across machines this way allows NoSQL systems to be scaled easily, so the system can handle a larger amount of data at about the same speed.

If it seems like this whole business of duplication and replicas violates most relational database principles, it does. A primary goal of relational database design is to normalize the data, which means minimizing repetition and duplication. Newer NoSQL data stores often abandon this model because keeping a copy of the data right where it's needed can make it faster to retrieve. Having all the data linked is good for normalization, but on larger data sets it takes longer to gather the necessary information from several linked tables.
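
Here's a small sketch of that tradeoff. The normalized half uses Python's sqlite3 module with made-up authors and posts tables. The denormalized half uses plain dictionaries as a stand-in for the kind of document a NoSQL store might hold, which is an assumption for illustration rather than any real product's format.

```python
import sqlite3

# Normalized: the author's name is stored once and linked by id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO posts VALUES (10, 1, 'Hello'), (11, 1, 'NoSQL notes');
""")
normalized = conn.execute("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON authors.id = posts.author_id
    ORDER BY posts.id
""").fetchall()

# Denormalized: the author's name is copied into every post,
# so reading a post needs one lookup and no join.
documents = {
    10: {"title": "Hello", "author_name": "Ada"},
    11: {"title": "NoSQL notes", "author_name": "Ada"},
}
denormalized = [
    (doc["title"], doc["author_name"]) for _, doc in sorted(documents.items())
]

print(normalized == denormalized)  # True
```

Both give the same answer. The difference is that the normalized version has to join two tables to build it, while the denormalized version would have to update every copy if Ada ever changed her name.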

Phew, that was a wordy post. In the next post, we'll cover a very basic example of an application backed by a NoSQL database, so that'll be exciting! See you then!