The constant background debate about relational versus NoSQL has become entrenched in dogma and misses many fundamental aspects of data management. Fortunately, the notion that the difference between SQL and NoSQL has anything to do with SQL has already been thoroughly dispelled – but there still remains an active debate about the virtues or otherwise of schemas and ACID transactions.
Let’s deal with schemas first. A schema is just an abstraction of the data and is not a method of storage. The schema is simply there to provide a conceptual model for data access and exists as a logical layer above the storage structure in much the same way that a file system provides a logical abstraction of the physical storage system.
While the relational model was conceived to provide an intuitive framework for understanding real-world entities and a self-documenting structure for supporting queries, the database industry made the mistake of closely coupling that model with the storage organisation. This has resulted in the unnecessary polarisation between row stores and column stores and the horrible complexity and inertia typically experienced with traditional relational databases. Whereas the underlying database storage structure should be schema-less and the relational model should simply be an optional schema layered upon it.
By using a schema-less storage structure that efficiently resolves arbitrary access paths between stored elements it does not matter whether the logical access model is relational or not. Indeed, there is no reason why the relational schema cannot co-exist with other models in the same database at the same time.
ACID or not? Actually, there’s a false premise to that question. ACID is not a single property, it includes atomicity, durability and consistency and these are all orthogonal features which can be varied independently. Should a sequence of updates be applied atomically as a single change? Does durability of a change need to be guaranteed? How consistent should these changes be across the database infrastructure? These are not the same question and that last one actually requires more than a simple binary answer because consistency is really a spectrum of possibilities ranging from none, to eventual, to immediate.
In fact the requirements for atomicity, durability and consistency can vary by individual transaction. It is easy to imagine an application where a class of transactions that require durability coexist with a class of transactions that do not require it at all. Similarly for atomicity and consistency. So it is not even adequate to configure these guarantees globally across the whole database – let alone hardwire them into the database infrastructure itself, which is the typical state of affairs.
While the relational model may choose to enforce global acidity, durability and immediate consistency for transaction – the database engine need not.
Even durability is not a simple yes or no.
Does committing a change to a disk make that change durable? Not if the disk fails. Is mirrored storage sufficient? How many mirrors are deemed safe? Is a single-site sufficient? How distant do these replicas need to be? One man’s durable is another man’s ephemeral.
Forget the arguments about choosing black or white – we need grey-scale.