JustOneDB launched on Engine Yard

JustOne Database Inc is pleased to announce that JustOneDB is now available on Engine Yard as an Add-on service.

JustOneDB is an excellent relational database for enterprise-level customers in the Cloud because it can handle large data volumes and effortlessly adapt to changing requirements without requiring any specialist database skills – so the Engine Yard service is a natural fit for JustOneDB.

If you are an Engine Yard customer, follow these steps to access the JustOneDB Add-on:

1. Go to https://cloud.engineyard.com/addons (login required) or navigate to “Add-ons” in Engine Yard Cloud
2. Click on the “Details” button for JustOneDB
3. Sign up and follow the instructions for “Activate”, “Update Code” and “Deploy”

Further details on using the Engine Yard Add-on service are available here.

Big is in the Eye of the Beholder

One of the really hot topics for 2012 is Big Data and every vendor seems keen to ride the Big Data wave. But what is big? According to Wikipedia, “Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.” In other words, Big Data is data that overwhelms a conventional database.

It is often perceived that Big Data is about unstructured data, but this is not so. Any data can be big – structured or unstructured. Indeed, communications network operators have struggled with billions of daily structured network events and managed databases with multiple terabytes of structured data for decades now. This data was generated long before social networks arrived, so Big Data is nothing new, but the widespread adoption of the Internet has expanded the domain of the Big Data problem.

Currently, you can think of the Internet as connecting both places and people, where those people initiate, directly or indirectly, the vast majority of the data generated. The content of an unstructured document is typically written by a human, a web transaction is usually initiated by a human and most network events arise directly or indirectly from human activity on that network.

But the Internet is now hosting devices that autonomously create information about time, location, orientation, velocity and various other measurements as structured content. These are not just smartphones, these are sensors scattered and embedded everywhere to monitor and measure everything from energy use and health to asset tracking. This is the age of the “Internet of Things”.

Now consider that the Internet currently connects about 1 billion places and about 5 billion people. Yet it is estimated that over 50 billion devices will become connected by 2020 – each generating structured data at regular and frequent intervals. This machine-to-machine communications (M2M) market is widely regarded as a multi-trillion dollar opportunity and network operators are planning some fundamental changes to their network architecture to handle the deluge expected from its exponential growth.

Not only is this device data more profuse than the data we see now, but it is mostly structured too. Big Data? As the song goes, you ain’t seen nothing yet.

Memento

Event sourcing is a design pattern for retaining an exhaustive history of events such that the state of the system now (or at some point in the past) can be derived by a replay of those events. The goal of event sourcing is to derive the current system state from the combination of past events with current application logic such that changes to logic are applied retrospectively and become implicitly reflected in the current system state.

Conceptually, event sourcing only requires a single chronological log to record every event about every entity and a primary key generator for creating primary keys for new entities. Furthermore, in extremis, application state need never be persisted since it can always be recreated – you can think of this as lazy evaluation of system state. You can find a thorough discussion of event sourcing here.
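
As a minimal sketch of the idea (the event types, function names and replay logic below are illustrative only, not JustOneDB or any particular framework), a single chronological log plus a replay function is enough to derive current state on demand:

    # A single chronological event log; current state is never stored,
    # it is derived by replaying every event through current logic.
    from itertools import count

    event_log = []             # the single chronological log of events
    next_id = count(1)         # primary key generator for new entities

    def record(event_type, entity_id, payload):
        event_log.append({"type": event_type, "id": entity_id, "data": payload})

    def replay(log):
        """Rebuild system state from scratch by applying every event in order."""
        state = {}
        for event in log:
            if event["type"] == "created":
                state[event["id"]] = dict(event["data"])
            elif event["type"] == "updated":
                state[event["id"]].update(event["data"])
            elif event["type"] == "deleted":
                state.pop(event["id"], None)
        return state

    customer = next(next_id)
    record("created", customer, {"name": "Alice", "status": "trial"})
    record("updated", customer, {"status": "paid"})
    print(replay(event_log))   # {1: {'name': 'Alice', 'status': 'paid'}}

Change the logic inside replay and the revised interpretation is applied retrospectively to the entire history.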

There may appear to be analogies here to audit logs and write-ahead logging in databases; but closer inspection shows that an audit log only provides a history of events for a known system state; and a write-ahead log provides a limited history of recent changes which is retained only until the whole system state can be reliably persisted.

In principle, a relational database could apply an event sourcing design pattern such that every definition command, every insert, every update and every delete statement is recorded as an autonomous event so that the state of a row is recreated by replaying all of the events related to it. Of course, it is time consuming and inefficient to reconstruct everything from first principles for every query and therefore databases retain current state to avoid the reconstruction costs. Indeed, reconstruction of system state typically only occurs at database start-up when system state is recovered from any pending write-ahead or recovery logs. Hopefully, a database doesn’t expect to change its internal processing logic that frequently, so any potential advantages are enormously outweighed by the disadvantages of a wholly lazy evaluation.

While more volatile applications may choose an event sourcing pattern to effect a robust delivery environment for rapidly evolving requirements, performance considerations will often dictate a hybrid approach whereby current system state is recreated by applying recent events (rather than all events) to recent system state (rather than an empty state).
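
A rough sketch of that hybrid, again with illustrative names only: take a snapshot of recent state, then recreate current state by applying only the events recorded after the snapshot rather than the whole history.

    # Hybrid event sourcing (illustrative): roll recent events forward
    # from a recent snapshot instead of replaying from an empty state.
    import copy

    def apply_event(state, event):
        if event["type"] == "created":
            state[event["id"]] = dict(event["data"])
        elif event["type"] == "updated":
            state[event["id"]].update(event["data"])
        elif event["type"] == "deleted":
            state.pop(event["id"], None)

    def take_snapshot(state, log):
        """Persist the current state together with the log position it covers."""
        return {"upto": len(log), "state": copy.deepcopy(state)}

    def current_state(snapshot, log):
        """Apply only the events recorded after the snapshot was taken."""
        state = copy.deepcopy(snapshot["state"])
        for event in log[snapshot["upto"]:]:
            apply_event(state, event)
        return state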

We also have to be careful to understand what an event is. Simply recording a change in state (such as a new attribute value) is not really event sourcing; whereas recording the cause (such as a command) that gave rise to that change is. While the latter allows system state to be fully revised according to changes in application logic, the former only allows us to recreate state at any given point in time – yet this is still very useful within a database for the purposes of auditing and analytics.

Much like Memento (the film), which offers a delightfully ambiguous interpretation of reality, event sourcing allows a new reality to be created simply through an alternative interpretation of events – but, thankfully, databases are expected to hold a more definitive view of their world.

Black and white

The constant background debate about relational versus NoSQL has become entrenched in dogma and misses many fundamental aspects of data management.  Fortunately, the notion that the difference between SQL and NoSQL has anything to do with SQL has already been thoroughly dispelled – but there still remains an active debate about the virtues or otherwise of schemas and ACID transactions.

Let’s deal with schemas first. A schema is just an abstraction of the data and is not a method of storage. The schema is simply there to provide a conceptual model for data access and exists as a logical layer above the storage structure in much the same way that a file system provides a logical abstraction of the physical storage system.

While the relational model was conceived to provide an intuitive framework for understanding real-world entities and a self-documenting structure for supporting queries, the database industry made the mistake of closely coupling that model with the storage organisation. This has resulted in the unnecessary polarisation between row stores and column stores and the horrible complexity and inertia typically experienced with traditional relational databases. Instead, the underlying database storage structure should be schema-less, and the relational model should simply be an optional schema layered upon it.

With a schema-less storage structure that efficiently resolves arbitrary access paths between stored elements, it does not matter whether the logical access model is relational or not. Indeed, there is no reason why a relational schema cannot co-exist with other models in the same database at the same time.
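
As a loose illustration of that separation (a toy sketch, not JustOneDB's actual storage format, which is not described here), stored elements can be held as schema-less entity/attribute/value triples while a relational table is merely a projection over them:

    # Illustrative only: a schema-less store of (entity, attribute, value)
    # elements, with a relational view projected on top as a logical layer.
    triples = [
        ("emp:1", "name", "Alice"), ("emp:1", "dept", "Sales"),
        ("emp:2", "name", "Bob"),   ("emp:2", "dept", "Support"),
        ("doc:9", "title", "Q3 report"),     # non-relational content co-exists
    ]

    def relational_view(prefix, columns):
        """Project entities with a given prefix into rows of the requested columns."""
        rows = {}
        for entity, attribute, value in triples:
            if entity.startswith(prefix) and attribute in columns:
                rows.setdefault(entity, {})[attribute] = value
        return [tuple(row.get(col) for col in columns) for row in rows.values()]

    print(relational_view("emp:", ("name", "dept")))
    # [('Alice', 'Sales'), ('Bob', 'Support')]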

ACID or not? Actually, there’s a false premise to that question. ACID is not a single property; it bundles atomicity, durability and consistency, and these are orthogonal features which can be varied independently. Should a sequence of updates be applied atomically as a single change? Does durability of a change need to be guaranteed? How consistent should these changes be across the database infrastructure? These are not the same question, and the last one actually requires more than a simple binary answer because consistency is really a spectrum of possibilities ranging from none, to eventual, to immediate.

In fact, the requirements for atomicity, durability and consistency can vary by individual transaction. It is easy to imagine an application where a class of transactions that require durability coexists with a class of transactions that does not require it at all. Similarly for atomicity and consistency. So it is not even adequate to configure these guarantees globally across the whole database – let alone hardwire them into the database infrastructure itself, which is the typical state of affairs.
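
A sketch of what that might look like at the API surface (the option names below are hypothetical, not any particular database's interface): each transaction chooses its own guarantees rather than inheriting a global, hardwired policy.

    # Hypothetical per-transaction guarantees: atomicity, durability and
    # consistency are selected independently for each transaction.
    from dataclasses import dataclass
    from enum import Enum

    class Consistency(Enum):
        NONE = "none"
        EVENTUAL = "eventual"
        IMMEDIATE = "immediate"

    @dataclass
    class TxnOptions:
        atomic: bool = True                  # apply the updates as a single change?
        durable: bool = True                 # must the change survive failure?
        consistency: Consistency = Consistency.IMMEDIATE

    # A financial posting that must be atomic, durable and immediately visible:
    posting = TxnOptions()
    # A telemetry write where losing the odd row is perfectly acceptable:
    telemetry = TxnOptions(atomic=False, durable=False, consistency=Consistency.NONE)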

While the relational model may choose to enforce global atomicity, durability and immediate consistency for transactions – the database engine need not.

Even durability is not a simple yes or no.

Does committing a change to a disk make that change durable? Not if the disk fails. Is mirrored storage sufficient? How many mirrors are deemed safe? Is a single-site sufficient? How distant do these replicas need to be? One man’s durable is another man’s ephemeral.

Forget the arguments about choosing black or white – we need grey-scale.

Cover up

A traditional relational database falls into one of two camps – it is either a row store or a column store. These are both intuitive arrangements for storing a table structure, where either each row or each column is stored contiguously. The idea is that if you want to fetch a row you will find it in one place, and if you want to scan a column you will find all of the column values in one place.

Of course, the problem with both storage models is that they are overly simplistic: neither helps you find relevant rows in a selective OLTP query, nor eliminate irrelevant rows in a collective analytical query, and the only way to make these storage models effective is to layer indexes on top of them or to throw more hardware at them.

Indexes are very familiar in a row store – but they also appear in column stores as row projections, where multiple columns are stitched together to avoid the high cost of row re-construction; and more familiar index structures can also be found in some column stores, for filtering columns and for joining between tables.

In its most general sense, any index can be considered to be a store of key and address pairs where both the key and the address may be stored either explicitly or implicitly for an index entry. So for example, in a key comparative structure (such as a B-Tree, LSM tree, fractal tree etc.) both the key and the address are explicitly recorded in the index entry; whereas in a bit map, both the key and the address are implied from their position within the index. Naturally, a variety of possible index structures covers all four possible combinations as shown in the table below (which is not an exhaustive list of structures by any means).

The advantage of an index with an implicit address is that the row address order will be preserved and scans across multiple indexes against the same table can be efficiently resolved to yield a combined result. Thus with bit map indexes, multiple indexes can be logically conjoined or disjoined for the same table without any need to sort/merge explicit addresses. With these types of indexes there is rarely any need for a composite key index because separate indexes can be easily combined into a single composite result.
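
For example (using plain Python integers as toy bit maps; real engines use compressed bit map structures), two single-column bit map indexes on the same table combine with a single bitwise operation, because both imply the same row-address order:

    # Toy bit maps over an 8-row table: bit i set means row i matches.
    # Because the address is implied by bit position, indexes on the same
    # table are combined with bitwise operations - no sort/merge needed.
    dept_index = {              # one bit map per department value
        "Sales":   0b00101101,
        "Support": 0b11010010,
    }
    gender_index = {            # one bit map per gender value
        "M": 0b01100110,
        "F": 0b10011001,
    }

    male_in_sales = dept_index["Sales"] & gender_index["M"]          # conjunction
    print([row for row in range(8) if male_in_sales >> row & 1])     # [2, 5]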

An index with an explicit address will also typically retain the address order for a single and specific key value (since there is typically no advantage to changing the address order). Thus, with an inverted list we can easily merge the results from different lists; and similarly for a B-Tree, where we choose one particular key from each index we can merge the results directly because the address order is preserved.

However, when we attempt to scan an index with an explicit key and address for multiple keys, the addresses may no longer be in row address order and we have to either sort the addresses or create an ordered address map (such as a bit map) for each index to be able to merge addresses across indexes on the same table. Naturally, the more of the key domain a query covers, the more addresses must be mapped or sorted prior to the merge and the less trivial this task becomes.

The important point, however, is that as soon as a query covers more than one key in an explicit key and address index, the overhead of index merging changes significantly.

Why is this important? Well, it is widely assumed that an index built with composite keys (that is, multiple keys concatenated together) covers and removes the need for indexes that might be built on a subset of those keys. A classic example of this is building a compound B-Tree index on the department number and employee salary columns in an employees table. This index might be used to find employees in a specific department in a certain salary range. Since we have already included the department number, this index can also be used to find all employees in a specific department regardless of their salary. Therefore, there is no need to build a separate department number index because the composite index will suffice. Right? Not quite. Whereas an index built solely on department number can be easily merged with an index built solely on employee gender to find all male employees in a specific department, the composite index requires the employees in the queried department to be sorted or address mapped before they can be merged with the gender index.
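
A toy sketch of that difference (illustrative data structures, not any specific engine): a single-department scan of a composite (department, salary) index hands back addresses in salary order, so they must be sorted before they can be merged with an address-ordered gender index.

    # Entries of a composite (dept, salary) index are ordered by
    # (dept, salary), so the addresses from a one-department scan come
    # out in salary order, not row address order.
    composite_index = [                      # (dept, salary, row_address)
        ("Sales", 30000, 7), ("Sales", 42000, 2), ("Sales", 55000, 5),
        ("Support", 28000, 1), ("Support", 61000, 4),
    ]
    gender_index = {"M": [2, 3, 5, 6], "F": [0, 1, 4, 7]}   # address-ordered lists

    sales_addresses = [addr for dept, _, addr in composite_index if dept == "Sales"]
    # [7, 2, 5]  <- salary order; must be sorted (or bit mapped) before merging
    print(sorted(set(sales_addresses) & set(gender_index["M"])))    # [2, 5]

    # An index built solely on dept would already yield [2, 5, 7], letting the
    # two address streams be merged directly without the extra sort step.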

Worse still, while index scans that retain address order can be easily decomposed into an arbitrary number of address partitions across the address space, with all partitions scanned and merged in parallel, this is no longer immediately possible if one or more of the indexes has to be sorted or mapped first.

Therefore, the idea that a row store can be made analytical by overlaying covering indexes is only effective where those indexes precisely match the query requirements – but to cover all those requirements typically requires an awful lot of indexes.

Divide and ponder

Partitioning is essential to achieving scalability in a database server, and there are several reasons for adopting it.

One reason is to overcome the performance shortcomings of index structures; the following two partitioning methods can be used to mitigate them:

  • An index can be partitioned on time to reduce the active insert area of an index, such that a greater proportion of the active area can be cached than would be possible without partitioning. This can be used to alleviate poor insert and update performance in large indexes.
  • An index can be partitioned in line with queried keys to reduce the search area of an index, such that a query can eliminate index partitions based on key search criteria. This can be used to reduce query time in large indexes (a rough sketch of this partition elimination follows below).
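
As a rough sketch of that second method (illustrative only), splitting an index into key-range partitions lets a query skip whole partitions before any index is searched:

    # Illustrative partition elimination: an index split into key-range
    # partitions; a query searches only the partitions whose key range
    # overlaps its search criteria.
    partitions = [
        {"lo": 0,     "hi": 9999,  "keys": [1200, 4800, 9100]},
        {"lo": 10000, "hi": 19999, "keys": [10500, 15750, 19999]},
        {"lo": 20000, "hi": 29999, "keys": [20010, 25000]},
    ]

    def search(lo, hi):
        """Search only the partitions that could contain keys in [lo, hi]."""
        hits = []
        for part in partitions:
            if part["hi"] < lo or part["lo"] > hi:
                continue                     # partition eliminated by its key range
            hits.extend(k for k in part["keys"] if lo <= k <= hi)
        return hits

    print(search(15000, 21000))              # only the last two partitions are touched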

Clearly, it is unlikely that these two partitioning requirements will be in alignment. Attempting to partition indexes for both fast inserts and fast queries therefore tends to create conflicting demands, and such schemes require careful design to achieve an optimal partitioning arrangement.

In addition, the following partitioning methods can be used to achieve general scalability by allowing parallel processing and by easing data life-cycle management.

  • A table or index can be abundantly and equitably partitioned to allow arbitrarily parallel and independent operations against it.
  • A table or index can be partitioned by time divisions to facilitate life cycle management such that old and obsolete partitions can be dropped.

In contrast to partitioning used to rescue index performance, these schemes are mutually accommodating and do not conflict with each other; nor do they require design decisions – data can be implicitly divided into arbitrarily granular time divisions for lifecycle management and parallel query operations can exploit those partitions at will.

However, while it is beneficial to partition most structures by time division for general scalability and life-cycle management, tree indexes (B-Trees etc) do not behave well in this respect. This is because an index partitioned orthogonally to the keys used in a query requires all of the partitions to be scanned for that query. While adding more partitions may reduce the size of each individual partition, the cost of scanning multiple partitions rises linearly with the number of partitions, yet the search depth of each partition reduces logarithmically with partition size. Hence the more partitions we add, the costlier the query for a tree index.

This increase in retrieval cost for a tree index partitioned orthogonally to a query key, relative to a non-partitioned index, is given by P(1 - logN(P)), where P is the number of partitions, N is the total number of keys and logN denotes a logarithm to base N: each of the P partitions holds N/P keys, so the total search cost is P·log(N/P), which is P(1 - logN(P)) times the unpartitioned cost of log(N). We can see that for any non-trivial value of N, the retrieval cost effectively increases linearly with the number of partitions.
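
To put some rough numbers on that (the figures below are purely illustrative), here is the ratio for a billion-key index at a few partition counts:

    # Retrieval cost of a partitioned tree index relative to an
    # unpartitioned one: each of P partitions holds N/P keys, so the
    # total cost P*log(N/P) divided by log(N) gives P*(1 - log_N(P)).
    import math

    def cost_ratio(n_keys, n_partitions):
        return n_partitions * (1 - math.log(n_partitions, n_keys))

    for p in (1, 16, 256, 4096):
        print(p, round(cost_ratio(10**9, p), 1))
    # 1 1.0
    # 16 13.9      <- already ~14x the unpartitioned search cost
    # 256 187.5
    # 4096 2452.0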

Of course, we can scan those tree index partitions in parallel, but the linear increase in overall cost means we gain nothing – it will take about the same time as it would to scan an unpartitioned index and yet we have to perform a lot more work and consume many more resources to achieve much the same result.

So, while a tree index will offer logarithmic scaling for specific queries, when general scalability requirements impose query independent partitioning upon it, that benefit is lost.

Gone in a flash

Flash storage has become mainstream. But is it here to stay? Probably not.

There is a whole raft of non-volatile memory technologies snapping at the heels of flash memory. These new technologies store data using magnetic, resistive and material phase effects rather than the charge effects used by flash memory. They include Phase Change Memory, Spin Torque Transfer Memory, Resistive Memory and Racetrack Memory, and some of these are already in production.

The important thing about these new technologies is that they are all much faster than flash, offer greater density and have much lower power requirements. In many cases, these new forms of memory promise orders of magnitude greater speed and density. In fact, we are likely to see non-volatile memory that is both faster and denser than current DRAM.

In the longer term, these technologies have the potential to change the architecture of computing, whereby storage is memory and is integrated with the CPU. It may be that memory becomes sufficiently dense and close to the CPU that caching becomes unnecessary. That makes CPU cores a whole lot simpler and removes the need for pre-fetching, branch prediction, pipelining or super-scaling. Simpler cores can mean faster cores and many more cores. We may be looking at an architecture of interconnected memory modules with integrated cores (and no storage).

In fact, recent research has proposed the idea that resistive memory can perform logic operations in addition to functioning as memory. That could create a very new and challenging architecture indeed.

Ultimately, we will probably need to forget about making software cache-efficient (because there won't be a cache), but we will need to ensure that software is embarrassingly parallel instead.

In the meantime, flash will have its decade and will become temporarily ubiquitous in mobile devices. What about enterprise storage? Can it replace hard disk in a decade? There is so much hard disk storage out there and new disk remains such a cheap option that flash can probably only become dominant as a storage cache to accelerate mechanical systems – before it is overtaken by one of these rapidly emerging technologies.

Flash storage. RIP.