Memento

Event sourcing is a design pattern for retaining an exhaustive history of events such that the state of the system now (or at some point in the past) can be derived by replaying those events. The goal of event sourcing is to derive the current system state by applying current application logic to past events, so that changes to that logic are applied retrospectively and become implicitly reflected in the current state.

Conceptually, event sourcing requires only a single chronological log that records every event about every entity, plus a generator that assigns primary keys to new entities. Furthermore, in extremis, application state need never be persisted since it can always be recreated – you can think of this as lazy evaluation of system state. You can find a thorough discussion of event sourcing here.
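
By way of illustration, here is a minimal Python sketch (the event kinds, payload fields and account-balance example are all hypothetical, not drawn from any particular framework) in which current system state is simply the result of folding the application logic over the chronological event log:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    entity_id: int
    kind: str                      # e.g. "account_opened", "deposited", "withdrawn"
    payload: dict = field(default_factory=dict)

def apply(state: dict, event: Event) -> dict:
    """Current application logic; change this and the derived state changes too."""
    entity = dict(state.get(event.entity_id, {}))
    if event.kind == "account_opened":
        entity = {"balance": 0, **event.payload}
    elif event.kind == "deposited":
        entity["balance"] = entity.get("balance", 0) + event.payload["amount"]
    elif event.kind == "withdrawn":
        entity["balance"] = entity.get("balance", 0) - event.payload["amount"]
    return {**state, event.entity_id: entity}

def current_state(log: list) -> dict:
    """Lazy evaluation of system state: replay every event from an empty state."""
    state: dict = {}
    for event in log:
        state = apply(state, event)
    return state

log = [
    Event(1, "account_opened", {"owner": "alice"}),
    Event(1, "deposited", {"amount": 100}),
    Event(1, "withdrawn", {"amount": 30}),
]
print(current_state(log))          # {1: {'balance': 70, 'owner': 'alice'}}
```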

There may appear to be analogies here to audit logs and write-ahead logging in databases; but closer inspection shows that an audit log only provides a history of events for a known system state; and a write-ahead log provides a limited history of recent changes which is retained only until the whole system state can be reliably persisted.

In principle, a relational database could apply an event sourcing design pattern such that every definition command, every insert, every update and every delete statement is recorded as an autonomous event, so that the state of a row is recreated by replaying all of the events related to it. Of course, it is time-consuming and inefficient to reconstruct everything from first principles for every query, and therefore databases retain current state to avoid the reconstruction costs. Indeed, reconstruction of system state typically only occurs at database start-up, when system state is recovered from any pending write-ahead or recovery logs. Hopefully, a database doesn’t expect to change its internal processing logic that frequently, so any potential advantages are enormously outweighed by the disadvantages of a wholly lazy evaluation.

While more volatile applications may choose an event sourcing pattern to effect a robust delivery environment for rapidly evolving requirements, performance considerations will often dictate a hybrid approach whereby current system state is recreated by applying recent events (rather than all events) to recent system state (rather than an empty state).
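
A minimal sketch of that hybrid approach (the function and parameter names are hypothetical, and apply stands in for whatever the application logic is): start from a recent snapshot and replay only the events recorded since it, rather than the entire history:

```python
from typing import Callable, Iterable, TypeVar

StateT = TypeVar("StateT")
EventT = TypeVar("EventT")

def hybrid_state(snapshot: StateT,
                 recent_events: Iterable[EventT],
                 apply: Callable[[StateT, EventT], StateT]) -> StateT:
    """Derive current state from a recent snapshot plus only the events since it,
    rather than replaying the entire history against an empty state."""
    state = snapshot
    for event in recent_events:
        state = apply(state, event)
    return state
```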

We also have to be careful to understand what an event is. Simply recording a change in state (such as a new attribute value) is not really event sourcing; whereas recording the cause (such as a command) that gave rise to that change is. While the latter allows system state to be fully revised according to changes in application logic, the former only allows us to recreate state at any given point in time – yet this is still very useful within a database for the purposes of auditing and analytics.
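
The distinction is perhaps easiest to see in a small, hypothetical sketch: replaying a recorded state change can only reproduce the value exactly as it was written, whereas replaying a recorded command re-runs whatever the application logic (here an illustrative fee calculation) happens to be today:

```python
# A recorded state change can only be replayed verbatim...
state_change = {"entity_id": 1, "attribute": "fee", "new_value": 37.5}

# ...whereas a recorded command is re-interpreted by today's application logic.
command = {"entity_id": 1, "kind": "trade_executed", "quantity": 100, "price": 1.5}

def calculate_fee(quantity: int, price: float) -> float:
    return 0.25 * quantity * price          # today's rule; change it tomorrow and
                                            # replayed commands pick up the new rule

def replay_state_change(event: dict) -> float:
    return event["new_value"]               # frozen at the value that was recorded

def replay_command(event: dict) -> float:
    return calculate_fee(event["quantity"], event["price"])

print(replay_state_change(state_change))    # 37.5, whatever the logic now says
print(replay_command(command))              # 37.5 today; recomputed if logic changes
```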

Much like Memento (the film), which offers a delightfully ambiguous interpretation of reality, event sourcing allows a new reality to be created simply through an alternative interpretation of events – but, thankfully, databases are expected to hold a more definitive view of their world.

Tradition

There are two very different strategies for dealing with updates and deletes in a relational database. Update-in-place semantics will directly overwrite data in-situ with a new version of that data; while append semantics will simply append the new version to a chronological log and leave the old data untouched. The update-in-place approach was extensively used in early database architectures but more recent database systems tend to opt for append semantics. Let’s look at the reasons why that may be.

When we append updates rather than overwrite data, we are able to keep the old, consistent version of the data while the update is in progress, and this affords a robust recovery mechanism should the update fail. We are able to easily watermark the updates, and we can choose to move that watermark only when we know that the updates have been completed and successfully persisted in storage. With update-in-place semantics, by contrast, we would have to write and retain the old version in a separate log before the update took place in order to guarantee recovery from a failed update (assuming that a full recovery from a database backup copy is a little too rudimentary for most people).
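
A toy, file-based sketch of the watermark idea (the file names, JSON encoding and single-log layout are purely illustrative): the watermark is advanced only after an appended batch has been made durable with fsync(), and recovery simply truncates the log back to the last durable watermark:

```python
import json, os

LOG_PATH = "append.log"                      # hypothetical append-only log file

def append_versions(versions: list) -> None:
    """Append new row versions; the old versions are left untouched."""
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        for version in versions:
            log.write(json.dumps(version) + "\n")
        log.flush()
        os.fsync(log.fileno())               # make the appended batch durable
    # Only now is it safe to move the watermark past the new versions.
    save_watermark(os.path.getsize(LOG_PATH))

def save_watermark(offset: int) -> None:
    with open(LOG_PATH + ".watermark", "w", encoding="utf-8") as wm:
        wm.write(str(offset))
        wm.flush()
        os.fsync(wm.fileno())

def recover() -> None:
    """On restart, discard anything appended after the last durable watermark."""
    try:
        with open(LOG_PATH + ".watermark", encoding="utf-8") as wm:
            watermark = int(wm.read())
    except FileNotFoundError:
        watermark = 0
    if os.path.exists(LOG_PATH):
        os.truncate(LOG_PATH, watermark)     # roll back the incomplete tail
```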

The use of append semantics allows multiple updates to be appended to the same area of storage rather than scattered at random locations across the database. This affords a huge performance benefit to updates because large numbers of small updates become coalesced into a smaller number of larger writes, and this hugely reduces the load on fsync() operations – which are essential for durability in write-back cache arrangements and for avoiding anomalies arising from out-of-order write behaviour in modern disk systems.
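
Extending the same illustrative file-based sketch, coalescing might look something like a group-commit buffer (again hypothetical): many small updates are accumulated in memory and then persisted with one large sequential append and a single fsync() for the whole batch:

```python
import json, os

class GroupCommitBuffer:
    """Buffer small updates in memory and append them as one large write."""

    def __init__(self, log_path: str, batch_size: int = 1000):
        self.log_path = log_path             # hypothetical append-only log file
        self.batch_size = batch_size
        self.pending: list = []

    def update(self, row_version: dict) -> None:
        self.pending.append(row_version)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write("".join(json.dumps(v) + "\n" for v in self.pending))
            log.flush()
            os.fsync(log.fileno())           # one fsync() for the whole batch
        self.pending.clear()
```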

Even on flash storage this type of large append write pattern has huge benefits over small random writes because much less of an erase block is wasted with each write. With a small random update I/O, a flash device has to copy the entirety of an updated block to a new empty location and remap the block’s location. Once a flash device fills, unused blocks must be explicitly erased before they can be re-used, and on a busy device this garbage collection activity can impact performance even where it happens asynchronously ahead of time.

Append semantics show significant performance gains in other areas too, such as in transaction isolation. With update-in-place semantics either an explicit MVCC (multi-version concurrency control) scheme with separate before-update images must be managed; or, worse still, read activity must lock out update activity on the same data to ensure consistent reads. While the use of read locks is a death knell to performance with concurrent users and is tantamount to eating small children in a public place, the overhead of managing before and after images also places a huge burden on throughput. However, none of this band-aid architecture is needed with append semantics because versioning is implicit and watermarks demarcate the boundaries between transactions.
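
A small in-memory sketch (the data structures are hypothetical and greatly simplified) of why versioning comes for free with append semantics: every appended version carries the commit sequence that wrote it, and a reader working from a snapshot watermark simply ignores later versions, with no read locks and no separate before-image management:

```python
from collections import defaultdict

# key -> list of (commit_sequence, row_version), appended in chronological order
log: dict = defaultdict(list)
watermark = 0                                # highest committed sequence so far

def write(key: int, row: dict, commit_seq: int) -> None:
    log[key].append((commit_seq, row))       # old versions are left in place

def commit(commit_seq: int) -> None:
    global watermark
    watermark = commit_seq                   # the transaction boundary

def read(key: int, snapshot: int):
    """Latest version visible at the snapshot; no read locks, no before-images."""
    visible = [row for seq, row in log[key] if seq <= snapshot]
    return visible[-1] if visible else None

write(1, {"balance": 100}, commit_seq=1)
commit(1)
snapshot = watermark                         # a reader starts here
write(1, {"balance": 70}, commit_seq=2)      # concurrent, not yet committed
print(read(1, snapshot))                     # {'balance': 100}, a consistent read
commit(2)
print(read(1, watermark))                    # {'balance': 70}
```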

There’s also a slight deception in the term update-in-place. What happens when the version we are attempting to overwrite is smaller than the updated version? This usually leads to schemes such as row chaining, where database rows are split to accommodate increasing row length. These chained row fragments may not even reside in the same storage block as the original row, and this ultimately leads to further performance horror at query time.

Moreover, contemporary data management requirements commonly require a history of transactions to be retained for management, analysis and legal reasons. So why go out of your way to destroy that very history by overwriting it?

That all sounds great for append semantics, but are there any good reasons for choosing to update in place? The usual response here is that append semantics will require more storage space than update-in-place semantics because we are retaining a complete history of the updates. This is indeed true, but is it important?

On any very large database system dealing with machine-generated or web data, the huge majority of data is derived from immutable inserts that never get updated, and any data that can be updated (such as customer or account data) is insignificant by comparison. Besides, customer and account data is the very type of data we want to maintain a history for, because these are the significant real-world entities on which the business depends.

At this point, someone will helpfully suggest that they might use a table containing a single row that gets frequently updated for the sole purpose of maintaining a sequence number. (Suddenly, a silence falls and a distant church bell can be heard to slowly toll…) Enough said.

Therefore, either the volume of updates is unlikely to significantly impact storage space on a large system where space is critical; or the system is too small for it to matter at today’s storage prices. Indeed, storage bandwidth is a lot costlier than storage capacity and update-in-place offers no benefit there at all.

Neither is query performance improved by update-in-place semantics since, with append semantics, all versions of a row can be equally accessible and resolvable through whatever structures are used to locate the original row. Indeed, query performance is likely to suffer more with update-in-place semantics because of row chaining.

So why do some database vendors continue to use update-in-place semantics? Good question.

“A precedent embalms a principle.” – Benjamin Disraeli