Gone in a flash

Flash storage has become mainstream. But is it here to stay? Probably not.

There is a whole raft of non-volatile memory technologies snapping at the heels of flash memory. Rather than storing data as charge, as flash does, these new technologies exploit magnetic, resistive and material phase effects. They include Phase Change Memory, Spin Torque Transfer Memory, Resistive Memory and Racetrack Memory, and some of these are already in production.

The important thing about these new technologies is that they are all much faster than flash, all denser, and all far less power-hungry. In many cases they promise orders of magnitude greater speed and density. In fact, we are likely to see non-volatile memory that is both faster and denser than current DRAM.

In the longer term, these technologies have the potential to change the architecture of computing, whereby storage is memory and is integrated with the CPU. Memory may become sufficiently dense and close to the CPU that caching becomes unnecessary. That would make CPU cores a whole lot simpler, removing the need for prefetching, branch prediction, pipelining or superscalar execution. Simpler cores can mean faster cores, and many more of them. We may be looking at an architecture of interconnected memory modules with integrated cores (and no storage).

In fact, recent research has proposed the idea that resistive memory can perform logic operations in addition to functioning as memory. That could create a very new and challenging architecture indeed.

Ultimately, we will probably need to forget about making software cache efficient (because there won’t be a cache) and instead ensure that software is embarrassingly parallel.
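As a toy illustration of what “embarrassingly parallel” means in practice (nothing here is from the article, and the workload is made up), each unit of work is fully independent, so adding cores scales it with no shared state to keep coherent:

```python
# A minimal sketch of an embarrassingly parallel workload: each chunk is
# independent of every other, so there are no locks and nothing for a
# cache hierarchy to keep coherent (all names here are illustrative).
from multiprocessing import Pool

def process(chunk):
    # Purely local computation: no shared mutable state.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(8)]
    with Pool() as pool:            # one worker per core by default
        print(sum(pool.map(process, chunks)))
```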

In the meantime, flash will have its decade and become temporarily ubiquitous in mobile devices. What about enterprise storage? Can flash replace the hard disk there within a decade? There is so much hard disk storage out there, and new disk remains such a cheap option, that flash can probably only become dominant as a storage cache that accelerates mechanical systems – before it is overtaken by one of these rapidly emerging technologies.

Flash storage. RIP.

Tradition

There are two very different strategies for dealing with updates and deletes in a relational database. Update-in-place semantics directly overwrite data in situ with a new version of that data, while append semantics simply append the new version to a chronological log and leave the old data untouched. The update-in-place approach was used extensively in early database architectures, but more recent database systems tend to opt for append semantics. Let’s look at the reasons why that may be.

When we append updates rather than overwrite data, we keep the old, consistent version of the data while the update is in progress, and this affords a robust recovery mechanism should the update fail. We can easily watermark the updates, moving the watermark only once we know the updates have completed and been successfully persisted to storage. With update-in-place semantics, by contrast, we would have to write and retain the old version in a separate log before the update took place in order to guarantee recovery from a failed update (assuming that a full restore from a database backup is a little too rudimentary for most people).
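A minimal sketch of this watermarking idea, assuming a single-writer log (the file layout and names are illustrative, not any particular database’s format):

```python
import os

class AppendLog:
    """Append semantics with a recovery watermark (illustrative only)."""

    def __init__(self, path):
        self.path = path
        self.watermark = 0              # offset of the last durable record

    def update(self, record: bytes):
        with open(self.path, "ab") as f:
            f.write(record + b"\n")     # old versions are left untouched
            f.flush()
            os.fsync(f.fileno())        # persist before moving the watermark
            self.watermark = f.tell()   # a real system would persist this too

    def recover(self):
        # Anything beyond the watermark is an incomplete update: discard it
        # and the log is back at its last known-consistent state.
        with open(self.path, "r+b") as f:
            f.truncate(self.watermark)
```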

The use of append semantics allows multiple updates to be appended to the same area of storage rather than scattered at random locations across the database. This affords a huge performance benefit to updates, because large numbers of small updates coalesce into a smaller number of larger writes, which hugely reduces the load from fsync() operations – essential for durability in write-back cache arrangements and for avoiding anomalies arising from out-of-order write behaviour in modern disk systems.
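To make the coalescing concrete, here is a hedged sketch contrasting the two write patterns; the file handles and update shapes are assumptions for illustration:

```python
import os

def write_in_place(f, updates):
    # Update-in-place: each small write lands at its own offset and must be
    # individually fsync()ed to guarantee durability and ordering.
    for offset, data in updates:
        f.seek(offset)
        f.write(data)
        os.fsync(f.fileno())            # one fsync per update

def write_appended(f, updates):
    # Append semantics: the same updates coalesce into one sequential write,
    # amortising a single fsync across the whole batch.
    f.seek(0, os.SEEK_END)
    f.write(b"".join(data for _, data in updates))
    os.fsync(f.fileno())                # one fsync in total
```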

Even on flash storage, this large append write pattern has huge benefits over small random writes, because much less of an erase block is wasted with each write. With a small random update I/O, a flash device has to copy the entirety of an updated block to a new, empty location and remap the block’s location. Once a flash device fills, unused blocks must be explicitly erased before they can be reused, and on a busy device this garbage collection activity can impact performance even where it happens asynchronously ahead of time.
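Back-of-envelope arithmetic makes the erase-block point; the 4 KiB update and 256 KiB erase block sizes below are my assumptions, not figures from the article:

```python
UPDATE_BYTES      = 4 * 1024          # assumed size of one random update
ERASE_BLOCK_BYTES = 256 * 1024        # assumed flash erase block size

# Per the copy-and-remap behaviour described above, one small random update
# can cost a whole block rewrite, while an append consumes only what it writes.
print(ERASE_BLOCK_BYTES // UPDATE_BYTES)  # 64x write amplification (random)
print(UPDATE_BYTES // UPDATE_BYTES)       # 1x (append)
```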

Append semantics show significant performance gains in other areas too, such as transaction isolation. With update-in-place semantics, either an explicit MVCC (multi-version concurrency control) scheme with separate before-update images must be managed; or, worse still, read activity must lock out update activity on the same data to ensure consistent reads. While the use of read locks is a death knell to performance with concurrent users, and is tantamount to eating small children in a public place, the overhead of managing before and after images also places a huge burden on throughput. None of this band-aid architecture is needed with append semantics, because versioning is implicit and watermarks demarcate the boundaries between transactions.
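As a rough sketch of how versioning falls out for free under append semantics (an in-memory toy, not a real engine), a reader pinned to a watermark never blocks a writer:

```python
class VersionedStore:
    """Implicit multi-versioning under append semantics (illustrative)."""

    def __init__(self):
        self.rows = {}          # key -> list of (version, value), append-only
        self.version = 0        # the current watermark

    def update(self, key, value):
        self.version += 1
        self.rows.setdefault(key, []).append((self.version, value))

    def read(self, key, snapshot):
        # Return the newest version visible at the reader's watermark.
        for ver, value in reversed(self.rows.get(key, [])):
            if ver <= snapshot:
                return value
        return None

store = VersionedStore()
store.update("acct", 100)
snap = store.version            # a reader pins its snapshot here
store.update("acct", 250)      # a concurrent writer appends a new version
print(store.read("acct", snap)) # 100 -- the reader's view is unaffected
```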

There’s also a slight deception in the term update-in-place. What happens when the version we are attempting to overwrite is smaller than its replacement? This usually leads to schemes such as row chaining, where database rows are split to accommodate the increased row length. These chained row fragments may not even reside in the same storage block as the original row, which ultimately leads to further performance horror at query time.
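A toy illustration of the chaining problem, with an assumed fixed slot size (real engines differ in the details):

```python
SLOT = 16                               # assumed fixed slot size, in bytes
blocks = {0: {}, 1: {}}                 # block id -> slot id -> bytes
blocks[0][0] = b"short row"             # the original row fits in one slot

def update_in_place(row: bytes):
    if len(row) <= SLOT:
        blocks[0][0] = row              # still fits: a true in-place update
    else:
        blocks[0][0] = row[:SLOT]       # the head stays put...
        blocks[1][0] = row[SLOT:]       # ...the tail chains to another block

update_in_place(b"a much longer updated row")
print(blocks[0][0] + blocks[1][0])      # reassembly now needs two block reads
```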

Moreover, contemporary data management commonly requires a history of transactions to be retained for management, analysis and legal reasons. So why go out of your way to destroy that very history by overwriting it?

That all sounds great for append semantics, but are there any good reasons for choosing to update in place? The usual response here is that append semantics will require more storage space than update-in-place semantics because we are retaining a complete history of the updates. This is indeed true, but is it important?

On any very large database system dealing with machine-generated or web data, the huge majority of data derives from immutable inserts that never get updated; any data that can be updated (such as customer or account data) is insignificant by comparison. Besides, customer and account data is the very type of data we want to maintain a history for, because these are the significant real-world entities on which the business depends.

At this point, someone will helpfully suggest that they might use a table which contains a single row which gets frequently updated for the sole purpose of maintaining a sequence number. (Suddenly, a silence falls and a distant church bell can be heard to slowly toll…) Enough said.

Therefore, either the volume of updates is unlikely to significantly impact storage space on a large system where space is critical; or the system is too small for it to matter at today’s storage prices. Indeed, storage bandwidth is a lot costlier than storage capacity and update-in-place offers no benefit there at all.

Neither is query performance improved by update-in-place semantics, since with append semantics all versions of a row can be equally accessible and resolvable through whatever structures are used to locate the original row. Indeed, query performance is likely to suffer more with update-in-place semantics because of row chaining.

So why do some database vendors continue to use update-in-place semantics? Good question.

“A precedent embalms a principle.” – Benjamin Disraeli

Logistical nightmare

Imagine a man who runs a small retail company selling electronic components.

He works out of a tiny workshop in which he packages his orders and has just enough room to hold a handful of parcels and keep a small card index.

He also owns a vast warehouse where he stores his inventory for fulfilling his retail orders. This warehouse is organised into aisles, racks and crates. Each crate holds 1,000 items; each rack holds 1,000 crates; and each aisle houses 1,000 racks. To keep things organised, every crate, rack and aisle is uniquely numbered, and every stored component is arranged in strict alphabetical order across all of the crates, with each aisle, rack and crate labelled with the alphabetical range of its contents.

Each aisle holds a card index which lists the alphabetical range of each rack and each rack holds a card index which lists the alphabetical range of each crate. There is also a card index for identifying the correct aisle and this card index is kept at the workshop. To find the correct crate simply requires identifying the correct aisle, then the correct rack and then the correct crate using the respective card indexes.

This man is immobile and unable to drive, so he works with an assistant who drives between the workshop and the warehouse in a pick-up truck to collect the items as required. Unfortunately, while the assistant can read numbers well enough, he is otherwise completely illiterate and needs to be told exactly which numbered aisle, rack or crate to go to. Worse still, the warehouse is so far away that it takes a whole day for the assistant to drive there and back.

To parcel up an order, the man has to follow these steps for each item…

  • Consult the card index in his workshop to work out which aisle the item belongs in
  • Send his assistant to the warehouse to fetch the card index from that numbered aisle
  • When the assistant returns, consult the card index for the aisle to identify the correct rack
  • Send his assistant back to the warehouse to fetch the card index from that numbered rack (and return the card index for the aisle)
  • When the assistant returns, consult the card index for the rack to identify the correct crate
  • Send his assistant back to the warehouse to fetch the numbered crate (and return the card index for the rack)

As you might expect, the man is reasonably fast at reading the card index, sorting through a crate and packaging the parcels, but with a whole day consumed by each round trip, he actually spends the majority of his time just waiting for his assistant to return with a card index.

Moreover, the pick-up truck could potentially transport multiple crates at once but spends its time mostly empty – carting around just one card index and the occasional crate.

This is a logistical nightmare and you wouldn’t run a business like this – yet this is exactly the logistics of using a B-tree index.
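In code, the logistics look like this: a hedged sketch of a three-level lookup in which every node fetch below the root costs a full round trip (the names, values and latency below are all made up for illustration):

```python
import time

# The warehouse: every node fetched from it costs one full round trip.
WAREHOUSE = {
    "aisle-7":    {"card": "rack-312"},          # aisle card index -> rack
    "rack-312":   {"card": "crate-9041"},        # rack card index -> crate
    "crate-9041": {"resistor-10k": "the part"},  # the crate itself
}
WORKSHOP_CARD_INDEX = {"resistor-10k": "aisle-7"}  # root, kept at the workshop

def fetch(node):
    time.sleep(0.01)            # stand-in for the assistant's day-long drive
    return WAREHOUSE[node]

def lookup(item):
    aisle = WORKSHOP_CARD_INDEX[item]   # free: this index never leaves home
    rack  = fetch(aisle)["card"]        # round trip 1: the aisle's card index
    crate = fetch(rack)["card"]         # round trip 2: the rack's card index
    return fetch(crate)[item]           # round trip 3: the crate at last

print(lookup("resistor-10k"))           # three long hauls for one small item
```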

You might think the times above are bonkers, but if you scale the timings precisely to those of a CPU and hard disk, they become insane…

  • 1 minute to flick through the card index
  • Over 2 months for the assistant to return from the warehouse

Even scaled for flash storage, the timings don’t look much more viable…

  • 1 minute to flick through the card index
  • 10 hours for the assistant to drive to and return from the warehouse.

Of course, the man can be doing other things while he waits for his assistant to return, but on this scale it would take him 15 minutes to switch to another parcel (to do another minute of useful work).
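The scaling is easy to reproduce. The latencies below are my assumptions (roughly typical for the hardware of the day), chosen because the arithmetic then lands on the figures above:

```python
# Assumed real-world latencies (not from the article):
CARD_INDEX = 100e-9       # ~100 ns of CPU-scale work -> scaled to 1 minute
DISK_SEEK  = 10e-3        # ~10 ms hard disk seek
FLASH_READ = 60e-6        # ~60 us flash read
CTX_SWITCH = 1.5e-6       # ~1.5 us thread context switch

scale = 60 / CARD_INDEX   # story seconds per real second

print(DISK_SEEK  * scale / 86_400)   # ~69 days  -> "over 2 months"
print(FLASH_READ * scale / 3_600)    # ~10 hours
print(CTX_SWITCH * scale / 60)       # ~15 minutes to switch parcels
```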

Life gets even worse for this poor man when a new consignment of components arrives at his workshop and the components have to be dispatched to the distant warehouse and placed into their correct crates one by one…

This is certainly not a sensible way to run a business; nor is it likely to be the best way to retrieve data.

At the speed of light

It is quite enlightening to look at the maximum distance that light (or information) can travel within a single CPU clock tick. This has shrunk from around 0.1 km in the 1970s (at 3 MHz) down to about 100 mm now (at 3 GHz).
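The arithmetic is simply the speed of light divided by the clock frequency (figures rounded):

```python
C = 299_792_458                      # speed of light in m/s

for label, hz in [("3 MHz (1970s)", 3e6), ("3 GHz (today)", 3e9)]:
    print(f"{label}: {C / hz:.3f} m per clock tick")
# 3 MHz -> ~100 m (about 0.1 km); 3 GHz -> ~0.1 m (about 100 mm)
```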

In other words, for today’s CPU to request information and have it returned within a single clock tick, the information must be in the near vicinity of the CPU. Therefore, any kind of storage is going to present a storage wall no matter how fast that storage technology is (flash, racetrack, PCM, etc.). Even storage as fast as the CPU cache itself will still present a storage wall if it has to sit beyond the motherboard, simply because information cannot travel faster than the speed of light (quantum computing and entanglement aside).

Ultimately, if we do replace current hard disk and flash storage with fast non-volatile RAM, we still will not have removed the access latency problems we currently see, unless all of the data can be located near the CPU itself or we abandon the von Neumann architecture that underpins the computing industry.