A thread-safe hash table for multi-cores (2/2)

25 October 2019

This is Part 2 of a two-post series on the concurrent-hashtable package. In the previous post, we looked at the performance and discussed some benchmark results. Here we will look at the correctness aspects, in particular:

Linearization Points and Implementation

In this post, we will take a look at the details of our implementation to see if the underlying ideas are sound. To claim that each operation is thread-safe, we need to assign linearization points, which we can think of as the points in time when the method takes effect.

Linearization Points for Insertions/Deletions

Both insert and delete call genericModify to do the actual data modification. genericModify for key k performs two reads: it first reads the table vector, and then reads the Chain at the index to which k is hashed. A Chain is essentially a TVar [(k,v)] that contains all key-value pairs that map to that index.
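To make the following discussion concrete, here is a minimal sketch of the data layout just described. The type and field names are hypothetical, chosen for illustration; the actual package may organize this differently.

```haskell
import Control.Concurrent.STM (TVar)
import qualified Data.Vector as V

-- Per-chain migration flag, as described in the text.
data MigrationStatus = NotStarted | Ongoing | Finished

-- A chain holds all key-value pairs that hash to one index,
-- together with that chain's own migration status.
data Chain k v = Chain
  { chainStatus :: TVar MigrationStatus
  , chainItems  :: TVar [(k, v)]
  }

-- The table vector itself lives in a TVar so that a resize
-- can atomically swap in a larger vector.
newtype HashTable k v = HashTable (TVar (V.Vector (Chain k v)))
```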

Since reading the table vector and then reading the chain do not happen in a single atomic step, some other thread may have started resizing the table in between these two reads. Of course, we could have made our lives much simpler by putting both reads into the same transaction, but then every other concurrently modifying thread would conflict with this thread when trying to commit its STM transaction, defeating the whole point of using a fine-grained synchronization approach.

Let’s go through the details of how interleaved updates are handled: To keep the contention between threads low, each chain has its own MigrationStatus value, which is NotStarted initially. If resizing has already started for this table entry, then MigrationStatus must have the value Ongoing (ensured by the resizing thread), and the STM transaction in genericModify will block (by calling retry) until the migration status changes to Finished. Then we simply call genericModify again to find the right index in the now-updated table where, initially, all chains have their MigrationStatus set to NotStarted. This continues until the thread finds a point in time when the chain isn’t affected by resizing.

Now suppose that the thread observes a migration status of NotStarted, i.e., either the chain hasn’t been processed yet by the resizing thread or no resizing is happening at this point. In this case, we go ahead and perform the modification of the chain for this index. It’s crucial for correctness that reading the MigrationStatus and performing the update to the chain happen atomically, and hence we make them part of the same STM transaction.
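The logic of the last two paragraphs can be sketched as follows, assuming the chain representation described in the text (a status TVar next to an items TVar). This is a simplified, hypothetical genericModify that applies a pure update function to the chain; the real package's signature and names differ.

```haskell
import Control.Concurrent.STM
import Control.Monad (when)
import Data.Hashable (Hashable, hash)
import qualified Data.Vector as V

data MigrationStatus = NotStarted | Ongoing | Finished
data Chain k v = Chain { chainStatus :: TVar MigrationStatus
                       , chainItems  :: TVar [(k, v)] }
newtype HashTable k v = HashTable (TVar (V.Vector (Chain k v)))

genericModify :: Hashable k
              => HashTable k v -> k -> ([(k, v)] -> [(k, v)]) -> IO ()
genericModify ht@(HashTable tableVar) k f = do
  table <- readTVarIO tableVar                         -- first read
  let chain = table V.! (hash k `mod` V.length table)
  mustRetry <- atomically $ do
    -- Reading the status and updating the chain happen in ONE transaction.
    status <- readTVar (chainStatus chain)
    case status of
      Ongoing    -> retry             -- block until the status changes
      Finished   -> pure True         -- chain was migrated: start over
      NotStarted -> do                -- linearization point: this commit
        modifyTVar' (chainItems chain) f
        pure False
  when mustRetry (genericModify ht k f)
```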

Where is the Linearization Point?

We set the linearization point for insert/delete exactly at the point where the STM transaction in genericModify commits, assuming that the MigrationStatus value that was read in the transaction was NotStarted.

Why does this work?

It follows from the code of genericModify that NotStarted is the only possible value for the MigrationStatus such that genericModify returns to insert/delete. In all other cases, genericModify will just keep trying again. Therefore, linearizability follows from the atomicity of the STM transaction in genericModify. As mentioned above, the load of the hash table will not be updated atomically in insert/delete, but this only affects the timing of resizing and not the overall correctness.

Linearization Points for Resizing

We first describe how resizing works internally: Initially, the MigrationStatus is set to NotStarted for each chain; once resizing is in progress, the resizing thread sets this field to Ongoing for each chain, starting at index 0. Since the collection of these writes is not atomic - we want to avoid “stop the world” synchronization as much as possible - threads can still insert/delete data into/from the chains at higher indices while the chains at lower indices are already being migrated to the new table.
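One plausible sketch of this scheme, under the same assumed representation, is shown below. The helpers reinsert and mkChains are hypothetical (passed in as parameters to keep the sketch self-contained), and the exact ordering of the Finished writes relative to the table swap in the real package may differ.

```haskell
import Control.Concurrent.STM
import qualified Data.Vector as V

data MigrationStatus = NotStarted | Ongoing | Finished
data Chain k v = Chain { chainStatus :: TVar MigrationStatus
                       , chainItems  :: TVar [(k, v)] }
newtype HashTable k v = HashTable (TVar (V.Vector (Chain k v)))

resize :: HashTable k v
       -> (V.Vector (Chain k v) -> (k, v) -> IO ())  -- hypothetical reinsert
       -> IO (V.Vector (Chain k v))                  -- hypothetical fresh table
       -> IO ()
resize (HashTable tableVar) reinsert mkChains = do
  old <- readTVarIO tableVar
  new <- mkChains
  -- Migrate one chain at a time, starting at index 0. Each status write is
  -- its own small transaction, so chains at higher indices stay available.
  V.forM_ old $ \chain -> do
    items <- atomically $ do
      writeTVar (chainStatus chain) Ongoing  -- modifiers of this chain now block
      readTVar (chainItems chain)
    mapM_ (reinsert new) items
  -- Linearization point: the commit that replaces the old table.
  atomically $ writeTVar tableVar new
  -- Wake up blocked modifiers; they re-run genericModify on the new table.
  V.forM_ old $ \chain ->
    atomically $ writeTVar (chainStatus chain) Finished
```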

Where is the Linearization Point?

For resize, we set the linearization point at the time when the STM transaction that replaces the old table with the new one commits.

Why does this work?

No other thread has access to the new table until it is written and hence linearizability follows from the STM’s atomicity guarantee.

Linearization Points for Lookups

For lookups, we perform two reads using readTVarIO: first we read the table vector, and then we read the chain at index \(i\) to retrieve the item. There are some complications because something else could have happened between the first and the second read. Suppose lookup reads the table vector at time \(t_1\) and reads the chain at index \(i\) at time \(t_2\):

  (a) Another thread could have completed resizing the table, i.e., its linearization point for the resizing lies in the open interval \((t_1,t_2)\), which excludes \(t_1\) and \(t_2\). In that case, the table vector that we read at time \(t_1\) was out of date at time \(t_2\). (We need to be careful here because the item we’re looking for may already have been deleted from the new table by time \(t_2\), and we might not know about it because we read the chain in the old table!)

  (b) No resizing was completed during the interval \((t_1,t_2)\). However, another thread inserted/deleted an item with the same key during that time.

We will look at these two cases separately.
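Before analyzing the two cases, here is a sketch of lookup’s two reads under the representation described earlier in the text. All names are hypothetical; lookupHT avoids clashing with Prelude.lookup, which does the final search within the chain’s association list.

```haskell
import Control.Concurrent.STM (TVar, readTVarIO)
import Data.Hashable (Hashable, hash)
import qualified Data.Vector as V

data MigrationStatus = NotStarted | Ongoing | Finished
data Chain k v = Chain { chainStatus :: TVar MigrationStatus
                       , chainItems  :: TVar [(k, v)] }
newtype HashTable k v = HashTable (TVar (V.Vector (Chain k v)))

lookupHT :: (Eq k, Hashable k) => HashTable k v -> k -> IO (Maybe v)
lookupHT (HashTable tableVar) k = do
  table <- readTVarIO tableVar                         -- read at time t1
  let chain = table V.! (hash k `mod` V.length table)
  items <- readTVarIO (chainItems chain)               -- read at time t2
  pure (lookup k items)                                -- plain list lookup
```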

Case (a) - resizing happened

Where is the Linearization Point?

Let \(t_4\) be the linearization point of the first resize that completed during \((t_1,t_2)\). Let \(t_3\) be the linearization point of the latest insert/delete for the chain at index \(i\) during \((t_1,t_4)\); if no such insert/delete occurred, define \(t_3=t_1\). We choose the linearization point for lookup to be at some time in \((t_3,t_4)\).

Why does this work?

If there wasn’t any insert/delete, then the lookup would just return the value for the key k that was in the chain at index \(i\) in the old table (before resizing), and its linearization point would be set to happen before the one for resizing. Therefore, whatever happens after resizing doesn’t concern the return value of lookup.

On the other hand, if the latest insert/delete had its linearization point at time \(t_3\), then its STM transaction must have committed before time \(t_2\), which is when the lookup reads the chain. Hence the readTVarIO issued by lookup already sees the value updated by this modification, which is perfectly fine because the linearization points occur in exactly this order.

Case (b) - no resizing happened

Where is the Linearization Point?

We set the linearization point to be at the time of the second read, i.e., time \(t_2\).

Why does this work?

Suppose that a lookup for key k returns Just a, and we set the linearization point at time \(t_2\) as defined above. Let’s assume that an item with key k was inserted into the same chain, with a linearization point at time \(t_7\); moreover, we assume that no other modifications for key k took place in the period \([t_7,t_2]\). Since \(t_7 < t_2\), the read of the chain at index \(i\) by the lookup happened after the STM transaction for the insert committed, and hence the lookup returns the correct value. A similar argument works for delete.

Deadlock-freedom and Starvation?

Deadlock-Freedom

It’s relatively straightforward to show freedom from deadlock. From the code, we can see that the only point with any potential for deadlock is the call to retry in genericModify, which is reached if and only if the thread observes a MigrationStatus value of Ongoing. How did the chain’s status get this value? It must have been set in resize. Looking at the code of resize, we can see that the status of each chain is eventually updated to Finished, which wakes up any thread that is blocked after having read Ongoing. Since resize is non-blocking, it follows that at least some thread is always making progress, and hence we have deadlock-freedom.

What about Starvation?

Unfortunately, we cannot rule out that some thread starves. This is a property that our hash table inherits from the STM runtime, which itself is not starvation-free. Apart from that, it is also possible that a thread starves by getting stuck in genericModify. For instance, if thread A is very slow, and other threads are constantly inserting new items, it could happen that, each time thread A reads the chain’s status, it has the value Ongoing because of a concurrent call to resize by another thread. That said, both of the above scenarios are quite unlikely to be an issue for practical workloads.