Home > mailing lists

Hash tables in dynamic shared memory - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Hash tables in dynamic shared memory
Date	October 5, 2016 00:40:54
Msg-id	CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.com Whole thread Raw
Responses	Re: Hash tables in dynamic shared memory (Andres Freund <andres@anarazel.de>) Re: Hash tables in dynamic shared memory (Dilip Kumar <dilipbalaut@gmail.com>)
List	pgsql-hackers

Tree view

Hi hackers,

Here is an experimental hash table implementation that uses DSA memory
so that hash tables can be shared by multiple backends and yet grow
dynamically. Development name: "DHT".

It's impossible to write a general purpose hash table that will be
suitable for every use case, considering all the combinations of
design trade offs and subsets of functionality someone might want.
What I'm trying to do here is produce something that is generally
useful for many cases and easy to use, along the lines of dynahash,
but use dynamic shared memory. Here is the bullet point summary:

* DSA memory-backed
* open hashing (separate chaining)
* incremental resizing
* built-in partition-based locking
* concurrent iteration

There is other work being done in this space: I'm aware of Andres's
current thread[1] on Robin Hood-style closed hashing tables with
macro-ised size, hash and comparator operations à la C++ container
templates, and I'm following along with interest. He vigorously
rejects chaining hash tables as the natural enemies of memory
hardware. Obviously there are pros and cons: this node-based chaining
design allows resizing, iteration with stronger guarantees, and users
can point directly to (or into) the entries from other data
structures, but yeah... it has to follow pointers into far away cache
lines to do that. I guess Andres's simplehash should already work in
DSA memory nicely anyway since it doesn't have any internal pointers
IIUC (?). As for concurrency, Robert Haas also did some interesting
experiments[2] with (nearly) lock-free hash tables and hazard
pointers. I'm not sure what the optimum number of different
implementations or levels of configurability is, and I'm certainly not
wedded to this particular set of design tradeoffs.

Potential use cases for DHT include caches, in-memory database objects
and working state for parallel execution. (Not a use case: the shared
hash table for my as-yet-unposted DSA-based shared parallel hash join
table work, which follows the existing open-coded dense_alloc-based
approach for reasons I'll explain later.)

There are a couple of things I'm not happy about in this version of
DHT: the space overhead per item is too high (hash + wasted padding +
next pointer), and the iterator system is overly complicated. I have
another version in development which gets rid of the hash and padding
and sometimes gets rid of the next pointer to achieve zero overhead,
and I'm working on a simpler iteration algorithm that keeps the
current guarantees but doesn't need to reorder bucket chains. More on
that, with some notes on performance and a testing module, soon.

Please find attached dht-v1.patch, which depends on dsa-v2.patch[3].
I'm posting this version of DHT today because it is a dependency of
another patch due on this list shortly.

See also simple-cache.patch which contains a simple smoke test
extension so you can type SELECT simple_cache_put('42', 'Hello
world'), SELECT simple_cache_get('42') etc.

Thanks to my colleagues Amit Khandekar, Amul Sul, Dilip Kumar and John
Gorman for bug fixes and suggestions, and Robert Haas for feedback.

Please let me know your thoughts, and thanks for reading.

[1] https://www.postgresql.org/message-id/flat/20161003041215.tfrifbeext3i7nkj%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CA%2BTgmoYE4t-Pt%2Bv08kMO5u_XN-HNKBWtfMgcUXEGBrQiVgdV9Q%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAEepm%3D1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

From: Thomas Munro
Date: 05 October 2016, 00:32:26
Subject: Creating a DSA area to provide work space for parallel execution

From: Gilles Darold
Date: 05 October 2016, 01:12:43
Subject: Re: proposal: psql \setfileref

Hash tables in dynamic shared memory - Mailing list pgsql-hackers

Attachment

Previous

Next