Re: Hash tables in dynamic shared memory - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: Hash tables in dynamic shared memory
Msg-id: CAEepm=1NzRqBoEeosHCagqMV03E-j89hAs83tnOCpFH_QyBi0g@mail.gmail.com
In response to: Re: Hash tables in dynamic shared memory (Magnus Hagander <magnus@hagander.net>)
List: pgsql-hackers
On Wed, Oct 5, 2016 at 7:02 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Oct 5, 2016 1:23 AM, "Thomas Munro" <thomas.munro@enterprisedb.com>
> wrote:
>>
>> On Wed, Oct 5, 2016 at 12:11 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>> > On Wed, Oct 5, 2016 at 11:22 AM, Andres Freund <andres@anarazel.de>
>> > wrote:
>> >>> Potential use cases for DHT include caches, in-memory database objects
>> >>> and working state for parallel execution.
>> >>
>> >> Is there a more concrete example, i.e. a user we'd convert to this at
>> >> the same time as introducing this hashtable?
>> >
>> > A colleague of mine will shortly post a concrete patch to teach an
>> > existing executor node how to be parallel aware, using DHT.  I'll let
>> > him explain.
>> >
>> > I haven't looked into whether it would make sense to convert any
>> > existing shmem dynahash hash table to use DHT.  The reason for doing
>> > so would be to move it out to DSM segments and enable dynamically
>> > growing.  I suspect that the bounded size of things like the hash
>> > tables involved in (for example) predicate locking is considered a
>> > feature, not a bug, so any such cluster-lifetime core-infrastructure
>> > hash table would not be a candidate.  More likely candidates would be
>> > ephemeral data used by the executor, as in the above-mentioned patch,
>> > and long lived caches of dynamic size owned by core code or
>> > extensions.  Like a shared query plan cache, if anyone can figure out
>> > the invalidation magic required.
>>
>> Another thought: it could be used to make things like
>> pg_stat_statements not have to be in shared_preload_libraries.
>>
>
> That would indeed be a great improvement. And possibly also allow the
> changing of the max number of statements it can track without a restart?

Yeah.  You don't explicitly size a DHT hash table; it just grows as
required to keep the load factor low enough, possibly causing the DSA
area it lives in to create more DSM segments.  Currently it never gets
smaller, though: if you throw out a bunch of entries you free up the
memory occupied by the entries themselves (meaning it goes back to the
DSA area, which might eventually give it back to the OS, if the planets
align so that a DSM segment is left entirely unused), but the hash
table's bucket array won't ever shrink.
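To make that concrete, here's a rough sketch of what a backend using
such a table might look like.  (Caveat: the dht_* identifiers and the
shape of dht_parameters below are illustrative placeholders for this
example, not necessarily the real API from the patch.)

/* Illustrative sketch only: the dht_* identifiers are placeholders,
 * not necessarily the actual API in the patch. */
typedef struct StatsEntry
{
    uint32      queryid;        /* hash key; assumed to be the first member */
    int64       calls;          /* payload */
} StatsEntry;

static const dht_parameters params = {
    sizeof(uint32),             /* key size */
    sizeof(StatsEntry)          /* entry size */
};

static void
bump_counter(dsa_area *area, uint32 queryid)
{
    /* Create the table in an existing DSA area.  There is no size hint:
     * the bucket array starts small and grows to keep the load factor
     * low, drawing memory from the DSA area, which may in turn create
     * new DSM segments. */
    dht_hash_table *ht = dht_create(area, &params);
    bool        found;
    StatsEntry *entry;

    /* Find or insert the entry for this key. */
    entry = dht_find_or_insert(ht, &queryid, &found);
    if (!found)
        entry->calls = 0;       /* new entry: initialize payload */
    entry->calls++;
    dht_release(ht, entry);

    /* Deleting entries returns their memory to the DSA area (and the
     * area might eventually return a wholly unused DSM segment to the
     * OS), but the bucket array itself never shrinks. */
    dht_delete_key(ht, &queryid);
}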

> I was also wondering if it might be useful as a replacement for some of the
> pgstats stuff, to get rid of the cost of spooling to file and then rebuilding
> the hash tables on the receiving end. I've been waiting for this patch to
> figure out if that's useful. I mean, keep the stats collector doing what it
> does now over UDP, but present the results in shared hash tables instead of
> files.

Interesting thought.  I haven't studied how that stuff works... I've
put it on my reading list.

-- 
Thomas Munro
http://www.enterprisedb.com


