Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date:
Msg-id: 5300.1545852065@sss.pgh.pa.us
In response to: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
>> I'm not sure that I understand quite what you have in mind for a
>> serialized non-perfect hashtable.  Are you thinking that we'd just
>> construct a simplehash and serialize it?

> I was basically thinking that we'd have the perl script implement a
> simple hash and put the keyword (pointers) into an array, handling
> conflicts with the simplest linear probing imaginable. As there's never a
> need for modifications, that ought to be fairly simple.
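
For illustration only, here is a minimal sketch of the kind of serialized
probe table such a generator could emit, using an invented keyword list,
hash function, and names rather than the real ScanKeyword data; the table
is filled at startup here just to keep the sketch self-contained, whereas
the idea under discussion is to precompute it at build time:

#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; not the real ScanKeyword definitions. */
static const char *const kw_names[] = {"abort", "all", "and", "select"};
#define NUM_KEYWORDS 4
#define KW_HASH_SIZE 8          /* power of two, roughly 2x the keyword count */

static int16_t kw_hash_table[KW_HASH_SIZE]; /* index into kw_names, -1 = empty */

static uint32_t
kw_hash(const char *s)
{
    uint32_t    h = 0;

    for (; *s; s++)
        h = h * 31 + (uint8_t) *s;
    return h;
}

/* Stand-in for the Perl generator: fill the probe table once. */
static void
kw_build_table(void)
{
    memset(kw_hash_table, -1, sizeof(kw_hash_table));
    for (int i = 0; i < NUM_KEYWORDS; i++)
    {
        uint32_t    pos = kw_hash(kw_names[i]) & (KW_HASH_SIZE - 1);

        while (kw_hash_table[pos] >= 0)         /* linear probing on insert */
            pos = (pos + 1) & (KW_HASH_SIZE - 1);
        kw_hash_table[pos] = (int16_t) i;
    }
}

/* Look up a lower-cased identifier; returns keyword index or -1. */
static int
kw_lookup(const char *word)
{
    uint32_t    pos = kw_hash(word) & (KW_HASH_SIZE - 1);

    for (;;)
    {
        int16_t     idx = kw_hash_table[pos];

        if (idx < 0)
            return -1;                          /* empty slot: not a keyword */
        if (strcmp(kw_names[idx], word) == 0)
            return idx;
        pos = (pos + 1) & (KW_HASH_SIZE - 1);   /* probe the next slot */
    }
}

Once kw_build_table() has run, kw_lookup("select") returns that keyword's
index, and a non-keyword returns -1 as soon as probing reaches an empty slot.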

I think it was Knuth who said that when you use hashing, you are putting
a great deal of faith in the average case, because the worst case is
terrible.  The applicability of that to this problem is that if you hit
a bad case (say, a long collision chain affecting some common keywords)
you could end up with poor performance that affects a lot of people for
a long time.  And our keyword list is not so static that you could prove
once that the behavior is OK and then forget about it.

So I'm suspicious of proposals to use simplistic hashing here.

There might well be some value in Robert's idea of keying off the first
letter to get rid of the first few binary-search steps, not least because
those steps are particularly terrible from a cache-footprint perspective.
I'm not sold on doing anything significantly more invasive than that.
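
Purely as a sketch of that narrower idea, with made-up names rather than
the actual parser structures: an offsets array indexed by the first letter
confines the binary search to one letter's slice of the (already sorted)
keyword list, so the widest, most cache-unfriendly probe steps are skipped.

#include <string.h>

/* Hypothetical names; the real keyword list is already sorted alphabetically. */
static const char *const kw_names[] = {"abort", "all", "and", "begin", "select"};
#define NUM_KEYWORDS 5

/*
 * kw_letter_off[c] .. kw_letter_off[c + 1] - 1 is the slice of kw_names whose
 * first letter is 'a' + c.  A generator script could emit this as a constant;
 * it is built at startup here only to keep the sketch self-contained.
 */
static int  kw_letter_off[26 + 1];

static void
kw_build_offsets(void)
{
    int         i = 0;

    for (int c = 0; c < 26; c++)
    {
        kw_letter_off[c] = i;
        while (i < NUM_KEYWORDS && kw_names[i][0] == 'a' + c)
            i++;
    }
    kw_letter_off[26] = NUM_KEYWORDS;
}

/* Binary search restricted to the bucket for the word's first letter. */
static int
kw_lookup_narrowed(const char *word)
{
    int         lo,
                hi;

    if (word[0] < 'a' || word[0] > 'z')
        return -1;

    lo = kw_letter_off[word[0] - 'a'];
    hi = kw_letter_off[word[0] - 'a' + 1] - 1;

    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        int         cmp = strcmp(kw_names[mid], word);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}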

            regards, tom lane

