Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date:
Msg-id: 5300.1545852065@sss.pgh.pa.us
In response to: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
>> I'm not sure that I understand quite what you have in mind for a
>> serialized non-perfect hashtable.  Are you thinking that we'd just
>> construct a simplehash and serialize it?

> I was basically thinking that we'd have the perl script implement a
> simple hash and put the keyword (pointers) into an array, handling
> conflicts with the simplest linear probing imaginable. As there's never a
> need for modifications, that ought to be fairly simple.
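
For illustration only, here is a minimal sketch of the kind of serialized
probe table such a generator could emit, using an invented keyword list,
hash function, and names rather than the real ScanKeyword data; the table
is filled at startup here just to keep the sketch self-contained, whereas
the idea under discussion is to precompute it at build time:

#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; not the real ScanKeyword definitions. */
static const char *const kw_names[] = {"abort", "all", "and", "select"};
#define NUM_KEYWORDS 4
#define KW_HASH_SIZE 8          /* power of two, roughly 2x the keyword count */

static int16_t kw_hash_table[KW_HASH_SIZE]; /* index into kw_names, -1 = empty */

static uint32_t
kw_hash(const char *s)
{
    uint32_t    h = 0;

    for (; *s; s++)
        h = h * 31 + (uint8_t) *s;
    return h;
}

/* Stand-in for the Perl generator: fill the probe table once. */
static void
kw_build_table(void)
{
    memset(kw_hash_table, -1, sizeof(kw_hash_table));
    for (int i = 0; i < NUM_KEYWORDS; i++)
    {
        uint32_t    pos = kw_hash(kw_names[i]) & (KW_HASH_SIZE - 1);

        while (kw_hash_table[pos] >= 0)         /* linear probing on insert */
            pos = (pos + 1) & (KW_HASH_SIZE - 1);
        kw_hash_table[pos] = (int16_t) i;
    }
}

/* Look up a lower-cased identifier; returns keyword index or -1. */
static int
kw_lookup(const char *word)
{
    uint32_t    pos = kw_hash(word) & (KW_HASH_SIZE - 1);

    for (;;)
    {
        int16_t     idx = kw_hash_table[pos];

        if (idx < 0)
            return -1;                          /* empty slot: not a keyword */
        if (strcmp(kw_names[idx], word) == 0)
            return idx;
        pos = (pos + 1) & (KW_HASH_SIZE - 1);   /* probe the next slot */
    }
}

Once kw_build_table() has run, kw_lookup("select") returns that keyword's
index, and a non-keyword returns -1 as soon as probing reaches an empty slot.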

I think it was Knuth who said that when you use hashing, you are putting
a great deal of faith in the average case, because the worst case is
terrible.  The applicability of that to this problem is that if you hit
a bad case (say, a long collision chain affecting some common keywords)
you could end up with poor performance that affects a lot of people for
a long time.  And our keyword list is not so static that you could prove
once that the behavior is OK and then forget about it.

So I'm suspicious of proposals to use simplistic hashing here.

There might well be some value in Robert's idea of keying off the first
letter to get rid of the first few binary-search steps, not least because
those steps are particularly terrible from a cache-footprint perspective.
I'm not sold on doing anything significantly more invasive than that.
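
Purely as a sketch of that narrower idea, with made-up names rather than
the actual parser structures: an offsets array indexed by the first letter
confines the binary search to one letter's slice of the (already sorted)
keyword list, so the widest, most cache-unfriendly probe steps are skipped.

#include <string.h>

/* Hypothetical names; the real keyword list is already sorted alphabetically. */
static const char *const kw_names[] = {"abort", "all", "and", "begin", "select"};
#define NUM_KEYWORDS 5

/*
 * kw_letter_off[c] .. kw_letter_off[c + 1] - 1 is the slice of kw_names whose
 * first letter is 'a' + c.  A generator script could emit this as a constant;
 * it is built at startup here only to keep the sketch self-contained.
 */
static int  kw_letter_off[26 + 1];

static void
kw_build_offsets(void)
{
    int         i = 0;

    for (int c = 0; c < 26; c++)
    {
        kw_letter_off[c] = i;
        while (i < NUM_KEYWORDS && kw_names[i][0] == 'a' + c)
            i++;
    }
    kw_letter_off[26] = NUM_KEYWORDS;
}

/* Binary search restricted to the bucket for the word's first letter. */
static int
kw_lookup_narrowed(const char *word)
{
    int         lo,
                hi;

    if (word[0] < 'a' || word[0] > 'z')
        return -1;

    lo = kw_letter_off[word[0] - 'a'];
    hi = kw_letter_off[word[0] - 'a' + 1] - 1;

    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        int         cmp = strcmp(kw_names[mid], word);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}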

            regards, tom lane

