Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date:
Msg-id: 28725.1545841359@sss.pgh.pa.us
In response to: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (David Fetter <david@fetter.org>)
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 19, 2018 at 8:01 PM Andres Freund <andres@anarazel.de> wrote:
>> The last time I looked into perfect hash functions, it wasn't easy to
>> find a generator that competed with a decent normal hashtable (in
>> particular gperf's are very unconvincing). The added tooling is a
>> concern imo.  OTOH, we're comparing not with a hashtable, but a binary
>> search, where the latter will usually lose.  Wonder if we shouldn't
>> generate a serialized non-perfect hashtable instead. The lookup code for
>> a read-only hashtable without concern for adversarial input is pretty
>> trivial.
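
[A minimal sketch of the read-only, non-adversarial hashtable lookup described above. In the scheme Andres sketches the table would be serialized by an offline generator; here it is filled once at startup so the example is self-contained, and all names (kw_hash, kw_init, kw_lookup) and the tiny keyword list are illustrative, not actual PostgreSQL code.]

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define KW_TABLE_SIZE 16    /* power of two, larger than the keyword count */

    static const char *kw_table[KW_TABLE_SIZE];

    /* cheap string hash; adversarial input is not a concern here */
    static uint32_t
    kw_hash(const char *s)
    {
        uint32_t h = 5381;

        while (*s)
            h = h * 33 + (unsigned char) *s++;
        return h;
    }

    /* fill the table once; a real implementation would serialize this offline */
    static void
    kw_init(const char *const *keywords, int nkeywords)
    {
        for (int i = 0; i < nkeywords; i++)
        {
            uint32_t slot = kw_hash(keywords[i]) & (KW_TABLE_SIZE - 1);

            while (kw_table[slot] != NULL)      /* linear probing */
                slot = (slot + 1) & (KW_TABLE_SIZE - 1);
            kw_table[slot] = keywords[i];
        }
    }

    /* read-only lookup: returns 1 if word is a keyword, 0 otherwise */
    static int
    kw_lookup(const char *word)
    {
        uint32_t slot = kw_hash(word) & (KW_TABLE_SIZE - 1);

        while (kw_table[slot] != NULL)
        {
            if (strcmp(kw_table[slot], word) == 0)
                return 1;
            slot = (slot + 1) & (KW_TABLE_SIZE - 1);
        }
        return 0;
    }

    int
    main(void)
    {
        static const char *const keywords[] = {"select", "from", "where", "insert"};

        kw_init(keywords, 4);
        printf("%d %d\n", kw_lookup("select"), kw_lookup("nokeyword"));
        return 0;
    }

[The point is just that the runtime side is a hash plus a short probe loop, with no per-lookup allocation or collision handling beyond probing.]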

> I wonder if we could do something really simple like a lookup based on
> the first character of the scan keyword. It looks to me like there are
> 440 keywords right now, and the most common starting letter is 'c',
> which is the first letter of 51 keywords. So dispatching based on the
> first letter clips at least 3 steps off the binary search.  I don't
> know whether that's enough to be worthwhile, but it's probably pretty
> simple to implement.
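
[Similarly, a minimal sketch of the first-letter dispatch Robert describes, assuming the keyword array is sorted (as the real ScanKeywords list is) and the word has already been downcased. The 26-entry range table would in practice be generated at build time; all names and the tiny keyword list here are hypothetical.]

    #include <string.h>

    static const char *const keywords[] = {
        /* stand-in for the sorted keyword list */
        "and", "as", "between", "by", "case", "cast", "check", "select"
    };
    #define NKEYWORDS ((int) (sizeof(keywords) / sizeof(keywords[0])))

    static int first_ix[26];    /* first keyword index per starting letter, -1 if none */
    static int last_ix[26];     /* last keyword index per starting letter */

    /* build the per-letter ranges once; this too could be generated offline */
    static void
    dispatch_init(void)
    {
        for (int c = 0; c < 26; c++)
            first_ix[c] = last_ix[c] = -1;
        for (int i = 0; i < NKEYWORDS; i++)
        {
            int c = keywords[i][0] - 'a';

            if (first_ix[c] < 0)
                first_ix[c] = i;
            last_ix[c] = i;
        }
    }

    /* binary search confined to keywords sharing the word's first letter;
     * returns the keyword's index, or -1 if word is not a keyword */
    static int
    kw_lookup(const char *word)
    {
        int lo, hi;

        if (word[0] < 'a' || word[0] > 'z' || first_ix[word[0] - 'a'] < 0)
            return -1;
        lo = first_ix[word[0] - 'a'];
        hi = last_ix[word[0] - 'a'];
        while (lo <= hi)
        {
            int mid = (lo + hi) / 2;
            int cmp = strcmp(word, keywords[mid]);

            if (cmp == 0)
                return mid;
            if (cmp < 0)
                hi = mid - 1;
            else
                lo = mid + 1;
        }
        return -1;
    }

    int
    main(void)
    {
        dispatch_init();
        return (kw_lookup("cast") >= 0 && kw_lookup("foo") < 0) ? 0 : 1;
    }

[With 51 keywords under 'c' versus 440 overall, the search range shrinks by roughly log2(440/51), i.e. about 3 comparisons, which matches Robert's estimate.]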

I think there's a lot of goalpost-moving going on here.  The original
idea was to trim the physical size of the data structure, as stated
in the thread subject, and just reap whatever cache benefits we got
along the way from that.  I am dubious that we actually have any
performance problem in this code that needs a big dollop of added
complexity to fix.

In my hands, the only part of the low-level parsing code that commonly
shows up as interesting in profiles is the Bison engine.  That's probably
because the grammar tables are circa half a megabyte and blow out cache
pretty badly :-(.  I don't know of any way to make that better,
unfortunately.  I suspect that it's just going to get worse, because
people keep submitting additions to the grammar.

            regards, tom lane

