Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date:
Msg-id: 28725.1545841359@sss.pgh.pa.us
In response to: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (David Fetter <david@fetter.org>)
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 19, 2018 at 8:01 PM Andres Freund <andres@anarazel.de> wrote:
>> The last time I looked into perfect hash functions, it wasn't easy to
>> find a generator that competed with a decent normal hashtable (in
>> particular gperf's are very unconvincing). The added tooling is a
>> concern imo.  OTOH, we're comparing not with a hashtable, but a binary
>> search, where the latter will usually lose.  Wonder if we shouldn't
>> generate a serialized non-perfect hashtable instead. The lookup code for
>> a read-only hashtable without concern for adversarial input is pretty
>> trivial.
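
[A minimal sketch of the read-only, non-adversarial hashtable lookup described above. In the scheme Andres sketches the table would be serialized by an offline generator; here it is filled once at startup so the example is self-contained, and all names (kw_hash, kw_init, kw_lookup) and the tiny keyword list are illustrative, not actual PostgreSQL code.]

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define KW_TABLE_SIZE 16    /* power of two, larger than the keyword count */

    static const char *kw_table[KW_TABLE_SIZE];

    /* cheap string hash; adversarial input is not a concern here */
    static uint32_t
    kw_hash(const char *s)
    {
        uint32_t h = 5381;

        while (*s)
            h = h * 33 + (unsigned char) *s++;
        return h;
    }

    /* fill the table once; a real implementation would serialize this offline */
    static void
    kw_init(const char *const *keywords, int nkeywords)
    {
        for (int i = 0; i < nkeywords; i++)
        {
            uint32_t slot = kw_hash(keywords[i]) & (KW_TABLE_SIZE - 1);

            while (kw_table[slot] != NULL)      /* linear probing */
                slot = (slot + 1) & (KW_TABLE_SIZE - 1);
            kw_table[slot] = keywords[i];
        }
    }

    /* read-only lookup: returns 1 if word is a keyword, 0 otherwise */
    static int
    kw_lookup(const char *word)
    {
        uint32_t slot = kw_hash(word) & (KW_TABLE_SIZE - 1);

        while (kw_table[slot] != NULL)
        {
            if (strcmp(kw_table[slot], word) == 0)
                return 1;
            slot = (slot + 1) & (KW_TABLE_SIZE - 1);
        }
        return 0;
    }

    int
    main(void)
    {
        static const char *const keywords[] = {"select", "from", "where", "insert"};

        kw_init(keywords, 4);
        printf("%d %d\n", kw_lookup("select"), kw_lookup("nokeyword"));
        return 0;
    }

[The point is just that the runtime side is a hash plus a short probe loop, with no per-lookup allocation or collision handling beyond probing.]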

> I wonder if we could do something really simple like a lookup based on
> the first character of the scan keyword. It looks to me like there are
> 440 keywords right now, and the most common starting letter is 'c',
> which is the first letter of 51 keywords. So dispatching based on the
> first letter clips at least 3 steps off the binary search.  I don't
> know whether that's enough to be worthwhile, but it's probably pretty
> simple to implement.
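
[Similarly, a minimal sketch of the first-letter dispatch Robert describes, assuming the keyword array is sorted (as the real ScanKeywords list is) and the word has already been downcased. The 26-entry range table would in practice be generated at build time; all names and the tiny keyword list here are hypothetical.]

    #include <string.h>

    static const char *const keywords[] = {
        /* stand-in for the sorted keyword list */
        "and", "as", "between", "by", "case", "cast", "check", "select"
    };
    #define NKEYWORDS ((int) (sizeof(keywords) / sizeof(keywords[0])))

    static int first_ix[26];    /* first keyword index per starting letter, -1 if none */
    static int last_ix[26];     /* last keyword index per starting letter */

    /* build the per-letter ranges once; this too could be generated offline */
    static void
    dispatch_init(void)
    {
        for (int c = 0; c < 26; c++)
            first_ix[c] = last_ix[c] = -1;
        for (int i = 0; i < NKEYWORDS; i++)
        {
            int c = keywords[i][0] - 'a';

            if (first_ix[c] < 0)
                first_ix[c] = i;
            last_ix[c] = i;
        }
    }

    /* binary search confined to keywords sharing the word's first letter;
     * returns the keyword's index, or -1 if word is not a keyword */
    static int
    kw_lookup(const char *word)
    {
        int lo, hi;

        if (word[0] < 'a' || word[0] > 'z' || first_ix[word[0] - 'a'] < 0)
            return -1;
        lo = first_ix[word[0] - 'a'];
        hi = last_ix[word[0] - 'a'];
        while (lo <= hi)
        {
            int mid = (lo + hi) / 2;
            int cmp = strcmp(word, keywords[mid]);

            if (cmp == 0)
                return mid;
            if (cmp < 0)
                hi = mid - 1;
            else
                lo = mid + 1;
        }
        return -1;
    }

    int
    main(void)
    {
        dispatch_init();
        return (kw_lookup("cast") >= 0 && kw_lookup("foo") < 0) ? 0 : 1;
    }

[With 51 keywords under 'c' versus 440 overall, the search range shrinks by roughly log2(440/51), i.e. about 3 comparisons, which matches Robert's estimate.]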

I think there's a lot of goalpost-moving going on here.  The original
idea was to trim the physical size of the data structure, as stated
in the thread subject, and just reap whatever cache benefits we got
along the way from that.  I am dubious that we actually have any
performance problem in this code that needs a big dollop of added
complexity to fix.

In my hands, the only part of the low-level parsing code that commonly
shows up as interesting in profiles is the Bison engine.  That's probably
because the grammar tables are circa half a megabyte and blow out cache
pretty badly :-(.  I don't know of any way to make that better,
unfortunately.  I suspect that it's just going to get worse, because
people keep submitting additions to the grammar.

            regards, tom lane

