Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From John Naylor
Subject Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Msg-id CAJVSVGWbC0KuZKV=wVNUvGWO5L-ay_aYRztEAanX8=bc1Ju0kA@mail.gmail.com
In response to Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
On 12/19/18, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> Is there any particular reason not to go further and use a perfect hash
> function for the lookup, rather than binary search?

When I was investigating faster algorithms, I ruled out gperf based on
discussions in the archives. The approach here has modest goals and
shouldn't be too invasive.

With the makefile support and separate keyword files in place, that'll
be one less thing to do if we ever decide to replace binary search.
The giant string will likely be useful as well.
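
To illustrate what I mean by the giant string, here is a toy sketch
(hypothetical names, only a handful of keywords, not what any patch
actually emits): all the keyword text sits in one constant blob, and
the binary search works off an offset table into it.

#include <string.h>

/* Toy example: a few keywords concatenated into one string, kept in
 * sorted order, with an offset table pointing at each one. */
static const char kw_string[] =
	"abort\0"
	"absolute\0"
	"access\0"
	"action";

static const unsigned short kw_offsets[] = {0, 6, 15, 22};

#define KW_COUNT (sizeof(kw_offsets) / sizeof(kw_offsets[0]))

/* Binary search over the offset table; returns the keyword's index,
 * or -1 if the text is not a keyword. */
static int
kw_lookup(const char *text)
{
	int		low = 0;
	int		high = KW_COUNT - 1;

	while (low <= high)
	{
		int		mid = (low + high) / 2;
		int		cmp = strcmp(text, kw_string + kw_offsets[mid]);

		if (cmp == 0)
			return mid;
		if (cmp < 0)
			high = mid - 1;
		else
			low = mid + 1;
	}
	return -1;
}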

Since we're on the subject, I think some kind of trie would be ideal
performance-wise, but it would be a large amount of work. The nice
thing about a trie is that it can be faster than a hash table on a
key miss. I found a paper describing some space-efficient trie
variations [1], but we'd likely have to implement the algorithm
ourselves, plus a way to emit a C representation of the result. I did
find some libraries, but they would have many of the same practical
difficulties that gperf had.
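
To show why a miss can be cheap, here is a deliberately naive trie
lookup (purely illustrative, and nothing like the space-efficient
variants in the paper):

#include <stddef.h>

/* Deliberately naive trie node; the variants in [1] pack this far
 * more tightly. */
typedef struct TrieNode
{
	int		keyword_id;				/* -1 if no keyword ends here */
	struct TrieNode *child[26];		/* 'a' .. 'z' only, for brevity */
} TrieNode;

/* Walk the trie one character at a time.  A miss is detected as soon
 * as some character has no matching child, often well before the end
 * of the input; there's no need to hash the whole string first. */
static int
trie_lookup(const TrieNode *root, const char *text)
{
	const TrieNode *node = root;

	for (; *text; text++)
	{
		int		c = *text - 'a';

		if (c < 0 || c >= 26 || node->child[c] == NULL)
			return -1;			/* early exit on a miss */
		node = node->child[c];
	}
	return node->keyword_id;
}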

[1] https://infoscience.epfl.ch/record/64394/files/triesearches.pdf

-John Naylor

