Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date
Msg-id 31489.1545499200@sss.pgh.pa.us
Whole thread Raw
In response to Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (John Naylor <jcnaylor@gmail.com>)
Responses Re: reducing the footprint of ScanKeyword (was Re: Large writablevariables)
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
List pgsql-hackers
John Naylor <jcnaylor@gmail.com> writes:
> Using a single file also gave me another idea: Take value and category
> out of ScanKeyword, and replace them with an index into another array
> containing those, which will only be accessed in the event of a hit.
> That would shrink ScanKeyword to 4 bytes (offset, index), further
> increasing locality of reference. Might not be worth it, but I can try
> it after moving on to the core scanner.

I like that idea a *lot*, actually, because it offers the opportunity
to decouple this mechanism from all assumptions about what the
auxiliary data for a keyword is.  Basically, we'd redefine
ScanKeywordLookup as having the API "given a string, return a
keyword index if it is a keyword, -1 if it isn't"; then the caller
would use the keyword index to look up the auxiliary data in a table
that it owns, and ScanKeywordLookup doesn't know about at all.

So that leads to a design like this: the master data is in a header
that's just like kwlist.h is today, except now we are thinking of
PG_KEYWORD as an N-argument macro not necessarily exactly 3 arguments.
The Perl script reads that, paying attention only to the first argument
of the macro calls, and outputs a file containing, say,

static const uint16 kw_offsets[] = { 0, 6, 15, ... };

static const char kw_strings[] =
    "abort\0"
    "absolute\0"
    ...
;

(it'd be a good idea to have a switch that allows specifying the
prefix of these constant names).  Then ScanKeywordLookup has the
signature

int ScanKeywordLookup(const char *string_to_lookup,
                      const char *kw_strings,
                      const uint16 *kw_offsets,
                      int num_keywords);

and a file using this stuff looks something like

/* Payload data for keywords */
typedef struct MyKeyword
{
    int16        value;
    int16        category;
} MyKeyword;

#define PG_KEYWORD(kwname, value, category) {value, category},

static const MyKeyword MyKeywords[] = {
#include "kwlist.h"
};

/* String lookup table for keywords */
#include "kwlist_d.h"

/* Lookup code looks about like this: */
    kwnum = ScanKeywordLookup(str,
                              kw_strings,
                              kw_offsets,
                              lengthof(kw_offsets));
    if (kwnum >= 0)
       ... look into MyKeywords[kwnum] for info ...

Aside from being arguably better from the locality-of-reference
standpoint, this gets us out of the weird ifdef'ing you've got in
the v2 patch.  The kwlist_d.h headers can be very ordinary headers.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Joins on TID
Next
From: Andres Freund
Date:
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writablevariables)