Re: [HACKERS] RFC: Key normalization for nbtree - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] RFC: Key normalization for nbtree
Msg-id CAH2-Wznx61+QATZ5Lee35XFOJ_Y4LLPHVHy6hsZOawMy14bRzA@mail.gmail.com
In response to Re: [HACKERS] RFC: Key normalization for nbtree  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Mon, Jul 10, 2017 at 4:08 PM, Greg Stark <stark@mit.edu> wrote:
> One thing I would like to see is features like this added to the
> opclasses (or opfamilies?) using standard PG functions that return
> standard PG data types. So if each opclass had a function that took
> the data type in question and returned a bytea then you could
> implement that function using a language you felt like (in theory),
> test it using standard SQL, and possibly even find other uses for it.

That seems like a good goal.
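
To make that concrete, here is a rough sketch of what such a function
could compute for a 4-byte signed integer. It is written in Python purely
for illustration; the name normalize_int4 is hypothetical and is not an
existing PostgreSQL function. The only assumption is that normalized keys
are compared as unsigned bytes, memcmp()-style.

```python
import struct


def normalize_int4(value: int) -> bytes:
    """Map a signed 32-bit integer to 4 bytes whose unsigned byte order
    matches the integer order (hypothetical helper, illustration only)."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of int4 range")
    # Adding 2**31 flips the sign bit, so negative values sort first.
    # Big-endian puts the most significant byte first, which is what a
    # byte-wise comparison looks at first.
    return struct.pack(">I", value + 2**31)


values = [5, -1, 0, -2**31, 2**31 - 1, 42]
# Sorting by the normalized bytes gives the same order as sorting the ints.
assert sorted(values, key=normalize_int4) == sorted(values)
```

Something with this shape could be exercised from plain SQL by comparing
the ordering of the returned bytea values against the type's own ordering.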

> That kind of abstraction would be more promising for the future than
> having yet another C api that is used for precisely one purpose and
> whose definition is "provide the data needed for this usage".

Perhaps this is obvious, but the advantage of flattening everything
into one universal representation is that it's a very simple API for
opclass authors, that puts control into the hands of the core nbtree
code, where it belongs. For text, say, you can generate fictitious
minimal separator keys with suffix truncation that really are as
short as possible, down to the level of individual bits. If you tried to
do this with the original text representation, you'd have to worry
about the fact that the truncated key probably wouldn't even be valid
UTF-8, and that encoding-aware truncation is needed, etc. You'd definitely
have to "double check" that the truncated key was greater than the
left half and less than the right half, just in case you didn't end up
with a valid separator due to the vagaries of the collation rules.

That's the kind of complexity that scales poorly, because the
complexity cannot be isolated.
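
As a minimal sketch of why truncation is simple once keys are flattened,
the following Python picks the shortest separator between two normalized
keys. It makes several assumptions: keys are plain byte strings compared
memcmp()-style, the separator must satisfy left < separator <= right, and
truncation happens at byte granularity rather than bit granularity. The
function name and the sample keys are made up for illustration; real
normalized text keys would be collation sort keys, not raw text.

```python
def shortest_separator(left: bytes, right: bytes) -> bytes:
    """Return the shortest prefix of `right` that sorts strictly after
    `left` in unsigned byte order (hypothetical helper)."""
    if not left < right:
        raise ValueError("expected left < right in byte order")
    # Length of the common prefix of the two keys.
    n = 0
    limit = min(len(left), len(right))
    while n < limit and left[n] == right[n]:
        n += 1
    # One byte past the common prefix is enough to distinguish the keys:
    # either right[n] > left[n], or left ended and right is longer.
    return right[: n + 1]


left = b"internationalization"
right = b"interoperability"
sep = shortest_separator(left, right)
# No re-check against collation rules is needed; byte order is the order.
assert left < sep <= right
```

The result is correct by construction for any pair of byte strings, so
nothing type-specific has to be consulted after truncating.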

-- 
Peter Geoghegan


