Thread: [HACKERS] RFC: Key normalization for nbtree

[HACKERS] RFC: Key normalization for nbtree

From: Peter Geoghegan
Date: Mon, Jul 10, 2017
I've created a new Wiki page that describes a scheme for normalizing
internal page items within B-Tree indexes, and the many optimizations
that this can enable:

https://wiki.postgresql.org/wiki/Key_normalization

Key normalization means creating a representation for internal page
items that we always just memcmp(), regardless of the details of the
underlying datatypes.
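For a single integer key column, the idea can be sketched like this (an illustrative encoding, not the exact on-disk format the wiki page proposes):

```c
#include <stdint.h>

/*
 * Illustrative sketch of key normalization for a signed 32-bit key.
 * Flipping the sign bit and storing the bytes most-significant-first
 * makes plain memcmp() agree with numeric order, so the comparator
 * needs no knowledge of the underlying datatype.
 */
static void
normalize_int32(int32_t key, unsigned char out[4])
{
    /* map INT32_MIN..INT32_MAX onto 0..UINT32_MAX, preserving order */
    uint32_t    u = (uint32_t) key ^ UINT32_C(0x80000000);

    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}
```

A multi-column key would concatenate each column's normalized form; NULLs and collated text would need more care than this sketch shows.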

My intent in creating this wiki page is to document these techniques
centrally, as well as the problems that they may solve, and to show
how they're all interrelated. It might be that confusion about how one
optimization enables another holds back patch authors.

It might appear excessive to talk about several different techniques
in one place, but that seemed like the best way to me, because there
are subtle dependencies. If most of the optimizations are pursued as a
project all at once (say, key normalization, suffix truncation, and
treating heap TID as a unique-ifier), that may actually be more likely
to succeed than a project to do just one. The techniques don't appear
to be related at first, but they really are.

I'm not planning on working on key normalization or any of the other
techniques as projects myself, but FWIW I have produced minimal
prototypes of a few of the techniques over the past several years,
just to verify my understanding. My theories on this topic seem worth
writing down. I'm happy to explain or clarify any aspect of what I
describe, and to revise the design based on feedback. It is still very
much a work in progress.

-- 
Peter Geoghegan



Re: [HACKERS] RFC: Key normalization for nbtree

From: Claudio Freire
Date: Mon, Jul 10, 2017
On Mon, Jul 10, 2017 at 3:40 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> It might appear excessive to talk about several different techniques
> in one place, but that seemed like the best way to me, because there
> are subtle dependencies. If most of the optimizations are pursued as a
> project all at once (say, key normalization, suffix truncation, and
> treating heap TID as a unique-ifier), that may actually be more likely
> to succeed than a project to do just one. The techniques don't appear
> to be related at first, but they really are.

I do have a patch that attacks suffix truncation, heap tid unification
and prefix compression all at once.

It's on hiatus ATM, but, as you say, the implementations are highly
correlated, so attacking them all at once makes a lot of sense. Or, at
least, attacking one while keeping the others in the back of your mind.

Key normalization would simplify prefix compression considerably, for instance.

A missing optimization is that having tid unification allows VACUUM to
implement a different strategy when it needs to clean up only a tiny
fraction of the index. It can do the lookup by key-tid instead of
scanning the whole index, which can be a win if the index is large and
the number of index pointers to kill is small.



Re: [HACKERS] RFC: Key normalization for nbtree

From: Alvaro Herrera
Date: Mon, Jul 10, 2017
Claudio Freire wrote:

> A missing optimization is that having tid unification allows VACUUM to
> implement a different strategy when it needs to clean up only a tiny
> fraction of the index. It can do the lookup by key-tid instead of
> scanning the whole index, which can be a win if the index is large and
> the number of index pointers to kill is small.

Doing index cleanup by using keys instead of scanning the whole index
would be a *huge* win for many use cases.  I don't think that work needs
to be related to either of these patches.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] RFC: Key normalization for nbtree

From: Peter Geoghegan
Date:
On Mon, Jul 10, 2017 at 1:23 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Claudio Freire wrote:
>
>> A missing optimization is that having tid unification allows VACUUM to
>> implement a different strategy when it needs to clean up only a tiny
>> fraction of the index. It can do the lookup by key-tid instead of
>> scanning the whole index, which can be a win if the index is large and
>> the number of index pointers to kill is small.
>
> Doing index cleanup by using keys instead of scanning the whole index
> would be a *huge* win for many use cases.  I don't think that work needs
> to be related to either of these patches.

The problem is that it will perform horribly when there are many
duplicates, unless you really can treat heap TID as a part of the key.
Once you do that, you can be almost certain that you won't have to
visit more than one leaf page per index tuple to kill. And, you can
amortize descending the index among a set of duplicate keyed tuples.
You arrive at the leaf tuple with the lowest sort order, based on
(key, TID), kill that, and then continue an index scan along the leaf
level. VACUUM ticks them off against a private "kill list", which is
also ordered by (key, TID). Much less random I/O, and pretty good
guarantees about the worst case.
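As a toy model of that amortization (illustrative names, not PostgreSQL code): once the kill list is sorted by (key, TID), each run of duplicate keys costs only a single descent of the tree, with the remaining matches reached by scanning along the leaf level:

```c
#include <stdlib.h>

/*
 * Toy model of kill-list processing.  "key" stands in for the indexed
 * key, "tid" for the heap TID tiebreaker that makes entries unique.
 */
typedef struct KillItem
{
    int         key;
    int         tid;
} KillItem;

static int
kill_item_cmp(const void *a, const void *b)
{
    const KillItem *x = (const KillItem *) a;
    const KillItem *y = (const KillItem *) b;

    if (x->key != y->key)
        return (x->key < y->key) ? -1 : 1;
    if (x->tid != y->tid)
        return (x->tid < y->tid) ? -1 : 1;
    return 0;
}

/*
 * Sort the kill list by (key, TID) and return the number of index
 * descents needed: one per distinct key, however many duplicates
 * each key has.
 */
static int
plan_descents(KillItem *items, int nitems)
{
    int         descents = 0;

    qsort(items, nitems, sizeof(KillItem), kill_item_cmp);
    for (int i = 0; i < nitems; i++)
    {
        if (i == 0 || items[i].key != items[i - 1].key)
            descents++;
    }
    return descents;
}
```

With six entries across three distinct keys, this plans three descents instead of six, and the gap widens as duplication grows.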

-- 
Peter Geoghegan



Re: [HACKERS] RFC: Key normalization for nbtree

From: Peter Geoghegan
Date:
On Mon, Jul 10, 2017 at 12:53 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I do have a patch that attacks suffix truncation, heap tid unification
> and prefix compression all at once.

That's great! I'll certainly be able to review it.

> It's on a hiatus ATM, but, as you say, the implementations are highly
> correlated so attacking them at once makes a lot of sense. Or, at
> least, attacking one having the other in the back of your mind.
>
> Key normalization would simplify prefix compression considerably, for instance.

The techniques I go into detail about are the ones I think you'd need
to do up front. Prefix compression would be a "nice to have", but I
think you'd specifically do key normalization, suffix truncation, and
adding the heap TID to the key space all up front.

> A missing optimization is that having tid unification allows VACUUM to
> implement a different strategy when it needs to clean up only a tiny
> fraction of the index. It can do the lookup by key-tid instead of
> scanning the whole index, which can be a win if the index is large and
> the number of index pointers to kill is small.

I do mention that on the Wiki page. I also describe ways that these
techniques have some non-obvious benefits for VACUUM B-Tree page
deletion, which you should get "for free" once you do the 3 things I
mentioned. A lot of the benefits for VACUUM are seen in the worst
case, which is when they're really needed.

-- 
Peter Geoghegan



Re: [HACKERS] RFC: Key normalization for nbtree

From: Greg Stark
Date: Mon, Jul 10, 2017
On 10 July 2017 at 19:40, Peter Geoghegan <pg@bowt.ie> wrote:
> Key normalization means creating a representation for internal page
> items that we always just memcmp(), regardless of the details of the
> underlying datatypes.

One thing I would like to see is features like this added to the
opclasses (or opfamilies?) using standard PG functions that return
standard PG data types. So if each opclass had a function that took
the data type in question and returned a bytea, then you could
implement that function in any language you liked (in theory),
test it using standard SQL, and possibly even find other uses for it.
That kind of abstraction would be more promising for the future than
having yet another C API that is used for precisely one purpose and
whose definition is "provide the data needed for this usage".
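To make that concrete, here is a hedged sketch of what such a support routine might compute (a hypothetical example, not an existing PG API) for IEEE-754 doubles; the same logic could sit behind an SQL-callable function returning bytea:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical per-type normalization support function: flatten a
 * float8 datum into bytes whose memcmp() order matches float sort
 * order.  Positive values get the sign bit set so they sort above
 * negatives; negative values get all bits inverted so that more
 * negative means smaller.  (NaN handling is omitted for brevity.)
 */
static size_t
normalize_float8(const void *datum, unsigned char out[8])
{
    uint64_t    bits;

    memcpy(&bits, datum, sizeof(bits));
    if (bits & UINT64_C(0x8000000000000000))
        bits = ~bits;                           /* negative: reverse order */
    else
        bits |= UINT64_C(0x8000000000000000);   /* positive: above negatives */

    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char) (bits >> (56 - 8 * i));
    return 8;
}
```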

-- 
greg



Re: [HACKERS] RFC: Key normalization for nbtree

From: Peter Geoghegan
Date:
On Mon, Jul 10, 2017 at 4:08 PM, Greg Stark <stark@mit.edu> wrote:
> One thing I would like to see is features like this added to the
> opclasses (or opfamilies?) using standard PG functions that return
> standard PG data types. So if each opclass had a function that took
> the data type in question and returned a bytea then you could
> implement that function using a language you felt like (in theory),
> test it using standard SQL, and possibly even find other uses for it.

That seems like a good goal.

> That kind of abstraction would be more promising for the future than
> having yet another C api that is used for precisely one purpose and
> whose definition is "provide the data needed for this usage".

Perhaps this is obvious, but the advantage of flattening everything
into one universal representation is that it's a very simple API for
opclass authors, one that puts control into the hands of the core
nbtree code, where it belongs. For text, say, you can generate
fictitious minimal separator keys with suffix truncation that really
are as short as possible, down to the level of individual bits. If you
tried to do this with the original text representation, you'd have to
worry about the fact that it's probably not even going to be valid
UTF-8, that encoding-aware truncation is needed, and so on. You'd
definitely have to double-check that the truncated key was greater
than the left half and less than the right half, just in case you
didn't end up with a valid separator due to the vagaries of the
collation rules.

That's the kind of complexity that scales poorly, because the
complexity cannot be isolated.
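On normalized keys, by contrast, picking a short separator during a page split reduces to byte arithmetic. A sketch (illustrative only, not a worked-out patch):

```c
#include <stddef.h>

/*
 * Build a short binary separator S such that left < S <= right, given
 * left < right as normalized byte strings: copy the common prefix,
 * then keep one more byte from right.  No datatype- or encoding-aware
 * logic, and no need to re-verify the result against the halves.
 */
static size_t
truncate_separator(const unsigned char *left, size_t llen,
                   const unsigned char *right, size_t rlen,
                   unsigned char *out)
{
    size_t      i = 0;

    /* copy the common prefix of left and right */
    while (i < llen && i < rlen && left[i] == right[i])
    {
        out[i] = left[i];
        i++;
    }
    if (i < rlen)
    {
        /* first differing byte of right (or its next byte if left is
         * a proper prefix of right) makes the separator exceed left */
        out[i] = right[i];
        i++;
    }
    return i;
}
```

Splitting between "apple" and "banana" yields the one-byte separator "b"; between "app" and "apple", it yields "appl".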

-- 
Peter Geoghegan