Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location
Date
Msg-id CAGTBQpaJfpPXsR_S8ebo2Fi5p-muRBBzj_OT6_yjw=U8iG_uUg@mail.gmail.com
Whole thread Raw
In response to Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location  (Kevin Grittner <kgrittn@gmail.com>)
Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Aug 18, 2016 at 5:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 17, 2016 at 10:54 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> The attached patch tries to maintain the initial status of B-Tree
>> indexes, which are created with equal-key runs in physical order,
>> during the whole life of the B-Tree, and make key-tid pairs
>> efficiently searchable in the process.
>
> I have two pretty important questions that doesn't seem to be
> addressed in your email.

I believe I addressed that in the README, but if I wasn't clear, let
me know and I'll expand there too.

Summarizing here:

> 1. How does it do that?

It adds the block number and the offset number of the index tuple's
t_tid (that is, the pointer into the heap) as a final (implicit) sort
key, as is done already in tuplesort. It just keeps doing that while
comparing keys in the B-Tree when searching the leaf page for
insertion, so the insertion point now points to the exact point where
the insertion has to happen to maintain that condition, and not just
to the first valid position within the same-key run.

In fact, that's why non-leaf index tuples need a different format,
because while leaf index tuples contain the heap pointer already,
non-leaf ones contain only the downlink, not the pointer into the
heap. To be able to do comparisons and pick the right downlink, the
original heap pointer in the leaf index tuple is copied into the
downlink index tuple when splitting pages into an additional
IndexTupleData header that is prepended only to non-leaf index tuples.

This representation is chosen to minimize code changes, such that
(itup+1) is usable as before, and also because it's reasonably
compact. But it *is* necessary to check whether it's a leaf or
non-leaf tuple to choose whether to use (itup+1) or just itup, which
is why the patch requires so many changes in so many places.

> 2. What's the benefit?

The idea came up in the discussion about WARM updates.

Basically, in various situations it is convenient to be able to find
the index tuples that point to a specific heap tuple. Without this,
doing so isn't efficient - the only way to find such index tuples is
to find the leftmost index tuple that has the same key, and walk the
index right links until the right tid is found. For non-unique
indexes, but also for unique indexes with heavy bloat, this could
involve a large amount of I/O, so it isn't efficient in several
contexts. Think vacuum and WARM updates mostly (like HOT updates, but
where only some indexes don't need updating, and some do).

Having the ability to find a heap tuple by key-ctid pair is thus
beneficial because it allows efficient implementations for those
operations. It may benefit other parts of the code, but those are the
ones that come to mind.

It also makes index scans of the form

SELECT * FROM t WHERE some_col = some_const;

Scan the heap in sequential order, even if some_col has low
cardinality (ie: the query returns lots of tuples), which is a nice
performance side effect.



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Fix comment in ATExecValidateConstraint
Next
From: Tom Lane
Date:
Subject: Re: Patch: initdb: "'" for QUOTE_PATH (non-windows)