Re: [PROPOSAL] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [PROPOSAL] Effective storage of duplicates in B-tree index.
Date
Msg-id CAM3SWZTzqAh_kn99Rw8Hs7RcyhfQuqR3H36LjrAA53z94TEysA@mail.gmail.com
In response to Re: [PROPOSAL] Effective storage of duplicates in B-tree index.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: [PROPOSAL] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Sep 3, 2015 at 8:35 AM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
>> * Since everything is aligned within B-Tree, it's probably worth
>> considering the alignment boundaries when doing prefix compression, if
>> you want to go that way. We can probably imagine a world where
>> alignment is not required for B-Tree, which would work on x86
>> machines, but I can't see it happening soon. It isn't worth
>> compressing unless it compresses enough to cross an "alignment
>> boundary", where we're not actually obliged to store as much data on
>> disk. This point may be obvious, not sure.
>
> That is another reason, why I doubt prefix compression, whereas effective
> duplicate storage hasn't this problem.

Okay. That sounds reasonable. I think duplicate handling is a good project.

A good learning tool for Postgres B-Trees -- or at least one of the
better ones -- is my amcheck tool. See:

https://github.com/petergeoghegan/postgres/tree/amcheck

This is a tool for verifying that B-Tree invariants hold, loosely
based on pageinspect. A simple example of an invariant it checks is
that all items on each page are in the correct, logical order. Some
invariants checked are far more complicated, though, and span multiple
pages or multiple levels. See the source code for exact details. The
tool works well when running the regression tests (see stress.sql -- I
used it with pgbench), with no problems reported last I checked. It
typically needs only light locks on relations, and single shared locks
on buffers. (Buffers are copied to local memory for the tool to
operate on, much like contrib/pageinspect.)
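To make the simplest invariant above concrete, here is a minimal
sketch (in Python, not amcheck's actual C code) of the per-page
ordering check: every item on a page must sort no later than the item
that follows it, with duplicates allowed. The names cmp_keys and
page_items_in_order are hypothetical; cmp_keys merely stands in for
_bt_compare().

```python
def cmp_keys(a, b):
    """Stand-in for _bt_compare(): returns a negative, zero, or
    positive value depending on how the two keys compare."""
    return (a > b) - (a < b)

def page_items_in_order(items):
    """Check the basic per-page invariant: every adjacent pair of
    items is correctly ordered. Equal keys (duplicates) are fine,
    hence <= rather than <."""
    return all(cmp_keys(items[i], items[i + 1]) <= 0
               for i in range(len(items) - 1))

# An out-of-order page would be flagged:
#   page_items_in_order([(2,), (1,)]) is False
```

The real checks are more involved (high keys, sibling pages, parent
levels), but they reduce to comparisons of this shape.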

While I have yet to formally submit amcheck to a CF (I once asked for
input on the goals for the project on -hackers), the comments are
fairly comprehensive, and it wouldn't be too hard to adapt this to
guide your work on duplicate handling. Maybe it'll happen for 9.6.
Feedback appreciated.

The tool calls _bt_compare() for many things currently, but doesn't
care about many lower level details, which is (very roughly speaking)
the level that duplicate handling will work at. You aren't actually
proposing to change anything about the fundamental structure that
B-Tree indexes have, so the tool could be quite useful and low-effort
for debugging your code during development.

Debugging this stuff is sometimes like keyhole surgery. If you could
just see or get to the structure that you care about, it would be 10
times easier. Hopefully this tool makes it easier to identify problems.

-- 
Peter Geoghegan


