Re: [PROPOSAL] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [PROPOSAL] Effective storage of duplicates in B-tree index.
Date
Msg-id CAM3SWZTzqAh_kn99Rw8Hs7RcyhfQuqR3H36LjrAA53z94TEysA@mail.gmail.com
In response to Re: [PROPOSAL] Effective storage of duplicates in B-tree index.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: [PROPOSAL] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Sep 3, 2015 at 8:35 AM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
>> * Since everything is aligned within B-Tree, it's probably worth
>> considering the alignment boundaries when doing prefix compression, if
>> you want to go that way. We can probably imagine a world where
>> alignment is not required for B-Tree, which would work on x86
>> machines, but I can't see it happening soon. It isn't worth
>> compressing unless it compresses enough to cross an "alignment
>> boundary", where we're not actually obliged to store as much data on
>> disk. This point may be obvious, not sure.
>
> That is another reason, why I doubt prefix compression, whereas effective
> duplicate storage hasn't this problem.

Okay. That sounds reasonable. I think duplicate handling is a good project.

A good learning tool for Postgres B-Trees -- or at least one of the
better ones -- is my amcheck tool. See:

https://github.com/petergeoghegan/postgres/tree/amcheck

This is a tool for verifying that B-Tree invariants hold, loosely
based on pageinspect. A simple example of an invariant it checks is
that all items on each page are in the correct, logical order. Some
invariants checked are far more complicated, though, and span multiple
pages or multiple levels. See the source code for exact details. The
tool works well when running the regression tests (see stress.sql -- I
used it with pgbench), with no problems reported last I checked. It
typically needs only light locks on relations, and single shared locks
on buffers. (Buffers are copied to local memory for the tool to
operate on, much like contrib/pageinspect.)
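To make the simplest invariant above concrete, here is a minimal
sketch (in Python, not amcheck's actual C code) of the per-page
ordering check: every item on a page must sort no later than the item
that follows it, with duplicates allowed. The names cmp_keys and
page_items_in_order are hypothetical; cmp_keys merely stands in for
_bt_compare().

```python
def cmp_keys(a, b):
    """Stand-in for _bt_compare(): returns a negative, zero, or
    positive value depending on how the two keys compare."""
    return (a > b) - (a < b)

def page_items_in_order(items):
    """Check the basic per-page invariant: every adjacent pair of
    items is correctly ordered. Equal keys (duplicates) are fine,
    hence <= rather than <."""
    return all(cmp_keys(items[i], items[i + 1]) <= 0
               for i in range(len(items) - 1))

# An out-of-order page would be flagged:
#   page_items_in_order([(2,), (1,)]) is False
```

The real checks are more involved (high keys, sibling pages, parent
levels), but they reduce to comparisons of this shape.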

While I have yet to formally submit amcheck to a CF (I once asked for
input on the goals for the project on -hackers), the comments are
fairly comprehensive, and it wouldn't be too hard to adapt this to
guide your work on duplicate handling. Maybe it'll happen for 9.6.
Feedback appreciated.

The tool calls _bt_compare() for many things currently, but doesn't
care about many lower level details, which is (very roughly speaking)
the level that duplicate handling will work at. You aren't actually
proposing to change anything about the fundamental structure that
B-Tree indexes have, so the tool could be quite useful and low-effort
for debugging your code during development.

Debugging this stuff is sometimes like keyhole surgery. If you could
just see or get to the structure that you care about, it would be 10
times easier. Hopefully this tool makes it easier to identify problems.

-- 
Peter Geoghegan


