Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Msg-id CAH2-Wz=aAMARy08hrzN9UOE4AegsAkge+0nsYk+no2S14W2g2Q@mail.gmail.com
In response to Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Jan 8, 2020 at 2:56 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Thanks for the review! Anything that you've written that I do not
> respond to directly can be assumed to have been accepted by me.

Here is a version with most of the individual changes you asked for --
this is v29. I just pushed a couple of small tweaks to nbtree.h that
you suggested I go ahead with immediately. v29 also refactors some of
the "single value strategy" stuff in nbtdedup.c. This is code that
anticipates the needs of nbtsplitloc.c's single value strategy --
deduplication is designed to work together with page
splits/nbtsplitloc.c.

Still, v29 doesn't resolve the following points you've raised, since I
haven't reached a final opinion on what to do myself (I'm quoting your
modified patch file sent on January 8th here):

* HEIKKI: Do we only generate one posting list in one WAL record? I
would assume it's better to deduplicate everything on the page, since
we're modifying it anyway.

* HEIKKI: Does xl_btree_vacuum WAL record store a whole copy of the
remaining posting list on an updated tuple? Wouldn't it be simpler and
more space-efficient to store just the deleted TIDs?

* HEIKKI: Would it be more clear to have a separate struct for the
posting list split case? (i.e. don't reuse xl_btree_insert)
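
To put rough numbers on the space question in the second item, here is
a minimal sketch that compares the two payloads. It is not code from
the patch: the helper names are hypothetical, and it assumes each TID
is logged as a plain 6-byte value (the size of ItemPointerData),
ignoring record headers and any more compact encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each heap TID is logged as 6 bytes, like ItemPointerData. */
#define TID_SIZE 6

/*
 * Hypothetical helper: payload bytes when the WAL record carries a whole
 * copy of the posting list that remains after VACUUM removes ndeleted TIDs.
 */
static size_t
wal_bytes_remaining(size_t ntids, size_t ndeleted)
{
    assert(ndeleted <= ntids);
    return (ntids - ndeleted) * TID_SIZE;
}

/*
 * Hypothetical helper: payload bytes when the WAL record carries only the
 * TIDs that were deleted.
 */
static size_t
wal_bytes_deleted(size_t ndeleted)
{
    return ndeleted * TID_SIZE;
}
```

Under these assumptions, a 100-TID posting list with 10 deleted TIDs
costs 540 bytes as a whole copy versus 60 bytes as deleted TIDs only.
With 90 deleted TIDs the numbers swap (60 versus 540), so logging just
the deleted TIDs is smaller only when fewer than half are removed.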

v29 of the patch also doesn't change anything about how LP_DEAD bits
work, apart from going into the LP_DEAD stuff in the commit message.
This doesn't seem to be in the same category as the other three open
items, since it seems like we disagree here -- that must be worked out
through further discussion and/or benchmarking.

-- 
Peter Geoghegan
