Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Msg-id 20220922042104.GB464247@nathanxps13
In response to Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
List pgsql-hackers
On Wed, Sep 21, 2022 at 02:41:28PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> > Presumably a
>> > generic WAL record compression mechanism could be reused for other large
>> > records, too.  That could be much easier than devising a deduplication
>> > strategy for every record type.
>>
>> It's quite possible that that's a good idea, but that should probably
>> work as an additive thing. That's something that I think of as a
>> "clever technique", whereas I'm focussed on just not being naive in
>> how we represent this one specific WAL record type.
> 
> BTW, if you wanted to pursue something like this, that would work with
> many different types of WAL record, ISTM that a "medium level" (not
> low level) approach might be the best place to start. In particular,
> the way that page offset numbers are represented in many WAL records
> is quite space inefficient.  A domain-specific approach built with
> some understanding of how page offset numbers tend to look in practice
> seems promising.

I wouldn't mind giving this a try.

> The representation of page offset numbers in PRUNE and VACUUM heapam
> WAL records (and in index WAL records) always just stores an array of
> 2 byte OffsetNumber elements. It probably wouldn't be all that
> difficult to come up with a simple scheme for compressing an array of
> OffsetNumbers in WAL records. It certainly doesn't seem like it would
> be all that difficult to get it down to 1 byte per offset number in
> most cases (even greater improvements seem doable).
> 
> That could also be used for the xl_heap_freeze_page record type --
> though only after this patch is committed. The patch makes the WAL
> record use a simple array of page offset numbers, just like in
> PRUNE/VACUUM records. That's another reason why the approach
> implemented by the patch seems like "the natural approach" to me. It's
> much closer to how heapam PRUNE records work (we have a variable
> number of arrays of page offset numbers in both cases).

Yeah, it seems likely that we could pack offsets in single bytes in many
cases.  A more sophisticated approach could even choose how many bits to
use per offset based on the maximum in the array.  Furthermore, we might be
able to make use of SIMD instructions to mitigate any performance penalty.
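
To make the "bits per offset" idea concrete, here is a rough sketch, not a proposal for the actual record format. The names bits_needed, offsets_bitpack and offsets_bitunpack are hypothetical, and the leading width byte is just an assumption about how a reader would learn the width. With the default 8kB block size, heap offsets should not exceed 291, so 9 bits per offset would be enough there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OffsetNumber;	/* stand-in for PostgreSQL's 2-byte type */

/* Number of bits needed to represent max (hypothetical helper, minimum 1). */
static int
bits_needed(OffsetNumber max)
{
	int			bits = 1;

	while ((max >> bits) != 0)
		bits++;
	return bits;
}

/*
 * Pack n offsets using the smallest fixed width that fits the largest one.
 * Layout (an assumption): one byte holding the width, then the values packed
 * least-significant bit first.  dst needs room for 1 + 2 * n bytes.
 * Returns the number of bytes written.
 */
static size_t
offsets_bitpack(const OffsetNumber *offs, size_t n, uint8_t *dst)
{
	OffsetNumber max = 0;
	uint32_t	acc = 0;		/* bits not yet flushed to dst */
	int			nacc = 0;		/* number of valid bits in acc */
	size_t		len = 1;
	int			width;
	size_t		i;

	for (i = 0; i < n; i++)
		if (offs[i] > max)
			max = offs[i];
	width = bits_needed(max);
	dst[0] = (uint8_t) width;

	for (i = 0; i < n; i++)
	{
		acc |= (uint32_t) offs[i] << nacc;
		nacc += width;
		while (nacc >= 8)
		{
			dst[len++] = (uint8_t) (acc & 0xFF);
			acc >>= 8;
			nacc -= 8;
		}
	}
	if (nacc > 0)
		dst[len++] = (uint8_t) acc;	/* flush the final partial byte */
	return len;
}

/* Inverse of offsets_bitpack; the caller supplies n separately. */
static void
offsets_bitunpack(const uint8_t *src, size_t n, OffsetNumber *offs)
{
	int			width = src[0];
	uint32_t	mask = (1u << width) - 1;
	uint32_t	acc = 0;
	int			nacc = 0;
	size_t		pos = 1;
	size_t		i;

	for (i = 0; i < n; i++)
	{
		while (nacc < width)
		{
			acc |= (uint32_t) src[pos++] << nacc;
			nacc += 8;
		}
		offs[i] = (OffsetNumber) (acc & mask);
		acc >>= width;
		nacc -= width;
	}
}
```

Five offsets with a maximum of 291 would take 1 + 6 bytes here instead of 10, at the cost of a shift-and-mask loop on both sides.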

I'm tempted to start by just using single-byte offsets when possible since
that should be relatively simple while still yielding a decent improvement
for many workloads.  What do you think?
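
Roughly what I have in mind for the simple version, as a sketch only. The names offsets_pack and offsets_unpack and the leading width byte are assumptions for illustration; a real patch would more likely put a flag in the existing record header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OffsetNumber;	/* stand-in for PostgreSQL's 2-byte type */

/*
 * Write the offsets as single bytes if every one fits in a byte, otherwise
 * fall back to the existing 2-byte representation.  The first byte of dst
 * records which width was used (1 or 2).  dst needs room for 1 + 2 * n
 * bytes.  Returns the number of bytes written.
 */
static size_t
offsets_pack(const OffsetNumber *offs, size_t n, uint8_t *dst)
{
	int			narrow = 1;
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (offs[i] > UINT8_MAX)
		{
			narrow = 0;
			break;
		}
	}

	dst[0] = narrow ? 1 : 2;
	if (narrow)
	{
		for (i = 0; i < n; i++)
			dst[1 + i] = (uint8_t) offs[i];
		return 1 + n;
	}

	memcpy(dst + 1, offs, n * sizeof(OffsetNumber));
	return 1 + n * sizeof(OffsetNumber);
}

/* Inverse of offsets_pack; the caller supplies n separately. */
static void
offsets_unpack(const uint8_t *src, size_t n, OffsetNumber *offs)
{
	size_t		i;

	if (src[0] == 1)
	{
		for (i = 0; i < n; i++)
			offs[i] = src[1 + i];
	}
	else
		memcpy(offs, src + 1, n * sizeof(OffsetNumber));
}
```

A single large offset forces the whole array back to 2 bytes per entry, which is the main weakness of this version compared to choosing a bit width.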

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


