Thread: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
My ongoing project to make VACUUM more predictable over time by proactive freezing [1] will increase the overall number of tuples frozen by VACUUM significantly (at least in larger tables). It's important that we avoid any new user-visible impact from extra freezing, though. I recently spent a lot of time on adding high-level techniques that aim to avoid extra freezing (e.g. by being lazy about freezing) when that makes sense. Low-level techniques aimed at making the mechanical process of freezing cheaper might also help. (In any case it's well worth optimizing.)

I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process. This can be treated as independent work (I think it can, at least). The patch doesn't change anything about the conceptual model used by VACUUM's lazy_scan_prune function to build xl_heap_freeze_page records for a page, and yet still manages to make the WAL records for freezing over 5x smaller in many important cases. They'll be ~4x-5x smaller with *most* workloads, even.

Each individual tuple entry (each xl_heap_freeze_tuple) adds a full 12 bytes to the WAL record right now -- no matter what. So the existing approach is rather space inefficient by any standard (perhaps because it was developed under time pressure while fixing bugs in Postgres 9.3). More importantly, there is a lot of natural redundancy among the xl_heap_freeze_tuple entries -- each tuple's xl_heap_freeze_tuple details tend to match. We can usually get away with storing each unique combination of values from xl_heap_freeze_tuple once per xl_heap_freeze_page record, while storing associated page offset numbers in a separate area, grouped by their canonical freeze plan (which is a normalized version of the information currently stored in xl_heap_freeze_tuple).

In practice most individual tuples that undergo any kind of freezing only need to have their xmin field frozen. And when xmax is affected at all, it'll usually just get set to InvalidTransactionId. And so the actual low-level processing steps for xmax have a high chance of being shared by other tuples on the page, even in ostensibly tricky cases. While there are quite a few paths that lead to VACUUM setting a tuple's xmax to InvalidTransactionId, they all end up with the same instructional state in the final xl_heap_freeze_tuple entry.

Note that there is a small chance that the patch will be less space efficient by up to 2 bytes per tuple frozen per page in cases where we're allocating new Multis during VACUUM. I think that this should be acceptable on its own -- even in rare bad cases we'll usually still come out ahead -- what are the chances that we won't make up the difference on the same page? Or at least within the same VACUUM? And that's before we talk about a future world in which freezing will batch tuples together at the page level (you don't have to bring the other VACUUM work into this discussion, I think, but it's not *completely* unrelated either).

[1] https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com

--
Peter Geoghegan
Attachment
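To make the deduplication idea above concrete, here is a minimal, self-contained C sketch. All type and function names are invented for illustration; the real patch works in terms of the WAL structs in heapam_xlog.h and the freeze plans that lazy_scan_prune builds, so treat this only as a picture of the grouping step, not as the patch's code.

#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-ins for the per-tuple freeze information.
 * These are NOT the structs from heapam_xlog.h; they only illustrate the
 * shape of the problem.
 */
typedef struct TupleFreezeS
{
    uint16_t offset;        /* tuple's line pointer offset number */
    uint32_t xmax;          /* xmax to set (often just "invalid") */
    uint16_t t_infomask;    /* infomask bits that freezing will set */
    uint16_t t_infomask2;
    uint8_t  frzflags;
} TupleFreezeS;

typedef struct FreezePlanS
{
    uint32_t xmax;
    uint16_t t_infomask;
    uint16_t t_infomask2;
    uint8_t  frzflags;
    uint16_t ntuples;       /* how many offsets follow for this plan */
} FreezePlanS;

/* Compare everything except the offset, so identical plans sort together. */
static int
plan_cmp(const void *a, const void *b)
{
    const TupleFreezeS *ta = (const TupleFreezeS *) a;
    const TupleFreezeS *tb = (const TupleFreezeS *) b;

    if (ta->xmax != tb->xmax)
        return ta->xmax < tb->xmax ? -1 : 1;
    if (ta->t_infomask != tb->t_infomask)
        return ta->t_infomask < tb->t_infomask ? -1 : 1;
    if (ta->t_infomask2 != tb->t_infomask2)
        return ta->t_infomask2 < tb->t_infomask2 ? -1 : 1;
    if (ta->frzflags != tb->frzflags)
        return ta->frzflags < tb->frzflags ? -1 : 1;
    return 0;
}

/*
 * Sort the scratch array so that equal plans become adjacent, then emit one
 * FreezePlanS per distinct plan plus a flat offset array grouped by plan
 * (offsets within a group end up in arbitrary order in this sketch).
 * Returns the number of distinct plans.
 */
static int
deduplicate_freeze_plans(TupleFreezeS *tuples, int ntuples,
                         FreezePlanS *plans, uint16_t *offsets)
{
    int nplans = 0;

    qsort(tuples, ntuples, sizeof(TupleFreezeS), plan_cmp);

    for (int i = 0; i < ntuples; i++)
    {
        if (i == 0 || plan_cmp(&tuples[i], &tuples[i - 1]) != 0)
        {
            /* start a new group for a plan we haven't seen yet */
            plans[nplans].xmax = tuples[i].xmax;
            plans[nplans].t_infomask = tuples[i].t_infomask;
            plans[nplans].t_infomask2 = tuples[i].t_infomask2;
            plans[nplans].frzflags = tuples[i].frzflags;
            plans[nplans].ntuples = 0;
            nplans++;
        }
        plans[nplans - 1].ntuples++;
        offsets[i] = tuples[i].offset;
    }

    return nplans;
}

The key point is that the per-tuple cost in the record drops to a single offset number, while each distinct plan is stored just once per page.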
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Masahiko Sawada
Date:
On Tue, Sep 13, 2022 at 6:02 AM Peter Geoghegan <pg@bowt.ie> wrote:
> My ongoing project to make VACUUM more predictable over time by proactive freezing [1] will increase the overall number of tuples frozen by VACUUM significantly (at least in larger tables). It's important that we avoid any new user-visible impact from extra freezing, though. I recently spent a lot of time on adding high-level techniques that aim to avoid extra freezing (e.g. by being lazy about freezing) when that makes sense. Low-level techniques aimed at making the mechanical process of freezing cheaper might also help. (In any case it's well worth optimizing.)
>
> I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process. This can be treated as independent work (I think it can, at least).

+1

> The patch doesn't change anything about the conceptual model used by VACUUM's lazy_scan_prune function to build xl_heap_freeze_page records for a page, and yet still manages to make the WAL records for freezing over 5x smaller in many important cases. They'll be ~4x-5x smaller with *most* workloads, even.

After a quick benchmark, I've confirmed that the amount of WAL for freezing 1 million tuples was reduced to about one-fifth (1.2GB vs 250MB). Great.

> Each individual tuple entry (each xl_heap_freeze_tuple) adds a full 12 bytes to the WAL record right now -- no matter what. So the existing approach is rather space inefficient by any standard (perhaps because it was developed under time pressure while fixing bugs in Postgres 9.3). More importantly, there is a lot of natural redundancy among the xl_heap_freeze_tuple entries -- each tuple's xl_heap_freeze_tuple details tend to match. We can usually get away with storing each unique combination of values from xl_heap_freeze_tuple once per xl_heap_freeze_page record, while storing associated page offset numbers in a separate area, grouped by their canonical freeze plan (which is a normalized version of the information currently stored in xl_heap_freeze_tuple).

True. I've not looked at the patch in depth yet but I think we need regression tests for this.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Fri, Sep 16, 2022 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After a quick benchmark, I've confirmed that the amount of WAL for freezing 1 million tuples was reduced to about one-fifth (1.2GB vs 250MB). Great.

I think that the really interesting thing about the patch is how this changes the way we should think about freezing costs. It makes page-level batching seem very natural. The minimum possible size of a Heap2/FREEZE_PAGE record is 64 bytes, once alignment and so on is taken into account (without the patch). Once we already know that we have to freeze *some* tuples on a given heap page, it becomes very reasonable to freeze as many as possible, in batch, just because we know that it'll be much cheaper if we do it now versus doing it later on instead.

Even if this extra freezing ends up "going to waste" due to updates against the same tuples that happen a little later on, the *added* cost of freezing "extra" tuples will have been so small that it's unlikely to matter. On the other hand, if it's not wasted we'll be *much* better off. It's very hard to predict the future, which is kinda what the current FreezeLimit-based approach to freezing does. It's actually quite easy to understand the cost of freezing now versus freezing later, though. At a high level, it makes sense for VACUUM to focus on freezing costs (including the risk that comes with *not* freezing with larger tables), and not worry so much about making accurate predictions. Making accurate predictions about freezing/workload characteristics is overrated.

> True. I've not looked at the patch in depth yet but I think we need regression tests for this.

What did you have in mind? I think that the best way to test something like this is with wal_consistency_checking. That mostly works fine. However, note that heap_mask() won't always be able to preserve the state of a tuple's xmax when modified by freezing. We sometimes need "hint bits" to actually reliably be set in REDO, when replaying the records for freezing. At other times they really are just hints. We have to conservatively assume that it's just a hint when masking. Not sure if I can do much about that.

Note that this optimization is one level below lazy_scan_prune(), and one level above heap_execute_freeze_tuple(). Neither function really changes at all. This seems useful because there are rare pg_upgrade-only paths where xvac fields need to be frozen. That's not tested either.

--
Peter Geoghegan
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Mon, Sep 12, 2022 at 2:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process.

Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Nathan Bossart
Date:
On Tue, Sep 20, 2022 at 03:12:00PM -0700, Peter Geoghegan wrote:
> On Mon, Sep 12, 2022 at 2:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process.
>
> Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.

This idea seems promising. I see that you called this patch a work-in-progress, so I'm curious what else you are planning to do with it.

As I'm reading this thread and the patch, I'm finding myself wondering if it's worth exploring using wal_compression for these records instead. I think you've essentially created an efficient compression mechanism for this one type of record, but I'm assuming that lz4/zstd would also yield some rather substantial improvements for this kind of data. Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> This idea seems promising. I see that you called this patch a work-in-progress, so I'm curious what else you are planning to do with it.

I really just meant that the patch wasn't completely finished at that point. I hadn't yet convinced myself that I mostly had it right. I'm more confident now.

> As I'm reading this thread and the patch, I'm finding myself wondering if it's worth exploring using wal_compression for these records instead.

The term deduplication works better than compression here because we're not actually decompressing anything in the REDO routine. Rather, the REDO routine processes each freeze plan by processing all affected tuples in order. To me this seems like the natural way to structure things -- the WAL records are much smaller, but in a way that's kind of incidental. The approach taken by the patch just seems like the natural approach, given the specifics of how freezing works at a high level.

> I think you've essentially created an efficient compression mechanism for this one type of record, but I'm assuming that lz4/zstd would also yield some rather substantial improvements for this kind of data.

I don't think of it that way. I've used the term "deduplication" to advertise the patch, but that's mostly just a description of what we're doing in the patch relative to what we do on HEAD today. There is nothing truly clever in the patch. We see a huge amount of redundancy among tuples from the same page in practically all cases, for reasons that have everything to do with what freezing is, and how it works at a high level. The thought process that led to my writing this patch was more high level than appearances suggest. (I often write patches that combine high level and low level insights in some way or other, actually.)

Theoretically there might not be very much redundancy within each xl_heap_freeze_page record, with the right workload, but in practice a decrease of 4x or more is all but guaranteed once you have more than a few tuples to freeze on each page. If there are other WAL records that are as space inefficient as xl_heap_freeze_page is, then I'd be surprised -- it is *unusually* space inefficient (like I said, I suspect that this may have something to do with the fact that it was originally designed under time pressure). So I don't expect that this patch tells us much about what we should do for any other WAL record. I certainly *hope* that it doesn't, at least.

> Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.

It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.

--
Peter Geoghegan
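As a rough illustration of the replay shape described above, here is a small self-contained sketch, reusing the same invented FreezePlanS type as the earlier sketch. This is not the actual REDO routine: the point is only that the record stores each distinct plan once, followed by the offsets grouped by plan, so replay is two nested loops with no decompression step.

#include <stdint.h>
#include <stdio.h>

/* Same invented struct as in the earlier sketch; not a heapam_xlog.h type. */
typedef struct FreezePlanS
{
    uint32_t xmax;
    uint16_t t_infomask;
    uint16_t t_infomask2;
    uint8_t  frzflags;
    uint16_t ntuples;       /* offsets belonging to this plan */
} FreezePlanS;

static void
replay_freeze_page(const FreezePlanS *plans, int nplans,
                   const uint16_t *offsets)
{
    const uint16_t *curoff = offsets;

    for (int p = 0; p < nplans; p++)
    {
        for (uint16_t i = 0; i < plans[p].ntuples; i++)
        {
            uint16_t offnum = *curoff++;

            /* the real routine would freeze the tuple at offnum here */
            printf("plan %d applies to offset %u\n", p, (unsigned) offnum);
        }
    }
}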
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.
>
> It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.

BTW, if you wanted to pursue something like this, that would work with many different types of WAL record, ISTM that a "medium level" (not low level) approach might be the best place to start. In particular, the way that page offset numbers are represented in many WAL records is quite space inefficient. A domain-specific approach built with some understanding of how page offset numbers tend to look in practice seems promising.

The representation of page offset numbers in PRUNE and VACUUM heapam WAL records (and in index WAL records) always just stores an array of 2 byte OffsetNumber elements. It probably wouldn't be all that difficult to come up with a simple scheme for compressing an array of OffsetNumbers in WAL records. It certainly doesn't seem like it would be all that difficult to get it down to 1 byte per offset number in most cases (even greater improvements seem doable).

That could also be used for the xl_heap_freeze_page record type -- though only after this patch is committed. The patch makes the WAL record use a simple array of page offset numbers, just like in PRUNE/VACUUM records. That's another reason why the approach implemented by the patch seems like "the natural approach" to me. It's much closer to how heapam PRUNE records work (we have a variable number of arrays of page offset numbers in both cases).

--
Peter Geoghegan
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Nathan Bossart
Date:
On Wed, Sep 21, 2022 at 02:41:28PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> > Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.
>>
>> It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.
>
> BTW, if you wanted to pursue something like this, that would work with many different types of WAL record, ISTM that a "medium level" (not low level) approach might be the best place to start. In particular, the way that page offset numbers are represented in many WAL records is quite space inefficient. A domain-specific approach built with some understanding of how page offset numbers tend to look in practice seems promising.

I wouldn't mind giving this a try.

> The representation of page offset numbers in PRUNE and VACUUM heapam WAL records (and in index WAL records) always just stores an array of 2 byte OffsetNumber elements. It probably wouldn't be all that difficult to come up with a simple scheme for compressing an array of OffsetNumbers in WAL records. It certainly doesn't seem like it would be all that difficult to get it down to 1 byte per offset number in most cases (even greater improvements seem doable).
>
> That could also be used for the xl_heap_freeze_page record type -- though only after this patch is committed. The patch makes the WAL record use a simple array of page offset numbers, just like in PRUNE/VACUUM records. That's another reason why the approach implemented by the patch seems like "the natural approach" to me. It's much closer to how heapam PRUNE records work (we have a variable number of arrays of page offset numbers in both cases).

Yeah, it seems likely that we could pack offsets in single bytes in many cases. A more sophisticated approach could even choose how many bits to use per offset based on the maximum in the array. Furthermore, we might be able to make use of SIMD instructions to mitigate any performance penalty. I'm tempted to start by just using single-byte offsets when possible since that should be relatively simple while still yielding a decent improvement for many workloads. What do you think?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
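A minimal sketch of the single-byte idea, assuming the offset array is sorted so the last element is the maximum. The names are invented and this is not code from any posted patch; a flag bit in the record would presumably indicate which representation was used.

#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a sorted array of 2-byte offset numbers into single bytes whenever
 * every value fits in 8 bits, i.e. whenever the highest offset on the page
 * is <= 255.  Returns false when single-byte encoding isn't possible, in
 * which case the caller would fall back to the plain 2-byte array.
 */
static bool
pack_offsets_1byte(const uint16_t *offsets, int noffsets, uint8_t *dst)
{
    if (noffsets <= 0 || offsets[noffsets - 1] > UINT8_MAX)
        return false;           /* fall back to the 2-byte representation */

    for (int i = 0; i < noffsets; i++)
        dst[i] = (uint8_t) offsets[i];

    return true;
}

/* Replay-side counterpart: widen the packed bytes back to offset numbers. */
static void
unpack_offsets_1byte(const uint8_t *src, int noffsets, uint16_t *offsets)
{
    for (int i = 0; i < noffsets; i++)
        offsets[i] = src[i];
}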
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Nathan Bossart
Date:
On Wed, Sep 21, 2022 at 02:11:36PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.
>
> It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.

Got it. I think that's a fair point.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Sep 21, 2022 at 9:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I wouldn't mind giving this a try.

Definitely seems worth experimenting with. Many WAL records generated during VACUUM (and during opportunistic pruning/index tuple deletion) have offset number arrays that can be assumed to be both sorted and unique. My guess is that these WAL record types are the most amenable to compression at the WAL record level.

> Yeah, it seems likely that we could pack offsets in single bytes in many cases. A more sophisticated approach could even choose how many bits to use per offset based on the maximum in the array. Furthermore, we might be able to make use of SIMD instructions to mitigate any performance penalty.

I guess I'd start with some kind of differential compression that relies on the arrays being both sorted and unique. While it might be important to be able to compose together multiple different techniques (something that is more than the sum of its parts can be very useful), it seems most important to quickly validate the basic idea first.

One obvious thing that still seems worth pointing out: you may not need to decompress anything. All that you really need to do is to get the logic from some routine like PageIndexMultiDelete() to be executed by a REDO routine. Perhaps it'll make sense to come up with a representation that can just be passed to the routine directly. (I really don't know how likely this is, but it's something to consider.)

> I'm tempted to start by just using single-byte offsets when possible since that should be relatively simple while still yielding a decent improvement for many workloads. What do you think?

The really big wins for WAL size will come from applying high-level insights about what is really needed, in all likelihood. The main overhead is very often generic WAL record header overhead. When it is, it'll be hard to really move the needle just by compressing the payload. To do that you'd need to find a way to use fewer WAL records to do the same amount of work.

The thing that I think will really make the biggest difference is merging PRUNE records with FREEZE records (and maybe even making the merged record do the work of a VISIBLE record when that's possible). Just because now you have 1 WAL record instead of 2 (or even 3) WAL records. Obviously that's a complicated project, but it's another case where it feels like the more efficient approach might also be simpler. We often write a PRUNE record with only one or two items in the array, in which case it's practically free to do some freezing, at least in terms of WAL space overhead (as long as you can do it with the same WAL record). Plus freezing is already inescapably tied to pruning -- we always prune a page that we're going to try to freeze in VACUUM (we can't safely freeze dead tuples, so there is more or less a dependency already).

Not that you shouldn't pursue compressing the payload from WAL records as a project -- maybe that'll work very well. I'm just pointing out that there is a bigger picture, that may or may not end up mattering here. For the patch on this thread there certainly is a bigger picture about costs over time. Something like that could be true for this other patch too. It's definitely worth considering the size of the WAL records when there are only one or two items, how common that may be in each individual case, etc.

For example, FREEZE records have a minimum size of 64 bytes in practice, due to WAL record alignment overhead (the payload itself doesn't usually have to be aligned, but the WAL header still does). It may be possible to avoid going over the critical threshold that makes the WAL size one MAXALIGN() quantum larger in the event of having only a few tuples to freeze, a scenario where negative compression is likely. Negative compression is always a potential problem, but maybe you can deal with it very effectively by thinking about the problem holistically. If you're "wasting" space that was just going to be alignment padding anyway, does it really matter at all? (Maybe there is some reason to care, but offhand I can't think of one.)

--
Peter Geoghegan
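For illustration, one possible shape for that kind of differential scheme, assuming a sorted, unique offset array; the names are invented and this is only a sketch of the idea, not a proposal. Dense offset runs then cost one byte each, and even a sparse array should never cost more than the plain 2-byte representation for heap pages.

#include <stdint.h>

/*
 * Delta-encode a sorted, unique array of 2-byte offset numbers.  Sorted and
 * unique means every gap is at least 1, so storing (gap - 1) keeps dense
 * offsets in a single byte; gaps of 128 or more spill into two bytes via a
 * simple continuation bit.  Two bytes cover any delta below 32768, far more
 * than a heap page can hold.  Returns the number of encoded bytes.
 */
static int
encode_offset_deltas(const uint16_t *offsets, int noffsets, uint8_t *dst)
{
    uint16_t prev = 0;
    int      nbytes = 0;

    for (int i = 0; i < noffsets; i++)
    {
        /* offsets start at 1 and strictly increase, so delta >= 0 */
        uint16_t delta = (uint16_t) (offsets[i] - prev - 1);

        if (delta < 0x80)
            dst[nbytes++] = (uint8_t) delta;
        else
        {
            dst[nbytes++] = (uint8_t) ((delta & 0x7F) | 0x80);
            dst[nbytes++] = (uint8_t) (delta >> 7);
        }
        prev = offsets[i];
    }
    return nbytes;
}

/* Replay-side counterpart: rebuild the offset array from the deltas. */
static int
decode_offset_deltas(const uint8_t *src, int nbytes, uint16_t *offsets)
{
    uint16_t prev = 0;
    int      noffsets = 0;

    for (int pos = 0; pos < nbytes; )
    {
        uint16_t delta = (uint16_t) (src[pos] & 0x7F);

        if (src[pos++] & 0x80)
            delta |= (uint16_t) (src[pos++] << 7);

        prev = (uint16_t) (prev + delta + 1);
        offsets[noffsets++] = prev;
    }
    return noffsets;
}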
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Tue, Sep 20, 2022 at 3:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.

Attached is v3. I'd like to move forward with commit soon. I'll do so in the next few days, barring objections.

v3 has vacuumlazy.c pass NewRelfrozenXid instead of FreezeLimit for the purposes of generating recovery conflicts during subsequent REDO of the resulting xl_heap_freeze_page WAL record. This more general approach is preparation for my patch to add page-level freezing [1]. It might theoretically lead to more recovery conflicts, but in practice the impact should be negligible. For one thing VACUUM must freeze *something* before any recovery conflict can happen during subsequent REDO on a replica in hot standby. It's far more likely that any disruptive recovery conflicts come from pruning.

It also makes the cutoff_xid field from the xl_heap_freeze_page WAL record into a "standard latestRemovedXid format" field. In other words it backs up an XID passed by the vacuumlazy.c caller during original execution (not in the REDO routine, as on HEAD). To make things clearer, the patch also renames the nearby xl_heap_visible.cutoff_xid field to xl_heap_visible.latestRemovedXid. Now there are no WAL records with a field called "cutoff_xid" (they're all called "latestRemovedXid" now). This matches PRUNE records, and B-Tree DELETE records.

The overall picture is that all REDO routines (for both heapam and index AMs) now advertise that they have a field that they use to generate recovery conflicts that follows a standard format. All latestRemovedXid XIDs are applied in a standard way during REDO: by passing them to ResolveRecoveryConflictWithSnapshot(). Users can grep the output of tools like pg_waldump to find latestRemovedXid fields, without necessarily needing to give any thought to which kind of WAL record is involved, or even the rmgr. Presenting this information precisely and uniformly seems useful to me. (Perhaps we should have a truly generic name, which latestRemovedXid isn't, but that can be handled separately.)

[1] https://commitfest.postgresql.org/39/3843/

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Justin Pryzby
Date:
On Thu, Nov 10, 2022 at 04:48:17PM -0800, Peter Geoghegan wrote:
> On Tue, Sep 20, 2022 at 3:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.
>
> Attached is v3. I'd like to move forward with commit soon. I'll do so in the next few days, barring objections.

Note that this comment is dangling in your patch:

+{
+	Page		page = BufferGetPage(buffer);
+
+	/* nor when there are no tuples to freeze */
...
-	/* Caller should not call me on a non-WAL-logged relation */
-	Assert(RelationNeedsWAL(reln));
-	/* nor when there are no tuples to freeze */
-	Assert(ntuples > 0);
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Thu, Nov 10, 2022 at 7:00 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Note that this comment is dangling in your patch:

Attached is v4, which removes the old comments you pointed out were now out of place (they weren't adding much anyway). Also fixed bitrot against HEAD from today's visibility map commit from Jeff Davis.

There is a more substantive change here, too. Like v3, v4 refactors the *mechanical* details of how the XID-based cutoff is handed down. However, unlike v3, v4 goes back to using vacuumlazy.c's FreezeLimit as the starting point for generating a latestRemovedXid. It seemed better to deal with the recovery conflict issues created by my big page-level freezing/freezing strategy patch set in the patch set itself.

Will commit this early next week barring any objections.

Thanks

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Fri, Nov 11, 2022 at 10:38 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v4, which removes the old comments you pointed out were now out of place (they weren't adding much anyway). Also fixed bitrot against HEAD from today's visibility map commit from Jeff Davis.

Pushed something like this earlier today, though without any changes to VISIBLE records.

I just started a new thread to discuss standardizing the symbol name for recovery conflict XID cutoffs reported by tools like pg_waldump. Seemed better to deal with VISIBLE records in the scope of that new refactoring patch.

Thanks

--
Peter Geoghegan
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Andres Freund
Date:
Hi,

On 2022-11-15 10:26:05 -0800, Peter Geoghegan wrote:
> Pushed something like this earlier today, though without any changes to VISIBLE records.

While updating a patch to log various offsets in pg_waldump, I noticed a few minor issues in this patch:

ISTM that some of the page-level freezing functions are misnamed. In heapam.c the heap_xlog* routines are for replay, afaict. However heap_xlog_freeze_plan() is used to WAL-log the freeze plan. heap_xlog_freeze_page() is used to replay that WAL record. Probably your brain is too used to nbtree/ :).

I think s/heap_xlog_freeze/heap_log_freeze/ would mostly do the trick, except that heap_xlog_new_freeze_plan() doesn't quite fit in the scheme.

The routines then also should be moved a bit up, because right now they're in between other routines doing WAL replay, adding to the confusion.

The memcpy in heap_xlog_freeze_page() seems a tad odd. I assume that the alignment guarantees for xl_heap_freeze_plan are too weak? But imo it's failure prone (and I'm not sure strictly speaking legal from an undefined behaviour POV) to form a pointer to a misaligned array, even if we then later just memcpy() from those pointers.

Greetings,

Andres Freund
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Mon, Jan 9, 2023 at 1:43 PM Andres Freund <andres@anarazel.de> wrote:
> ISTM that some of the page-level freezing functions are misnamed. In heapam.c the heap_xlog* routines are for replay, afaict. However heap_xlog_freeze_plan() is used to WAL-log the freeze plan. heap_xlog_freeze_page() is used to replay that WAL record. Probably your brain is too used to nbtree/ :).

Sometimes I wonder why other people stubbornly insist on not starting every function name with an underscore. :-)

> I think s/heap_xlog_freeze/heap_log_freeze/ would mostly do the trick, except that heap_xlog_new_freeze_plan() doesn't quite fit in the scheme.
>
> The routines then also should be moved a bit up, because right now they're in between other routines doing WAL replay, adding to the confusion.

I believe that I used this scheme because of the fact that the new functions were conceptually related to REDO routines, even though they run during original execution. I'm quite happy to revise the code based on your suggestions, though.

> The memcpy in heap_xlog_freeze_page() seems a tad odd. I assume that the alignment guarantees for xl_heap_freeze_plan are too weak?

They're not too weak. I'm not sure why the memcpy() was used. I see your point; it makes you wonder if it must be necessary, which then seems to call into question why it's okay to access the main array as an array. I can change this detail, too.

I'll try to get back to it this week.

--
Peter Geoghegan
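For what it's worth, the generally safe pattern for the genuinely-unaligned case looks something like the following sketch (invented, simplified struct; not the committed code): copy from the raw byte pointer into an aligned local, rather than dereferencing a pointer obtained by casting the byte pointer.

#include <stdint.h>
#include <string.h>

/* Invented, simplified struct standing in for a WAL "freeze plan" entry. */
typedef struct FreezePlanS
{
    uint32_t xmax;
    uint16_t t_infomask;
    uint16_t t_infomask2;
    uint8_t  frzflags;
    uint16_t ntuples;
} FreezePlanS;

/*
 * Read the idx'th plan out of a raw payload whose alignment is not
 * guaranteed.  memcpy() into an aligned local is always well defined,
 * whereas casting the byte pointer to FreezePlanS * and indexing it relies
 * on the payload actually being suitably aligned.
 */
static FreezePlanS
read_plan(const char *payload, int idx)
{
    FreezePlanS plan;

    memcpy(&plan, payload + (size_t) idx * sizeof(FreezePlanS),
           sizeof(FreezePlanS));
    return plan;
}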
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Mon, Jan 9, 2023 at 2:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'll try to get back to it this week.

Attached patch fixes up these issues. It's almost totally mechanical.

(Ended up using "git diff --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space" with this, per your recent tip, which did help.)

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Andres Freund
Date:
Hi,

On 2023-01-11 16:06:31 -0800, Peter Geoghegan wrote:
> On Mon, Jan 9, 2023 at 2:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I'll try to get back to it this week.
>
> Attached patch fixes up these issues. It's almost totally mechanical.

Looks better, thanks!

> (Ended up using "git diff --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space" with this, per your recent tip, which did help.)

It's a really useful feature. I configured git to always use --color-moved=dimmed-zebra, but haven't quite dared to enable --color-moved-ws=ignore-all-space by default.

Greetings,

Andres Freund
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Jan 11, 2023 at 4:44 PM Andres Freund <andres@anarazel.de> wrote:
> > Attached patch fixes up these issues. It's almost totally mechanical.
>
> Looks better, thanks!

Pushed that just now.

Thanks

--
Peter Geoghegan