Re: Combine Prune and Freeze records emitted by vacuum - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Combine Prune and Freeze records emitted by vacuum
Date
Msg-id 20240320191710.y6b4hterii5qf76y@liskov
Whole thread Raw
In response to Re: Combine Prune and Freeze records emitted by vacuum  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Combine Prune and Freeze records emitted by vacuum
List pgsql-hackers
On Wed, Mar 20, 2024 at 03:15:49PM +0200, Heikki Linnakangas wrote:
> On 20/03/2024 03:36, Melanie Plageman wrote:
> > On Mon, Mar 18, 2024 at 01:15:21AM +0200, Heikki Linnakangas wrote:
> > > On 15/03/2024 02:56, Melanie Plageman wrote:
> > > > Okay, so I was going to start using xl_heap_prune for vacuum here too,
> > > > but I realized it would be bigger because of the
> > > > snapshotConflictHorizon. Do you think there is a non-terrible way to
> > > > make the snapshotConflictHorizon optional? Like with a flag?
> > > 
> > > Yeah, another flag would do the trick.
> > 
> > Okay, I've done this in attached v4 (including removing
> > XLOG_HEAP2_VACUUM). I had to put the snapshot conflict horizon in the
> > "main chunk" of data available at replay regardless of whether or not
> > the record ended up including an FPI.
> > 
> > I made it its own sub-record (xlhp_conflict_horizon) less to help with
> > alignment (though we can use all the help we can get there) and more to
> > keep it from getting lost. When you look at heapam_xlog.h, you can see
> > what a XLOG_HEAP2_PRUNE record will contain starting with the
> > xl_heap_prune struct and then all the sub-record types.
> 
> Ok, now that I look at this, I wonder if we're being overly cautious about
> the WAL size. We probably could just always include the snapshot field, and
> set it to InvalidTransactionId and waste 4 bytes when it's not needed. For
> the sake of simplicity. I don't feel strongly either way though, the flag is
> pretty simple too.

That will mean that all vacuum records are at least 3 bytes bigger than
before -- which makes it somewhat less defensible to get rid of
xl_heap_vacuum.

That being said, I ended up doing an unaligned access when I
packed it and made it optional, so maybe it is less user-friendly.

But I also think that making it optional is more clear for vacuum which
will never use it.

> I realized that the WAL record format changes are pretty independent from
> the rest of the patches. They could be applied before the rest. Without the
> rest of the changes, we'll still write two WAL records per page in vacuum,
> one to prune and another one to freeze, but it's another meaningful
> incremental step. So I reshuffled the patches, so that the WAL format is
> changed first, before the rest of the changes.

Ah, great idea! That eliminates the issue of preliminary commits having
larger WAL records that then get streamlined.

> 0001-0008: These are the WAL format changes. There's some comment cleanup
> needed, but as far as the code goes, I think these are pretty much ready to
> be squashed & committed.

My review in this email is *only* for 0001-0008. I have not looked at
the rest yet.

> From 06d5ff5349a8aa95cbfd06a8043fe503b7b1bf7b Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 14:50:14 +0200
> Subject: [PATCH v5 01/26] Merge prune, freeze and vacuum WAL record formats
> 
> The new combined WAL record is now used for pruning, freezing and 2nd
> pass of vacuum. This is in preparation of changing vacuuming to write
> a combined prune+freeze record per page, instead of separate two
> records. The new WAL record format now supports that, but the code
> still always writes separate records for pruning and freezing.

Attached patch changes-for-0001.patch has a bunch of updated comments --
especially for heapam_xlog.h (including my promised diagram). And a few
suggestions (mostly changes that I should have made before).

> 
> XXX I tried to lift-and-shift the code from v4 patch set as unchanged
> as possible, for easier review, but some noteworthy changes:

In the final commit message, I think it is worth calling out that all of
those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and
shifted from one file to another. When I am reviewing a big diff, it is
always helpful to know where I need to review line-by-line.

> 
> - Instead of passing PruneState and PageFreezeResult to
>   log_heap_prune_and_freeze(), pass the arrays of frozen, redirected
>   et al offsets directly. That way, it can be called from other places.

good idea.

> - moved heap_xlog_deserialize_prune_and_freeze() from xactdesc.c to
>   heapdesc.c. (Because that's clearly where it belongs)

:)

> From cd6cdaebb362b014733e99ecd868896caf0fb3aa Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 13:45:01 +0200
> Subject: [PATCH v5 02/26] Keep the original numbers for existing WAL records
> 
> Doesn't matter much because the WAL format is not compatible across
> major versions anyway. But still seems nice to keep the identifiers
> unchanged when we can. (There's some precedence for this if you search
> the git history for "is free, was").

sounds good.

> From d3207bb557aa1d2868a50d357a06318a6c0cb5cd Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 13:48:29 +0200
> Subject: [PATCH v5 03/26] Rename record to XLOG_HEAP2_PRUNE_FREEZE
> 
> To clarify that it also freezes now, and to make it clear that it's
> significantly different from the old XLOG_HEAP2_PRUNE format.

+1 

> From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 13:49:59 +0200
> Subject: [PATCH v5 04/26] 'nplans' is a pointer
> 
> I'm surprised the compiler didn't warn about this

oops. while looking at this, I noticed that the asserts I added that
nredirected > 0, ndead > 0, and nunused > 0 have the same problem.

> ---
>  src/backend/access/rmgrdesc/heapdesc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
> index 8b94c869faf..9ef8a745982 100644
> --- a/src/backend/access/rmgrdesc/heapdesc.c
> +++ b/src/backend/access/rmgrdesc/heapdesc.c
> @@ -155,8 +155,7 @@ heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
>          cursor += sizeof(OffsetNumber) * *nunused;
>      }
>  
> -    if (nplans > 0)

> From 59f3f80f82ed7a63d86c991d0cb025e4cde2caec Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 13:36:41 +0200
> Subject: [PATCH v5 06/26] Fix logging snapshot conflict horizon.
> 
> - it was accessed without proper alignment, which won't work on
>   architectures that are strict about alignment. Use memcpy.

wow, oops. thanks for fixing this!

> - in heap_xlog_prune_freeze, the code tried to access the xid with
>   "(xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);" But 'xlrec'
>   was "xl_heap_prune *" rather than "char *". That happened to work,
>   because sizeof(xl_heap_prune) == 1, but make it more robust by
>   adding a cast to char *.

good catch.

> - remove xlhp_conflict_horizon and store a TransactionId directly. A
>   separate struct would make sense if we needed to store anything else
>   there, but for now it just seems like more code.

makes sense. I just want to make sure heapam_xlog.h makes it extra clear
that this is happening. I see your comment addition. If you combine it
with my comment additions in the attached patch for 0001, hopefully that
makes it clear enough.

> From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 14:03:06 +0200
> Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze().
> 
> XXX: This should be rewritten, but I tried to at least list some
> important points.

Are you thinking that it needs to mention more things or that the things
it mentions need more detail?

> From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 20 Mar 2024 14:53:31 +0200
> Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()
> 
> Mostly to make local variables more tightly-scoped.

So, I don't think you can move those sub-records into the tighter scope.
If you run tests with this applied, you'll see it crashes and a lot of
the data in the record is wrong. If you move the sub-record declarations
out to a wider scope, the tests pass.

The WAL record data isn't actually copied into the buffer until

    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);

after registering everything.
So all of those sub-records you made are out of scope the time it tries
to copy from them.

I spent the better part of a day last week trying to figure out what was
happening after I did the exact same thing. I must say that I found the
xloginsert API incredibly unhelpful on this point.

I would like to review the rest of the suggested changes in this patch
after we fix the issue I just mentioned.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Possibility to disable `ALTER SYSTEM`
Next
From: Nathan Bossart
Date:
Subject: Re: cleanup patches for incremental backup