Re: Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id CA+TgmoaMAxnm7kPW8auJ-T1-nZ+D8LrBxkAHZS75utvsk_ZfiA@mail.gmail.com
Whole thread Raw
In response to Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Emit fewer vacuum records by reaping removable tuples during pruning
Re: Emit fewer vacuum records by reaping removable tuples during pruning
List pgsql-hackers
On Fri, Jan 5, 2024 at 3:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Yes, attached is a patch set which does this. My previous patchset
> already reduced the number of places we unlock the buffer and update
> the freespace map in lazy_scan_heap(). This patchset combines the
> lazy_scan_prune() and lazy_scan_noprune() FSM update cases. I also
> have a version which moves the freespace map updates into
> lazy_scan_prune() and lazy_scan_noprune() -- eliminating all of these
> from lazy_scan_heap(). This is arguably more clear. But Andres
> mentioned he wanted fewer places unlocking the buffer and updating the
> FSM.

Hmm, interesting. I haven't had time to study this fully today, but I
think 0001 looks fine and could just be committed. Hooray for killing
useless variables with dumb names.

This part of 0002 makes me very, very uncomfortable:

+               /*
+                * Update all line pointers per the record, and repair
fragmentation.
+                * We always pass no_indexes as true, because we don't
know whether or
+                * not this option was used when pruning. This reduces
the validation
+                * done on replay in an assert build.
+                */
+               heap_page_prune_execute(buffer, true,

redirected, nredirected,
                                                                nowdead, ndead,

nowunused, nunused);

The problem that I have with this is that we're essentially saying
that it's ok to lie to heap_page_prune_execute because we know how
it's going to use the information, and therefore we know that the lie
is harmless. But that's not how things are supposed to work. We should
either find a way to tell the truth, or change the name of the
parameter so that it's not a lie, or change the function so that it
doesn't need this parameter in the first place, or something. I can
occasionally stomach this sort of lying as a last resort when there's
no realistic way of adapting the code being called, but that's surely
not the case here -- this is a newborn parameter, and it shouldn't be
a lie on day 1. Just imagine if some future developer thought that the
no_indexes parameter meant that the relation actually didn't have
indexes (the nerve of them!).

I took a VERY fast look through the rest of the patch set and I think
that the overall direction looks like it's probably reasonable, but
that's a very preliminary conclusion which I reserve the right to
revise after studying further. @Andres: Are you planning to
review/commit any of this? Are you hoping that I'm going to do that?
Somebody else want to jump in here?

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Make psql ignore trailing semicolons in \sf, \ef, etc
Next
From: Nathan Bossart
Date:
Subject: Re: Adding a pg_get_owned_sequence function?