Thread: Re: Fixes inconsistent behavior in vacuum when it processes multiple relations

On Wed, Jun 18, 2025 at 11:15:31AM -0400, shihao zhong wrote:
> I investigated the code and found a small bug with how we're passing
> the VacuumParams pointer.
> 
> The call flow is
> ExecVacuum -> vacuum -> vacuum_rel
> 
> The initial VaccumParams pointer is set in ExecVacuum
> In vacuum_rel, this pointer might change because it needs to determine
> whether to truncate and perform index_cleanup.

Nice find!

My first reaction is to wonder whether we should 1) also make a similar
change to vacuum() for some future-proofing or 2) just teach vacuum_rel()
to make a local copy of the parameters that it can scribble on.  In the
latter case, we might want to assert that the parameters don't change after
calls to vacuum() and vacuum_rel() to prevent this problem from recurring.
That leads me to think (1) might be the better option, although I'm not too
wild about the subtlety of the fix.

-- 
nathan



>> However, Option 1) would be my go-to option for HEAD ...

Updated my patch to apply the same rules to all VacuumParams. Also I
am seeing clusterParams as a pass reference, not sure if we should
also change that to prevent future issues. But that should be another
patch.

Attachment
On Fri, Jun 20, 2025 at 9:34 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 19, 2025 at 03:30:26PM -0500, Nathan Bossart wrote:
> > After thinking about this some more, I'm wondering if it would be better to
> > pursue option (2) because it's a little less invasive (which is important
> > because this will need to be back-patched).  In any case, we have a similar
> > problem when recursing to the TOAST table, which can be fixed by copying
> > the params at the top of vacuum_rel().
> >
> > While testing out the attached patch, I noticed a couple of other
> > interesting problems in this area [0].
> >
> > [0] https://postgr.es/m/aFRxC1W_kZU9OjJ9%40nathan
>
> Hmm.  I like the simplicity of option 2) for the purpose the back
> branches and the post-feature-freeze v18.
>
> However, Option 1) would be my go-to option for HEAD (as of v19
> opening for business), but I think that we should harden the code more
> than suggested and treat all VacuumParams as purely input arguments
> rather keeping some pointers to it depending on the code path we are
> dealing with, so as no callers of these inner routines is surprised by
> changes that may happen internally.

I'd suggest just marking the VacuumParams *params with const, so
that the user can not change its content, or the compiler will error
out, it will force the user to make a copy if changes are needed.
I see vacuum_get_cutoffs already has the const signature.

Passing by const pointer is more efficient than passing by structure value.

> Hence, reading the code of v2,
> I'd suggest to apply the same rule to vacuum_get_cutoffs(),
> do_analyze_rel() and heap_vacuum_rel().  Except if I am missing
> something, it looks like all these calls should be OK with this new
> policy.  This implies also changing relation_vacuum() in tableam.h,
> which can be a HEAD-only change anyway.
> --
> Michael



--
Regards
Junwang Zhao



On Fri, Jun 20, 2025 at 10:14 PM shihao zhong <zhong950419@gmail.com> wrote:
>
> >> However, Option 1) would be my go-to option for HEAD ...
>
> Updated my patch to apply the same rules to all VacuumParams. Also I
> am seeing clusterParams as a pass reference, not sure if we should
> also change that to prevent future issues. But that should be another
> patch.

 static inline void
-table_relation_vacuum(Relation rel, struct VacuumParams *params,
+table_relation_vacuum(Relation rel, struct VacuumParams params,
    BufferAccessStrategy bstrategy)
 {
  rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index bc37a80dc74..9a8c63352da 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -336,7 +336,7 @@ extern PGDLLIMPORT int64 parallel_vacuum_worker_delay_ns;

 /* in commands/vacuum.c */
 extern void ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool
isTopLevel);
-extern void vacuum(List *relations, VacuumParams *params,
+extern void vacuum(List *relations, VacuumParams params,
     BufferAccessStrategy bstrategy, MemoryContext vac_context,
     bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -357,7 +357,7 @@ extern void vac_update_relstats(Relation relation,
  bool *frozenxid_updated,
  bool *minmulti_updated,
  bool in_outer_xact);
-extern bool vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
+extern bool vacuum_get_cutoffs(Relation rel, const VacuumParams params,
     struct VacuumCutoffs *cutoffs);
 extern bool vacuum_xid_failsafe_check(const struct VacuumCutoffs *cutoffs);
 extern void vac_update_datfrozenxid(void);
@@ -398,7 +398,7 @@ extern void parallel_vacuum_main(dsm_segment *seg,
shm_toc *toc);

 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation,
- VacuumParams *params, List *va_cols, bool in_outer_xact,
+ VacuumParams params, List *va_cols, bool in_outer_xact,

It's a bit odd that we have both `VacuumParams *params` and
`struct VacuumParams *params`. Perhaps you could remove
the struct keyword in this patch to make it consistent.


--
Regards
Junwang Zhao



On Mon, Jun 23, 2025 at 08:48:35AM +0900, Michael Paquier wrote:
> Anyway, here is an attempt at leaning all that.  I am really tempted
> to add a couple of const markers to force VacuumParams to be in
> read-only mode, even if it means doing so for vacuum() but not touch
> at vacuum_rel() where we double-check the reloptions for the truncate
> and index cleanup options.  That would be of course v19-only material.
> Thoughts or opinions?

Is the idea to do something like this for v19 and this [0] for the
back-branches?

I think the recurse-to-TOAST-table case is still broken with your patch.
We should probably move the memcpy() for toast_vacuum_params to the very
top of vacuum_rel().  Otherwise, your patch looks generally reasonable to
me.

[0] https://postgr.es/m/aFRzYhOTZcRgKPLu%40nathan

-- 
nathan



On Mon, Jun 23, 2025 at 10:38:40AM -0500, Nathan Bossart wrote:
> Is the idea to do something like this for v19 and this [0] for the
> back-branches?

Yes, that would be the idea.  Let's do the bug fix first on all the
branches, then do the refactoring.  Knowing that I'm indirectly
responsible for this mess, I would like to take care of that myself.
Would that be OK for you?

> I think the recurse-to-TOAST-table case is still broken with your patch.
> We should probably move the memcpy() for toast_vacuum_params to the very
> top of vacuum_rel().  Otherwise, your patch looks generally reasonable to
> me.

Ah, right, I have managed to mess up this part.  Sounds to me that we
should provide more coverage for both the truncate and index cleanup
cases, as well.

The updated version attached includes tests that were enough to make
the recursive TOAST table case fail for TRUNCATE and INDEX_CLEANUP,
where I have relied on an EXTERNAL column to fill in the TOAST
relation with data cheaply.  The pgstattuple case is less cheap but
I've minimized the load to be able to trigger the index cleanup.  It
was tricky to get the threshold right for INDEX_CLEANUP in the case of
TOAST table, but that's small enough to have a short run time while it
should be large enough to trigger a cleanup of the TOAST indexes (my
CI quota has been reached this month, so I can only rely on the CF bot
for some pre-validation job).  If the thresholds are too low, we may
need to bump the number of tuples.  I'd be OK to remove the TOAST
parts if these prove to be unstable, but let's see.  At least these
tests are validating the contents of the patch for the TOAST options,
and they were looking stable on my machine.

Another approach that we could use is an injection point with some
LOGs that show some information based on what VacuumParams holds.
That would be the cheapest method (no need for any data), and entirely
stable as we would look at the stack.  Perhaps going down to that is
not really necessary for the sake of this thread.

Attached are two patches to have a clean git history:
- 0001 is the basic fix, to-be-applied first on all the branches.
- 0002 is the refactoring, for v19 once it opens up.

What do you think?
--
Michael

Attachment
On Tue, Jun 24, 2025 at 10:18:18AM +0900, Michael Paquier wrote:
> Knowing that I'm indirectly responsible for this mess, I would like to
> take care of that myself.  Would that be OK for you?

I would be grateful if you took care of it.

> Another approach that we could use is an injection point with some
> LOGs that show some information based on what VacuumParams holds.
> That would be the cheapest method (no need for any data), and entirely
> stable as we would look at the stack.  Perhaps going down to that is
> not really necessary for the sake of this thread.

+1 for this.  I did something similar to verify the other bugs I reported,
and this seems far easier to maintain than potentially-unstable tests that
require lots of setup and that depend on secondary effects.

-- 
nathan



On Wed, Jun 25, 2025 at 10:31:35AM +0900, Michael Paquier wrote:
> On Tue, Jun 24, 2025 at 11:30:13AM -0500, Nathan Bossart wrote:
>> On Tue, Jun 24, 2025 at 10:18:18AM +0900, Michael Paquier wrote:
>> > Knowing that I'm indirectly responsible for this mess, I would like to
>> > take care of that myself.  Would that be OK for you?
>> 
>> I would be grateful if you took care of it.
> 
> Okay, applied the main fix down to v13 then.

Thanks!

> Attached is the remaining patch for HEAD, planned once v19 opens, and
> the tests I have used on the back-branches as a txt to not confuse the
> CI, for reference.

Looks reasonable to me.

-- 
nathan