Re: BUG #17821: Assertion failed in heap_update() due to heap pruning - Mailing list pgsql-bugs
From | Noah Misch |
---|---|
Subject | Re: BUG #17821: Assertion failed in heap_update() due to heap pruning |
Date | |
Msg-id | 20250402170436.4a.nmisch@google.com Whole thread Raw |
In response to | Re: BUG #17821: Assertion failed in heap_update() due to heap pruning (Noah Misch <noah@leadboat.com>) |
List | pgsql-bugs |
On Mon, Mar 03, 2025 at 07:34:42PM -0800, Noah Misch wrote: > On Mon, Mar 03, 2025 at 05:07:10PM -0500, Andres Freund wrote: > > I just fixed skink, the valgrind animal, so it runs not just the main > > regression tests but all tests with a valgrind-ified postgres. > > Thanks. > > > Unfortunately, > > the next run triggered a failure in the test added in this thread: > > > > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2025-03-03%2019%3A52%3A38&stg=injection_points-check > > > > diff -U3 /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/modules/injection_points/expected/syscache-update-pruned.out /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/injection_points/isolation/results/syscache-update-pruned.out > > --- /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/modules/injection_points/expected/syscache-update-pruned.out 2025-01-25 19:30:50.005386842 +0000 > > +++ /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/injection_points/isolation/results/syscache-update-pruned.out 2025-03-0321:08:02.025314915 +0000 > > @@ -75,6 +75,7 @@ > > SELECT FROM injection_points_wakeup('heap_update-before-pin'); > > <waiting ...> > > step grant1: <... completed> > > +ERROR: tuple concurrently deleted > > step wakegrant4: <... completed> > > step inspect4: > > SELECT relhastriggers, relhassubclass FROM pg_class > > @@ -82,6 +83,6 @@ > > > > relhastriggers|relhassubclass > > --------------+-------------- > > -f |f > > +t |t > > (1 row) > > That isolation permutation tests an unfixed bug. Here, it's giving a result > as though the bug were fixed. The suite passed the next two skink runs. I'd > tend to defer debugging exactly what went wrong until the project to fix the > bug under test. I could delete the permutation, or I could leave it awhile to > see how high-probability this failure is. I'm inclined to leave it until it > gets four failures, then delete the permutation. Alexander Lakhin shared a recipe that reproduced it. I found permutation 2 also failed under that recipe, so removing permutation 3 got less attractive. These tests need a concept of "wait until VACUUM definitely will prune a particular tuple". My removable_cutoff() header comment made an argument for having achieved that, but it had two gaps. The attached patch passes 1000 Valgrind iterations of the spec. Without the patch, ~3% of iterations failed. Commit c2dc1a7 added DISABLE_PAGE_SKIPPING to stop buffer pins thwarting other VACUUM tests. However, I suspect FREEZE is necessary and sufficient in all these tests, since we need to coax this code to insist on the cleanup lock: /* * If we didn't get the cleanup lock, we can still collect LP_DEAD * items in the dead_items area for later vacuuming, count live and * recently dead tuples for vacuum logging, and determine if this * block could later be truncated. If we encounter any xid/mxids that * require advancing the relfrozenxid/relminxid, we'll have to wait * for a cleanup lock and call lazy_scan_prune(). */ if (!got_cleanup_lock && !lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items)) { /* * lazy_scan_noprune could not do all required processing. Wait * for a cleanup lock, and call lazy_scan_prune in the usual way. */ Assert(vacrel->aggressive); LockBuffer(buf, BUFFER_LOCK_UNLOCK); LockBufferForCleanup(buf); got_cleanup_lock = true; } FREEZE causes the VACUUM to "require advancing the relfrozenxid/relminxid", hence the effectiveness of FREEZE for this purpose. The code has moved around, but I think the policy was the same at the time of that commit c2dc1a7. Per the doc for DISABLE_PAGE_SKIPPING, it's for working around a corrupt vm, not for pin contention. The spec still passes 1000 Valgrind iterations if I remove DISABLE_PAGE_SKIPPING.
Attachment
pgsql-bugs by date: