Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Msg-id CAH2-Wzk2g-muJ8ndNvgf9B=GsnSONRuW-0KQ9+ge-x5-NNyBmw@mail.gmail.com
In response to Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jun 10, 2021 at 5:58 PM Andres Freund <andres@anarazel.de> wrote:
> The problem with writing a test is likely to find a way to halfway
> reliably schedule a transaction abort after pruning, but before the
> tuple-removal loop? Does anybody see a trick to do so?

I asked Alexander about using his pending stop events infrastructure
patch to test this code, back when it still did the tupgone stuff rather
than looping back to retry:

https://postgr.es/m/CAH2-Wz=Tb7bAgCFt0VFA0YJ5Vd1RxJqZRc

I can't see any better way.

ISTM that it would be much more useful to focus on adding an assertion
(or maybe even a "can't happen" error) that fails when the DEAD/goto
path is reached with a tuple whose xmin wasn't aborted. If that had been
in place, we would have caught the bug in
GetOldestNonRemovableTransactionId() far sooner. It might also catch
other bugs in the future.
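
To make the proposed check concrete, here is a rough standalone sketch
of the invariant. It is a toy model, not PostgreSQL code: every type and
function name below (ToyTuple, toy_check_after_prune, and so on) is
hypothetical and stands in for the real visibility and transaction
status machinery. The assumption it encodes is the one discussed above:
a tuple that still looks DEAD right after pruning is only legitimate if
its inserting transaction aborted in the meantime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL's real definitions. */
typedef uint32_t ToyXid;

typedef enum { TOY_XACT_IN_PROGRESS, TOY_XACT_COMMITTED, TOY_XACT_ABORTED } ToyXactStatus;
typedef enum { TOY_TUPLE_LIVE, TOY_TUPLE_RECENTLY_DEAD, TOY_TUPLE_DEAD } ToyVisibility;
typedef enum { TOY_NO_RETRY, TOY_RETRY_PRUNE, TOY_CANT_HAPPEN } ToyCheckResult;

typedef struct
{
    ToyXid        xmin;        /* inserting transaction */
    ToyVisibility visibility;  /* verdict of the post-prune visibility check */
} ToyTuple;

/* Toy transaction status table; zero-initialized, i.e. all "in progress". */
#define TOY_MAX_XID 8
static ToyXactStatus toy_xact_status[TOY_MAX_XID];

/* True only when the inserting transaction is known to have aborted. */
static bool
toy_xmin_aborted(const ToyTuple *tup)
{
    return tup->xmin < TOY_MAX_XID &&
           toy_xact_status[tup->xmin] == TOY_XACT_ABORTED;
}

/*
 * Classify a tuple seen after pruning. A DEAD tuple is only expected
 * here when its xmin aborted after pruning ran; anything else is the
 * "can't happen" case that the assertion or error would report.
 */
static ToyCheckResult
toy_check_after_prune(const ToyTuple *tup)
{
    if (tup->visibility != TOY_TUPLE_DEAD)
        return TOY_NO_RETRY;
    if (!toy_xmin_aborted(tup))
        return TOY_CANT_HAPPEN;   /* where the assertion would fire */
    return TOY_RETRY_PRUNE;       /* the legitimate DEAD/goto path */
}
```

In the real code the TOY_CANT_HAPPEN branch would presumably become an
assertion or an error report rather than a return value; returning a
code here just keeps the sketch easy to exercise on its own.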

-- 
Peter Geoghegan


