Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date
Msg-id 20220220032833.tvdxa3ddfuvu3ezu@alap3.anarazel.de
Whole thread Raw
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2022-02-19 19:07:39 -0800, Peter Geoghegan wrote:
> On Sat, Feb 19, 2022 at 7:01 PM Andres Freund <andres@anarazel.de> wrote:
> > > We can either do that, or we can throw an error concerning corruption
> > > when heap_page_prune notices orphaned tuples. Neither seems
> > > particularly appealing. But it definitely makes no sense to allow
> > > lazy_scan_prune to spin in a futile attempt to reach agreement with
> > > heap_page_prune about a DEAD tuple really being DEAD.
> >
> > Yea, this sucks. I think we should go for the rewrite of the
> > heap_prune_chain() logic. The current approach is just never going to be
> > robust.
> 
> No, it just isn't robust enough. But it's not that hard to fix. My
> patch really wasn't invasive.

I think we're in agreement there. We might think at some point about
backpatching too, but I'd rather have it stew in HEAD for a bit first.


> I confirmed that HeapTupleSatisfiesVacuum() and
> heap_prune_satisfies_vacuum() agree that the heap-only tuple at offnum
> 2 is HEAPTUPLE_DEAD -- they are in agreement, as expected (so no
> reason to think that there is a new bug involved). The problem here is
> indeed just that heap_prune_chain() can't "get to" the tuple, given
> its current design.

Right.

The reason that the "adversarial" patch makes a different is solely that it
changes the heap_surgery test to actually kill an item, which it doesn't
intend:

create temp table htab2(a int);
insert into htab2 values (100);
update htab2 set a = 200;
vacuum htab2;

-- redirected TIDs should be skipped
select heap_force_kill('htab2'::regclass, ARRAY['(0, 1)']::tid[]);


If the vacuum can get the cleanup lock due to the adversarial patch, the
heap_force_kill() doesn't do anything, because the first item is a
redirect. However if it *can't* get a cleanup lock, heap_force_kill() instead
targets the root item. Triggering the endless loop.


Hm. I think this might be a mild regression in 14. In < 14 we'd just skip the
tuple in lazy_scan_heap(), but now we have an uninterruptible endless
loop.


We'd do completely bogus stuff later in < 14 though, I think we'd just leave
it in place despite being older than relfrozenxid, which obviously has its own
set of issues.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Next
From: Peter Geoghegan
Date:
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations