Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id 20220318145309.dcncwwc4q3ggkdo4@alap3.anarazel.de
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
Hi,

On 2022-03-14 12:05:14 -0700, Peter Geoghegan wrote:
> On Sat, Mar 12, 2022 at 1:49 PM Andres Freund <andres@anarazel.de> wrote:
> > > Periodically refreshing OldestXmin for freezing won't work, though. At
> > > the very least it seems much less compelling than the vistest idea.
> >
> > I think there's a fair bit of value in using an "as aggressive as possible"
> > initial OldestXmin. Which is used to set a bunch of cutoffs / determine
> > aggressiveness etc.
> 
> I just don't think that it makes very much difference, since we're
> only talking about freezing here -- which is not something we tend to
> be too eager with anyway (FreezeLimit is very seldom the same as
> OldestXmin, etc).

Well, it's not uncommon to VACUUM FREEZE after ETL etc. But more importantly,
the concrete OldestXmin value matters, because it's the difference between
removing and keeping dead rows.
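
To make that concrete, here is a toy model of the cutoff. It is only a sketch,
not the actual heapam code: the function names and XID values are made up for
illustration, and it assumes every deleting transaction has already committed.
A deleted row can only be removed if its deleter's XID precedes OldestXmin;
otherwise it is merely "recently dead" and has to be kept.

```python
# Toy model, NOT PostgreSQL source code. All names and values are hypothetical.
XID_SPACE = 2**32


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware 'a is older than b' (modulo-2^32 comparison)."""
    diff = (a - b) % XID_SPACE
    # Equivalent to interpreting (a - b) as a signed 32-bit int and testing < 0.
    return diff >= 2**31


def classify_deleted_row(xmax: int, oldest_xmin: int) -> str:
    """Classify a row whose deleter (xmax) is assumed to have committed."""
    return "DEAD" if xid_precedes(xmax, oldest_xmin) else "RECENTLY_DEAD"


def removable(deleter_xids, oldest_xmin):
    """Return the deleter XIDs of rows VACUUM could remove with this cutoff."""
    return [x for x in deleter_xids
            if classify_deleted_row(x, oldest_xmin) == "DEAD"]


deleter_xids = [1000, 1005, 1010, 1020]  # made-up XIDs of committed deleters
stale_oldest_xmin = 1004                 # horizon computed a while ago
fresh_oldest_xmin = 1021                 # horizon computed just now

print(removable(deleter_xids, stale_oldest_xmin))
print(removable(deleter_xids, fresh_oldest_xmin))
```

With the stale cutoff only the row deleted by XID 1000 qualifies; with the
fresher one all four do.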


> Maybe it could take a long time for vac_open_indexes() to return, but
> that's really an edge case. I'm puzzled why you're placing so much
> emphasis on this. I can see why that's valuable in a theoretical
> abstract kind of way -- but that's about it. And so it feels like I'm
> still missing something.

I'm not saying it's a *huge* improvement. But especially for smaller tables
it's not uncommon to see multiple autovacuums on a table in quick succession
just because of a slightly outdated horizon. Of course you're not going to see
that in a workload with a handful of heavily modified tables, but if there are
many tables it's a different story.
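
A rough sketch of that effect, under stated assumptions: it uses the
documented trigger formula (autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * reltuples, with the defaults 50 and 0.2),
assumes that rows VACUUM could not remove stay counted as dead, and ignores
XID wraparound. The table size, XIDs and helper names are invented for the
example.

```python
# Toy model, NOT PostgreSQL source code. Helper names and numbers are hypothetical.

def autovacuum_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Documented formula: base threshold + scale factor * number of tuples."""
    return base_threshold + scale_factor * reltuples


def needs_vacuum(n_dead_tup, reltuples):
    """Autovacuum is triggered once dead tuples exceed the threshold."""
    return n_dead_tup > autovacuum_threshold(reltuples)


def dead_left_after_vacuum(deleter_xids, oldest_xmin):
    """Rows whose deleter is not older than the cutoff survive the VACUUM.

    Plain integer comparison; wraparound is deliberately ignored here.
    """
    return sum(1 for xid in deleter_xids if xid >= oldest_xmin)


reltuples = 100                         # a small table
deleter_xids = list(range(2000, 2080))  # 80 rows deleted by XIDs 2000..2079

stale_left = dead_left_after_vacuum(deleter_xids, oldest_xmin=2005)
fresh_left = dead_left_after_vacuum(deleter_xids, oldest_xmin=2080)

print(autovacuum_threshold(reltuples))
print(stale_left, needs_vacuum(stale_left, reltuples))
print(fresh_left, needs_vacuum(fresh_left, reltuples))
```

In this model the vacuum with the stale horizon leaves 75 dead rows behind,
still above the threshold of 70, so the table immediately qualifies for
another autovacuum. The vacuum with the fresh horizon leaves none.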


> > Once we compute "measured" relfrozenxid however, there's afaict otherwise not
> > much point in updating OldestXmin as we go, at least if we start to use
> > vistest for the horizon determinations inside vacuumlazy.c.
> 
> > If we don't switch to using vistest for HTSV determinations in vacuumlazy.c,
> > then I don't think we can refresh vistest without causing problems in
> > lazy_scan_prune().
> 
> Why? Right now we could easily have a concurrent opportunistic prune
> that uses a vistest that's well ahead of the one in VACUUM. And so
> AFAICT it's already quite possible that any page encountered within
> lazy_scan_prune was already pruned using a much more recent
> vistest. So what's the problem with doing the same thing inside the
> backend running VACUUM instead?

That was a brainfart on my end...


> > But I think refreshing horizons is a different discussion from using
> > as-recent-as-possible initial horizons...
> 
> Agreed. This is why I find your emphasis on as-recent-as-possible
> initial horizons confusing. It seems as if you're giving almost equal
> emphasis to both issues, even though one issue (more eager pruning
> during VACUUM) is of obvious practical value, while the other
> (as-recent-as-possible initial horizons) is pretty theoretical and
> academic -- at best.

I'm not intending to weigh them the same. I think using a more recent horizon
is more important than you describe here, but "computed horizons" will be a
considerably larger win.

Greetings,

Andres Freund


