Home > mailing lists

Re: Parallel vacuum workers prevent the oldest xmin from advancing - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Parallel vacuum workers prevent the oldest xmin from advancing
Date	October 8, 2021 17:21:31
Msg-id	CAH2-WznnLaBADKrTFR4-uny-FT0GwDu0gj13-jZTTH6ioXuT4Q@mail.gmail.com Whole thread Raw
In response to	Re: Parallel vacuum workers prevent the oldest xmin from advancing (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List	pgsql-hackers

Tree view

On Fri, Oct 8, 2021 at 8:13 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2021-Oct-06, Masahiko Sawada wrote:
> > A customer reported that during parallel index vacuum, the oldest xmin
> > doesn't advance. Normally, the calculation of oldest xmin
> > (ComputeXidHorizons()) ignores xmin/xid of processes having
> > PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum
> > workers don’t set their statusFlags, the xmin of the parallel vacuum
> > worker is considered to calculate the oldest xmin. This issue happens
> > from PG13 where the parallel vacuum was introduced. I think it's a
> > bug.
>
> Augh, yeah, I agree this is a pretty serious problem.

So is this comparable problem, which happens to be much older:
https://postgr.es/m/CAH2-WzkjrK556enVtFLmyXEdw91xGuwiyZVep2kp5yQT_-3JDg@mail.gmail.com

In both cases we see bugs (or implementation deficiencies) that
accidentally block ComputeXidHorizons() for hours, when that isn't
truly necessary. Practically all users are not sure of whether or not
VACUUM behaves like a long running transaction already, in general, so
we shouldn't be surprised that it takes so long for us to hear about
issues like this.

I think that we should try to find a way of making this whole class of
problems easier to identify in production. There needs to be greater
visibility into what process holds back VACUUM, and how long that
lasts -- something easy to use, and *obvious*. That would be a very
useful feature in general. It would also make catching these issues
early far more likely. It's just *not okay* that you have to follow long
and complicated instructions [1] to get just some of this information.
How can something this important just be an afterthought?

[1] https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/

--
Peter Geoghegan

pgsql-hackers by date:

From: Matthias van de Meent
Date: 08 October 2021, 17:17:39
Subject: Re: RFC: compression dictionaries for JSONB

From: Lily Liu
Date: 08 October 2021, 17:24:33
Subject: Query rewrite(optimization) using constraints

Re: Parallel vacuum workers prevent the oldest xmin from advancing - Mailing list pgsql-hackers

Previous

Next