Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: decoupling table and index vacuum
Date
Msg-id CA+Tgmob6CnSyyB+tU52727nMYnmecq00g-c0jV-y5GhqVh83NQ@mail.gmail.com
Whole thread Raw
In response to Re: decoupling table and index vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: decoupling table and index vacuum  (Andres Freund <andres@anarazel.de>)
Re: decoupling table and index vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Apr 22, 2021 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The dead TID fork needs to also be efficiently searched. If the heap
> scan runs twice, the collected dead TIDs on each heap pass could be
> overlapped. But we would not be able to merge them if we did index
> vacuuming on one of indexes at between those two heap scans. The
> second time heap scan would need to record only TIDs that are not
> collected by the first time heap scan.

I agree that there's a problem here. It seems to me that it's probably
possible to have a dead TID fork that implements "throw away the
oldest stuff" efficiently, and it's probably also possible to have a
TID fork that can be searched efficiently. However, I am not sure that
it's possible to have a dead TID fork that does both of those things
efficiently. Maybe you have an idea. My intuition is that if we have
to pick one, it's MUCH more important to be able to throw away the
oldest stuff efficiently. I think we can work around the lack of
efficient lookup, but I don't see a way to work around the lack of an
efficient operation to discard the oldest stuff.

> Right. Given decoupling index vacuuming, I think the index’s garbage
> statistics are important which preferably need to be fetchable without
> accessing indexes. It would be not hard to estimate how many index
> tuples might be able to be deleted by looking at the dead TID fork but
> it doesn’t necessarily match the actual number.

Right, and to appeal (I think) to Peter's quantitative vs. qualitative
principle, it could be way off. Like, we could have a billion dead
TIDs and in one index the number of index entries that need to be
cleaned out could be 1 billion and in another index it could be zero
(0). We know how much data we will need to scan because we can fstat()
the index, but there seems to be no easy way to estimate how many of
those pages we'll need to dirty, because we don't know how successful
previous opportunistic cleanup has been. It is not impossible, as
Peter has pointed out a few times now, that it has worked perfectly
and there will be no modifications required, but it is also possible
that it has done nothing.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: decoupling table and index vacuum
Next
From: Andres Freund
Date:
Subject: Re: decoupling table and index vacuum