Home > mailing lists

Re: More aggressive vacuuming of temporary tables - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: More aggressive vacuuming of temporary tables
Date	September 8, 2020 22:01:29
Msg-id	20200908190129.bxujcfkxpldljus4@alap3.anarazel.de Whole thread Raw
In response to	More aggressive vacuuming of temporary tables (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: More aggressive vacuuming of temporary tables (Tom Lane <tgl@sss.pgh.pa.us>) Re: More aggressive vacuuming of temporary tables (Stephen Frost <sfrost@snowman.net>)
List	pgsql-hackers

Tree view

Hi,

On 2020-08-28 11:46:49 -0400, Tom Lane wrote:
> It strikes me that when we are vacuuming a temporary table (which
> necessarily will be one of our own session), we don't really need
> to care what the global xmin horizon is.  If we're not inside a
> user transaction block, then there are no tuples in the table that
> could be in-doubt anymore.  Neither are there any snapshots in our
> session that could see any dead tuples.  Nor do we give a fig what
> other sessions might think of those tuples.  So we could just set
> the xmin cutoff as aggressively as possible, which is to say
> equal to the nextXid counter.  While vacuuming a temp table is
> perhaps not something people do very often, I think when they do
> do it they would like us to clean out all the dead tuples not just
> some.

That seems like a good idea.

I've been toying with a patch that introduces more smarts about when a
row is removable, by looking more closely whether a specific row
versions are visible (e.g. in the common case of one old snapshot and
lots of newer rows). But that's orders of magnitude more complicated. So
going for something as simple as this seems like a good idea.

> Hence I propose 0001 attached.  80% of it is just API additions to allow
> passing down the isTopLevel flag so that we can do the right thing in
> the CLUSTER case; the important change is in vacuum_set_xid_limits.
> (For ease of review, I didn't reindent the existing code in
> vacuum_set_xid_limits, but that would be something to do before commit.)

I did wonder for a moment if it could be worthwhile to just implement
this for VACUUM and leave CLUSTER alone, to avoid having to deal with
the toplevel stuff. But I think it's better to be consistent.

But now I do wonder why we need to know whether the command is top level
or not? Why isn't the correct thing to instead look at what the current
backend's xmin is? Seems like you could just replace
    *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
with
    *oldestXmin = MyProc->xmin;
    Assert(TransactionIdIsValid(*oldestXmin));

and you'd not need to care about whether the CLUSTER is top-level or
not? Without, I think, loosing any aggressiveness in the top-level case?
Perhaps even worth moving this logic into
GetOldestNonRemovableTransactionId().

I know you already pushed this, but I vote for revising it this way if
you don't see an issue?

Greetings,

Andres Freund

pgsql-hackers by date:

From: Andres Freund
Date: 08 September 2020, 21:49:36
Subject: VACUUM (INTERRUPTIBLE)?

From: Heikki Linnakangas
Date: 08 September 2020, 22:05:37
Subject: Re: Yet another fast GiST build

Re: More aggressive vacuuming of temporary tables - Mailing list pgsql-hackers

Previous

Next