Re: More aggressive vacuuming of temporary tables - Mailing list pgsql-hackers

From Andres Freund
Subject Re: More aggressive vacuuming of temporary tables
Date
Msg-id 20200908190129.bxujcfkxpldljus4@alap3.anarazel.de
Whole thread Raw
In response to More aggressive vacuuming of temporary tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: More aggressive vacuuming of temporary tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: More aggressive vacuuming of temporary tables  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hi,

On 2020-08-28 11:46:49 -0400, Tom Lane wrote:
> It strikes me that when we are vacuuming a temporary table (which
> necessarily will be one of our own session), we don't really need
> to care what the global xmin horizon is.  If we're not inside a
> user transaction block, then there are no tuples in the table that
> could be in-doubt anymore.  Neither are there any snapshots in our
> session that could see any dead tuples.  Nor do we give a fig what
> other sessions might think of those tuples.  So we could just set
> the xmin cutoff as aggressively as possible, which is to say
> equal to the nextXid counter.  While vacuuming a temp table is
> perhaps not something people do very often, I think when they do
> do it they would like us to clean out all the dead tuples not just
> some.

That seems like a good idea.

I've been toying with a patch that introduces more smarts about when a
row is removable, by looking more closely whether a specific row
versions are visible (e.g. in the common case of one old snapshot and
lots of newer rows). But that's orders of magnitude more complicated. So
going for something as simple as this seems like a good idea.


> Hence I propose 0001 attached.  80% of it is just API additions to allow
> passing down the isTopLevel flag so that we can do the right thing in
> the CLUSTER case; the important change is in vacuum_set_xid_limits.
> (For ease of review, I didn't reindent the existing code in
> vacuum_set_xid_limits, but that would be something to do before commit.)

I did wonder for a moment if it could be worthwhile to just implement
this for VACUUM and leave CLUSTER alone, to avoid having to deal with
the toplevel stuff. But I think it's better to be consistent.

But now I do wonder why we need to know whether the command is top level
or not? Why isn't the correct thing to instead look at what the current
backend's xmin is? Seems like you could just replace
    *oldestXmin = XidFromFullTransactionId(ReadNextFullTransactionId());
with
    *oldestXmin = MyProc->xmin;
    Assert(TransactionIdIsValid(*oldestXmin));

and you'd not need to care about whether the CLUSTER is top-level or
not? Without, I think, loosing any aggressiveness in the top-level case?
Perhaps even worth moving this logic into
GetOldestNonRemovableTransactionId().

I know you already pushed this, but I vote for revising it this way if
you don't see an issue?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: VACUUM (INTERRUPTIBLE)?
Next
From: Heikki Linnakangas
Date:
Subject: Re: Yet another fast GiST build