Re: Autovacuum loose ends - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Autovacuum loose ends
Date
Msg-id 21177.1121364298@sss.pgh.pa.us
Whole thread Raw
In response to Re: Autovacuum loose ends  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Oh, is that right?  Actually in the end I forgot about temp tables so I
> didn't handle them specially, but now I remember that when I started
> looking at Matthew's integration code I thought that temp tables should
> be analyzed if they happen to have a lot of new tuples, so that the
> planner would have good stats about them.

You can't analyze temp tables of other sessions at all (and analyze.c
won't try) because you can't be certain of being able to access their
data.  The valid data might only exist inside the local buffers of the
owning backend.  Same goes for VACUUM --- basically, autovac has to keep
its fingers off temp tables altogether.  (See nearby discussion with
Tom O'Connell for graphic evidence.)

>> * For that matter I'm unconvinced that it's a good idea to try to force
>> the pgstat DB to pick up every table in every database.  If there's no
>> entry it's because the table is not getting modified, and therefore it
>> seems to me that we can just leave well enough alone.  The code really
>> is not very good about doing nothing where nothing is called for ;-)

> Hmm.  The problem is that the table may merit a first ANALYZE, and in a
> second run we need to know that another one is not needed.  How would we
> know that, if we don't keep track on it in the pgstat DB?

If it's being touched at all, then it will get into the pgstat DB
because of the actions of regular backends.  I am questioning the need
for autovac to force things into the pgstat DB when they are evidently
not being used otherwise.

(This argument becomes stronger if we don't reset the stats contents
at postmaster start, which is another open issue.  I think we probably
do need to toss the old stats file after a WAL recovery, but maybe it
need not happen otherwise.)

> Also, remember that there were mentions of changing wraparound Xid to be
> kept track of on a per-table basis, instead of per-database (for 8.2 I
> assume).  If this happens we will _need_ to check every table.

Exactly ... whether it is in pgstat or not.  That does not translate to
a need to force pgstat entries to be made.

>> * Or actually, it would vacuum template0, except that since no regular
>> backend ever connects to template0, there will be no stats DB entry for
>> it and so the loop in AutoVacMain will ignore it.  This is definitely
>> BAD as it means that a database that's not been touched since postmaster
>> start will never be vacuumed, not even for XID wraparound prevention.
>> That test needs to be weakened.

> See, that's what I'm talking about :-)

Yeah, but you're drawing the wrong conclusion.  I'm saying that for a DB
or table that is not present in pgstat, it is reasonable to assume it is
not being used, and so our only responsibility is to prevent XID
wraparound on it --- which we can determine from the pg_database entry
(or pg_class entry if per-table wrap management happens).  We do not need
to force a pgstat entry to be created, and we should not try.

> Another idea would be keeping a per-database dead tuple counter, or some
> other metric, and use that as a parameter in choosing what database to
> vacuum.  The current test (last autovac start time) is certainly very
> naive.

Yeah, keeping per-database totals of the dead tuple counts would help,
and would cost little inside the stats collector AFAICS.

> Not sure if it's possible to check current Xid without
> connecting to a database first.

We could trivially expand the database flat file to include its
datfrozenxid, and any other fields we need from pg_database.

>> * I'm still pretty concerned about the handling of shared catalogs.

> Maybe we can store them in pgstat in a pseudo-database with Oid=0, and
> special case them everywhere.

Yeah, that's what I suggested before.  I haven't thought of any holes in
the idea yet.  The "special casing" shouldn't be hard --- you can just
use the same info set up for the Relation's rd_lockInfo.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: windows regression failure - prepared xacts
Next
From: "Merlin Moncure"
Date:
Subject: Re: Simplifying identification of temporary tables