Thread: Possible TODO item? VACUUM on empty table

Possible TODO item? VACUUM on empty table

From
Richard Huxton
Date:
Over on the support lists we've seen a couple of examples this week of 
people doing a bulk import/delete/replace cycle with a vacuum analyse 
just after the delete.

Instinctively, it's where you'd vacuum and I'm guessing the analyse just 
gets typed at the same time from habit (I know I've done so more than 
once). Of course, it flattens your stats and your import can then take 
forever.

So - would it be worth, in the case of vacuum+analyse (but not analyse 
alone), where the table is completely empty then just halve the existing 
stats.

That way, if you're in the middle of a delete/replace cycle your stats 
will still be in the realms of reasonableness. On the other hand, if the 
table is genuinely empty and you're running a nightly vacuum the stats 
will tend toward 0.

Against the idea:
1. It's a special-case in the code
2. There's a work-around (unless you're running pg_autovacuum I'd guess)
2. We perhaps want a more general table description (this is a 
bulk-loading table, this is a log table, this is a mostly static lookup 
table)

--  Richard Huxton  Archonet Ltd


Re: Possible TODO item? VACUUM on empty table

From
Greg Stark
Date:
Richard Huxton <dev@archonet.com> writes:

> So - would it be worth, in the case of vacuum+analyse (but not analyse alone),
> where the table is completely empty then just halve the existing stats.
> 
> That way, if you're in the middle of a delete/replace cycle your stats will
> still be in the realms of reasonableness. On the other hand, if the table is
> genuinely empty and you're running a nightly vacuum the stats will tend toward
> 0.
> 
> Against the idea:
> 1. It's a special-case in the code

Hm. I suppose the generalization of this would be to always average the new
stats with the existing stats.

I suppose you could make the argument that if I run analyze and then run
analyze again without changing anything then I've effectively doubled the
sample size and the stats can take advantage of that.

But not all of the stats are simple totals. I'm not sure this would make sense
for the histogram arrays.

-- 
greg



Re: Possible TODO item? VACUUM on empty table

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> But not all of the stats are simple totals. I'm not sure this would make sense
> for the histogram arrays.

Yeah, I don't know how you "halve" a histogram.

But the problem is not with the pg_statistic stats, I think.  It is
already true that ANALYZE punts without touching pg_statistic if it was
unable to find any live rows, and AFAIR it always has.  So the problem
case of delete all/analyze doesn't clobber pg_statistic.

The issue is only with the relpages and reltuples counts in pg_class.
I already have a proposal on the table to get rid of these in favor of
a "tuples per page" entry, see eg
http://archives.postgresql.org/pgsql-general/2004-08/msg01422.php

An objection that I forgot to mention in that message is that any such
change would break autovacuum in its current form; although that issue
largely vanishes if we integrate autovacuum into the backend, and in any
case we could offer a built-in function to return the current number of
pages in a table.
        regards, tom lane