Thread: Possible TODO item? VACUUM on empty table
Over on the support lists we've seen a couple of examples this week of people doing a bulk import/delete/replace cycle with a vacuum analyse just after the delete. Instinctively, it's where you'd vacuum and I'm guessing the analyse just gets typed at the same time from habit (I know I've done so more than once). Of course, it flattens your stats and your import can then take forever. So - would it be worth, in the case of vacuum+analyse (but not analyse alone), where the table is completely empty then just halve the existing stats. That way, if you're in the middle of a delete/replace cycle your stats will still be in the realms of reasonableness. On the other hand, if the table is genuinely empty and you're running a nightly vacuum the stats will tend toward 0. Against the idea: 1. It's a special-case in the code 2. There's a work-around (unless you're running pg_autovacuum I'd guess) 2. We perhaps want a more general table description (this is a bulk-loading table, this is a log table, this is a mostly static lookup table) -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > So - would it be worth, in the case of vacuum+analyse (but not analyse alone), > where the table is completely empty then just halve the existing stats. > > That way, if you're in the middle of a delete/replace cycle your stats will > still be in the realms of reasonableness. On the other hand, if the table is > genuinely empty and you're running a nightly vacuum the stats will tend toward > 0. > > Against the idea: > 1. It's a special-case in the code Hm. I suppose the generalization of this would be to always average the new stats with the existing stats. I suppose you could make the argument that if I run analyze and then run analyze again without changing anything then I've effectively doubled the sample size and the stats can take advantage of that. But not all of the stats are simple totals. I'm not sure this would make sense for the histogram arrays. -- greg
Greg Stark <gsstark@mit.edu> writes: > But not all of the stats are simple totals. I'm not sure this would make sense > for the histogram arrays. Yeah, I don't know how you "halve" a histogram. But the problem is not with the pg_statistic stats, I think. It is already true that ANALYZE punts without touching pg_statistic if it was unable to find any live rows, and AFAIR it always has. So the problem case of delete all/analyze doesn't clobber pg_statistic. The issue is only with the relpages and reltuples counts in pg_class. I already have a proposal on the table to get rid of these in favor of a "tuples per page" entry, see eg http://archives.postgresql.org/pgsql-general/2004-08/msg01422.php An objection that I forgot to mention in that message is that any such change would break autovacuum in its current form; although that issue largely vanishes if we integrate autovacuum into the backend, and in any case we could offer a built-in function to return the current number of pages in a table. regards, tom lane