Thread: Vacuum and inserts
My coworker and I are having an argument about whether it's necessary to VACUUM an insert-only table. My theory is that since there are no outdated nor deleted rows, VACUUM doesn't do anything. I just loaded a TRUNCATEd table with no indexes with 4 million records, indexed it, then ran VACUUM. There was nothing to do. So I don't see VACUUM as necessary with an insert-only table (not to be confused with ANALYZE, which clearly is necessary). My coworker is mainly going from the article at http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT . In particular, the author says "Consider this scenario: a row is inserted into a table that has a couple indexes, and that transaction commits. Several updates happen on that row. Each update will create a new row in all indexes, even if the index key didn't change. And each update will also leave an old version of the row in the base table, one that has been updated to point to the location of the new version of the row that replaces it. " I don't get this at all. He starts by talking about inserts, but then talks only about updates (and VACUUM is clearly needed when rows are updated or deleted). Can someone disambiguate? adthanksvance ///ark
On Wed, Jun 18, 2008 at 12:29:46PM -0700, Mark Wilden wrote: > My coworker and I are having an argument about whether it's necessary > to VACUUM an insert-only table. > > My theory is that since there are no outdated nor deleted rows, VACUUM > doesn't do anything. Rolled back transactions on an insert-only table can leave behind dead rows. Also, even if the table never has dead rows you still have to vacuum periodically to prevent transaction ID wraparound. http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND "To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions." -- Michael Fuhr
In response to Mark Wilden <mark.wilden@gmail.com>: > > My coworker and I are having an argument about whether it's necessary > to VACUUM an insert-only table. > > My theory is that since there are no outdated nor deleted rows, VACUUM > doesn't do anything. I just loaded a TRUNCATEd table with no indexes > with 4 million records, indexed it, then ran VACUUM. There was nothing > to do. So I don't see VACUUM as necessary with an insert-only table > (not to be confused with ANALYZE, which clearly is necessary). The only counter-argument to this is the case where transactions are aborted/rolled back, which _does_ result in dead rows. If your application doesn't usually do ROLLBACKs, this probably won't ever be a problem. It's a great argument for autovacuum, however, which won't bother to vacuum the table if it doesn't have any dead rows. > My coworker is mainly going from the article at > http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT > . In particular, the author says > > "Consider this scenario: a row is inserted into a table that has a > couple indexes, and that transaction commits. Several updates happen > on that row. Each update will create a new row in all indexes, even if > the index key didn't change. And each update will also leave an old > version of the row in the base table, one that has been updated to > point to the location of the new version of the row that replaces it. > " > > I don't get this at all. He starts by talking about inserts, but then > talks only about updates (and VACUUM is clearly needed when rows are > updated or deleted). > > Can someone disambiguate? I seems to me that it's your coworker that needs the disambiguation. Based on the argument you describe, he doesn't seem to understand the difference between UPDATE and INSERT. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Michael Fuhr <mike@fuhr.org> writes: > On Wed, Jun 18, 2008 at 12:29:46PM -0700, Mark Wilden wrote: >> My theory is that since there are no outdated nor deleted rows, VACUUM >> doesn't do anything. > Rolled back transactions on an insert-only table can leave behind > dead rows. Also, even if the table never has dead rows you still > have to vacuum periodically to prevent transaction ID wraparound. Another reason why you might want to vacuum after a bulk insertion is to get all the commit hint bits set, rather than leaving that overhead to be done by live queries. In 8.4 there might be yet another reason, if you have a GIN index on the table: Teodor is talking about postponing some of the GIN insertion work to VACUUM ... regards, tom lane
Thanks to everyone for your comments. It does sound like my understanding was basically correct, but also that autovacuum is still worthwhile in my situation, for reasons other than concurrency.
///ark
///ark