Thread: Vacuum and inserts

Vacuum and inserts

From
Mark Wilden
Date:
My coworker and I are having an argument about whether it's necessary
to VACUUM an insert-only table.

My theory is that since there are no outdated nor deleted rows, VACUUM
doesn't do anything. I just loaded a TRUNCATEd table with no indexes
with 4 million records, indexed it, then ran VACUUM. There was nothing
to do. So I don't see VACUUM as necessary with an insert-only table
(not to be confused with ANALYZE, which clearly is necessary).

My coworker is mainly going from the article at
http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT
. In particular, the author says

"Consider this scenario: a row is inserted into a table that has a
couple indexes, and that transaction commits. Several updates happen
on that row. Each update will create a new row in all indexes, even if
the index key didn't change. And each update will also leave an old
version of the row in the base table, one that has been updated to
point to the location of the new version of the row that replaces it.
"

I don't get this at all. He starts by talking about inserts, but then
talks only about updates (and VACUUM is clearly needed when rows are
updated or deleted).

Can someone disambiguate?

adthanksvance

///ark

Re: Vacuum and inserts

From
Michael Fuhr
Date:
On Wed, Jun 18, 2008 at 12:29:46PM -0700, Mark Wilden wrote:
> My coworker and I are having an argument about whether it's necessary
> to VACUUM an insert-only table.
>
> My theory is that since there are no outdated nor deleted rows, VACUUM
> doesn't do anything.

Rolled back transactions on an insert-only table can leave behind
dead rows.  Also, even if the table never has dead rows you still
have to vacuum periodically to prevent transaction ID wraparound.

http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

"To avoid this, it is necessary to vacuum every table in every
database at least once every two billion transactions."

--
Michael Fuhr

Re: Vacuum and inserts

From
Bill Moran
Date:
In response to Mark Wilden <mark.wilden@gmail.com>:
>
> My coworker and I are having an argument about whether it's necessary
> to VACUUM an insert-only table.
>
> My theory is that since there are no outdated nor deleted rows, VACUUM
> doesn't do anything. I just loaded a TRUNCATEd table with no indexes
> with 4 million records, indexed it, then ran VACUUM. There was nothing
> to do. So I don't see VACUUM as necessary with an insert-only table
> (not to be confused with ANALYZE, which clearly is necessary).

The only counter-argument to this is the case where transactions are
aborted/rolled back, which _does_ result in dead rows.  If your
application doesn't usually do ROLLBACKs, this probably won't ever
be a problem.  It's a great argument for autovacuum, however, which
won't bother to vacuum the table if it doesn't have any dead rows.

> My coworker is mainly going from the article at
> http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT
> . In particular, the author says
>
> "Consider this scenario: a row is inserted into a table that has a
> couple indexes, and that transaction commits. Several updates happen
> on that row. Each update will create a new row in all indexes, even if
> the index key didn't change. And each update will also leave an old
> version of the row in the base table, one that has been updated to
> point to the location of the new version of the row that replaces it.
> "
>
> I don't get this at all. He starts by talking about inserts, but then
> talks only about updates (and VACUUM is clearly needed when rows are
> updated or deleted).
>
> Can someone disambiguate?

I seems to me that it's your coworker that needs the disambiguation.
Based on the argument you describe, he doesn't seem to understand the
difference between UPDATE and INSERT.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Vacuum and inserts

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> On Wed, Jun 18, 2008 at 12:29:46PM -0700, Mark Wilden wrote:
>> My theory is that since there are no outdated nor deleted rows, VACUUM
>> doesn't do anything.

> Rolled back transactions on an insert-only table can leave behind
> dead rows.  Also, even if the table never has dead rows you still
> have to vacuum periodically to prevent transaction ID wraparound.

Another reason why you might want to vacuum after a bulk insertion
is to get all the commit hint bits set, rather than leaving that
overhead to be done by live queries.

In 8.4 there might be yet another reason, if you have a GIN index
on the table: Teodor is talking about postponing some of the GIN
insertion work to VACUUM ...

            regards, tom lane

Re: Vacuum and inserts

From
"Mark Wilden"
Date:
Thanks to everyone for your comments. It does sound like my understanding was basically correct, but also that autovacuum is still worthwhile in my situation, for reasons other than concurrency.

///ark