Thread: Vacuum advice

Vacuum advice

From: David Mitchell
Hi,

I have a system that has a moderate amount of activity on it, nothing
strenuous. The activity is a real mixture of operations: selects,
updates, inserts and deletes. One thing strange about our database is
that we have a lot of stored procedures that use temporary tables. Most
of the time the system runs fine, and a nightly vacuum does the job.
However, occasionally we need to import data, and this involves
inserting several million rows into a table, but this just *cripples*
postgres. After the import has been running for a while, simple selects
take a long time, and strangely, the query planner starts using a
sequential scan on a table with millions of items in it. Surely inserts
shouldn't be doing this? Should I be running a vacuum during the import?
After how many operations should I be calling vacuum? We tried using
pg_autovacuum but it didn't seem to do a very good job since we would
often find the query planner getting it wrong, and a vacuum analyze
would fix it.

Part of the import is to update items in a table to keep track of where
the import is up to. The update is just setting an integer column, so
does this create a dead tuple or will postgres alter the column in-place?

Thanks for your help
--
David Mitchell
Software Engineer
Telogis

Re: Vacuum advice

From: Tom Lane
David Mitchell <david.mitchell@telogis.com> writes:
> However, occasionally we need to import data, and this involves
> inserting several million rows into a table, but this just *cripples*
> postgres. After the import has been running for a while, simple selects
> take a long time, and strangely, the query planner starts using a
> sequential scan on a table with millions of items in it. Surely inserts
> shouldn't be doing this? Should I be running a vacuum during the import?

PG 8.0 and up should handle this a bit more nicely.  Before 8.0 the
planner looked only at the table size recorded in pg_catalog (the
relpages and reltuples fields) and so of course it wouldn't react to a
still-in-progress mass insert.  Now it looks at the current actual
physical table size, and so it should be willing to switch to indexscan
if needed before the import completes.
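
If you want to watch this from the outside, comparing the stored
statistics with reality is something like this (table name is just an
example):

```sql
-- relpages/reltuples are only updated by VACUUM/ANALYZE, so they lag
-- behind during a mass insert:
SELECT relpages, reltuples FROM pg_class WHERE relname = 'big_table';

-- The actual current row count, which the 8.0 planner approximates
-- from the physical file size rather than from the stale catalog:
SELECT count(*) FROM big_table;
```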

If you *are* using 8.0 then we need to look closer.

> After how many operations should I be calling vacuum? We tried using
> pg_autovacuum but it didn't seem to do a very good job since we would
> often find the query planner getting it wrong, and a vacuum analyze
> would fix it.

autovacuum has trouble responding to still-in-progress transactions
too :-(

> Part of the import is to update items in a table to keep track of where
> the import is up to. The update is just setting an integer column, so
> does this create a dead tuple or will postgres alter the column in-place?

Postgres doesn't *ever* update anything in-place.
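
Even a one-column UPDATE writes a complete new row version and leaves
the old one behind as a dead tuple.  To illustrate (table is made up):

```sql
CREATE TABLE import_progress (id integer PRIMARY KEY, last_row integer);
INSERT INTO import_progress VALUES (1, 0);

-- Each of these leaves one dead tuple behind; only VACUUM reclaims it:
UPDATE import_progress SET last_row = 500 WHERE id = 1;
UPDATE import_progress SET last_row = 1000 WHERE id = 1;
```

So a tracking row that's updated once per inserted item accumulates
dead tuples in proportion to the size of the import.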

            regards, tom lane

Re: Vacuum advice

From: David Mitchell
Thanks Tom,

> If you *are* using 8.0 then we need to look closer.
>

Sorry, I should have mentioned, I am using PG 8.0. Also, although this
is a 'mass insert', it's only kind of mass. While there are millions of
rows, they are inserted in blocks of 500 (with a commit in between).

We're thinking we might set vacuum_cost_limit to around 100, set
vacuum_cost_delay to 100, and then just run vacuumdb in a cron job every
15 minutes or so. Does this sound silly?
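
Concretely, something like this (paths and database name are made up):

```
# postgresql.conf -- 8.0 cost-based vacuum delay settings:
vacuum_cost_delay = 100    # ms to sleep when the cost limit is reached
vacuum_cost_limit = 100    # accumulated page-cost that triggers the sleep

# crontab -- plain vacuum + analyze every 15 minutes:
*/15 * * * * /usr/local/pgsql/bin/vacuumdb --analyze --quiet mydb
```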


--
David Mitchell
Software Engineer
Telogis


Re: Vacuum advice

From: Tom Lane
David Mitchell <david.mitchell@telogis.com> writes:
>> If you *are* using 8.0 then we need to look closer.

> Sorry, I should have mentioned, I am using PG 8.0. Also, although this
> is a 'mass insert', it's only kind of mass. While there are millions of
> rows, they are inserted in blocks of 500 (with a commit in between).

> We're thinking we might set up vacuum_cost_limit to around 100 and put
> vacuum_cost_delay at 100 and then just run vacuumdb in a cron job every
> 15 minutes or so, does this sound silly?

It doesn't sound completely silly, but if you are doing inserts and not
updates/deletes then there's not anything for VACUUM to do, really.
An ANALYZE command might get the same result with less effort.

I am however still wondering why 8.0 doesn't get it right without help.
Can you try a few EXPLAIN ANALYZEs as the table grows and watch whether
the cost estimates change?
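
Something along these lines, run every so often during the import
(query and table here are just placeholders):

```sql
-- Watch whether the estimated row counts and costs track the growing
-- table, and whether/when the plan flips from indexscan to seqscan:
EXPLAIN ANALYZE SELECT * FROM big_table WHERE some_indexed_col = 42;
```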

(Also, if this is actually 8.0.0 and not a more recent dot-release,
I believe there were some bug fixes in this vicinity in 8.0.2.)

            regards, tom lane

Re: Vacuum advice

From: David Mitchell
>> We're thinking we might set up vacuum_cost_limit to around 100 and put
>> vacuum_cost_delay at 100 and then just run vacuumdb in a cron job every
>> 15 minutes or so, does this sound silly?
>
> It doesn't sound completely silly, but if you are doing inserts and not
> updates/deletes then there's not anything for VACUUM to do, really.
> An ANALYZE command might get the same result with less effort.

I think that perhaps the fact that we are doing updates in the secondary
table to track the import is the culprit here. It gets updated for each
item inserted into the main table, so even though it has only 500 rows,
it ended up with about 2 million dead tuples, which left a lot to be
desired in terms of seq scan speed. VACUUM FULL cleared this up, so I
assume a frequent regular vacuum would keep it in tip-top condition.
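
A quick way to see the bloat on that table is (table name from our
schema, just for illustration):

```sql
-- Reports pages scanned and dead row versions found/removed:
VACUUM VERBOSE import_progress;
```

A plain VACUUM every few minutes on a 500-row table is cheap and should
keep the dead-tuple count bounded.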

We are using PG 8.0.1.

Thanks for your help Tom.

--
David Mitchell
Software Engineer
Telogis

Re: Vacuum advice

From: "Karl O. Pinc"
On 06/22/2005 04:39:00 PM, Tom Lane wrote:
> David Mitchell <david.mitchell@telogis.com> writes:
> > However, occasionally we need to import data, and this involves
> > inserting several million rows into a table, but this just *cripples*
> > postgres. After the import has been running for a while, simple selects
> > take a long time, and strangely, the query planner starts using a
> > sequential scan on a table with millions of items in it. Surely inserts
> > shouldn't be doing this? Should I be running a vacuum during the import?
>
> PG 8.0 and up should handle this a bit more nicely.  Before 8.0 the
> planner looked only at the table size recorded in pg_catalog (the
> relpages and reltuples fields) and so of course it wouldn't react to a
> still-in-progress mass insert.  Now it looks at the current actual
> physical table size, and so it should be willing to switch to indexscan
> if needed before the import completes.

I've made a habit of disconnecting, analyzing and re-connecting
every so often on mass data loading because any stored
procedures I'm using for the load, including triggers, will not have
their query plans updated to account for the new data.
It's a pain, so I figured I'd ask whether I'm missing
something and this is somehow unnecessary.
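
For what it's worth, the workaround I know of short of reconnecting is
to force re-planning inside the function with EXECUTE instead of a
static statement.  A sketch (function and table names are invented):

```sql
CREATE OR REPLACE FUNCTION lookup(key integer) RETURNS integer AS $$
DECLARE
  result integer;
BEGIN
  -- A static SELECT here would be planned once and the plan cached for
  -- the life of the session; EXECUTE re-plans on every call, so it
  -- picks up fresh statistics as the table grows:
  FOR result IN EXECUTE 'SELECT val FROM lookup_table WHERE id = ' || key
  LOOP
    RETURN result;
  END LOOP;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```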


Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein