Re: performance problem - Mailing list pgsql-general

From Rick Gigger
Subject Re: performance problem
Date
Msg-id 01d701c3afb4$90b68280$0700a8c0@trogdor
In response to Point-in-time data recovery - v.7.4  (Rafael Martinez Guerrero <r.m.guerrero@usit.uio.no>)
List pgsql-general
> If you examine the behavior of pg_dump output, you'll notice that it
> doesn't build indexes until after the COPY command has completed the
> data import. It's a waste of CPU cycles and disk bandwidth to update
> indexes upon every insert. Your script should:
>
> 1) Drop all indexes on the relation
> 2) Use COPY if possible instead of INSERT to import the data
> 3) Recreate the indexes
> 4) Use UPDATE to update as necessary

I never thought of dropping the indexes in the middle of the script and
re-adding them.  I guess that would be fine as long as nothing else was
running that needed the index while it was gone.  I don't think 7.2.x
supports using COPY unless you are inserting all of the fields, which I
don't want to do, although I suppose I could just insert all of the
defaults.  Is that correct?
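
For concreteness, here's roughly what I'm picturing (the table, index,
and file names are made up):

    BEGIN;
    -- 1) drop the indexes first
    DROP INDEX items_name_idx;
    -- 2) bulk-load with COPY; in 7.2.x COPY takes no column list, so
    --    the file has to supply every column
    COPY items FROM '/tmp/items.dat';
    -- 3) recreate the indexes
    CREATE INDEX items_name_idx ON items (name);
    -- 4) update as necessary
    UPDATE items SET price = price * 1.1 WHERE category = 'widget';
    COMMIT;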


> > I am using 7.2.4.  Has this improved in later versions?  I'm not
> > concerned since this is a very rare thing to need to do and it's
> > obviously possible to work around, but it would be nice if postgres
> > could figure things like that out on its own.  (It certainly would
> > have saved me a lot of time and confusion last night at about 3 am.)
> > Is there a way to force the use of a specific index on a query?
>
> You can force the use of an index scan by turning sequential scans off:
>
> SET ENABLE_SEQSCAN TO OFF;

Thanks.
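
Presumably the thing to do is wrap just the one query, something like
this (the query itself is made up):

    SET ENABLE_SEQSCAN TO OFF;
    EXPLAIN ANALYZE SELECT * FROM items WHERE name = 'widget';
    SET ENABLE_SEQSCAN TO ON;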


> But the problem is that the statistics didn't match the data. You
> could have:
>
> 1) Used INSERTs to insert the data into a relation with an index
> 2) Executed ANALYZE <foo> to update the statistics
> 3) Performed the UPDATE

That would have been the easiest thing.  I wanted to do that, but I
thought you had to do a VACUUM (which I couldn't do inside the
transaction) together with ANALYZE.  I didn't realize that ANALYZE was a
command all by itself.
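
So in my case the script could have just done this (made-up table and
columns):

    BEGIN;
    -- load the data with INSERTs
    INSERT INTO items (name, price) VALUES ('widget', 10);
    -- ANALYZE, unlike VACUUM, works fine inside a transaction block
    ANALYZE items;
    -- the UPDATE now plans against fresh statistics
    UPDATE items SET price = price * 2 WHERE name = 'widget';
    COMMIT;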

> After the UPDATE, you'll still have dead tuples (the original rows)
> which require that they be marked as dead, and so you should
> occasionally run either VACUUM to mark them as such or VACUUM FULL to
> reclaim the dead space or VACUUM FULL ANALYZE to also update the
> relation's statistics.

I do them all nightly with cron on all my important databases.  I just
needed the ANALYZE to happen in the middle of the transaction in this
one special case.

> I would just execute the steps I outlined above with COPY and not
> worry about ANALYZEs and VACUUMs in a script. People often run VACUUM
> ANALYZE in a 1/day cron job and VACUUM FULL ANALYZE in a 1/week cron
> job. Then, of course, there's REINDEX...

Doing the VACUUM FULL ANALYZE doesn't take long if I do it once a day,
so I just do that.
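
For the record, those are just:

    -- nightly: mark dead tuples reusable and refresh statistics
    VACUUM ANALYZE;
    -- weekly (or nightly, in my case): also compact the tables to
    -- reclaim disk space; note it takes an exclusive lock per table
    VACUUM FULL ANALYZE;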

