Re: performance problem - Mailing list pgsql-general
From | Rick Gigger
---|---
Subject | Re: performance problem
Date |
Msg-id | 01d701c3afb4$90b68280$0700a8c0@trogdor
In response to | Point-in-time data recovery - v.7.4 (Rafael Martinez Guerrero <r.m.guerrero@usit.uio.no>)
List | pgsql-general
> If you examine the behavior of pg_dump output, you'll notice that it
> doesn't build indexes until after the COPY command has completed the
> data import. It's a waste of CPU cycles and disk bandwidth to update
> indexes upon every insert. Your script should:
>
> 1) Drop all indexes on the relation
> 2) Use COPY if possible instead of INSERT to import the data
> 3) Recreate the indexes
> 4) Use UPDATE to update as necessary

I never thought of dropping the indexes in the middle of the script and
re-adding them. I guess that would be fine as long as nothing else that
needed the index was running while it was gone. I don't think 7.2.x
supports COPY unless you are inserting all of the fields, which I don't
want to do, although I suppose I could just insert all of the defaults.
Is that correct? (I've sketched this approach at the end of this
message.)

> > I am using 7.2.4. Has this improved in later versions? I'm not
> > concerned, since this is a very rare thing to need to do and it's
> > obviously possible to work around, but it would be nice if postgres
> > could figure things like that out on its own. (It certainly would
> > have saved me a lot of time and confusion last night at about 3 am.)
> > Is there a way to force the use of a specific index on a query?
>
> You can force the use of an index scan by turning sequential scans off:
> SET ENABLE_SEQSCAN TO OFF;

Thanks.

> But the problem is that the statistics didn't match the data. You
> could have:
>
> 1) Used INSERTs to insert the data into a relation with an index
> 2) Executed ANALYZE <foo> to update the statistics
> 3) Performed the UPDATE

That would have been the easiest thing. I wanted to do that, but I
thought you had to do a VACUUM (which I couldn't do in the transaction)
together with the ANALYZE. I didn't realize that ANALYZE was a command
all by itself. (Sketch of that below too.)

> After the UPDATE, you'll still have dead tuples (the original rows)
> which require that they be marked as dead, and so you should
> occasionally run either VACUUM to mark them as such, or VACUUM FULL to
> reclaim the dead space, or VACUUM FULL ANALYZE to also update the
> relation's statistics.

I do them all nightly with cron on all my important databases. I just
had a problem with needing the ANALYZE to happen in the middle of the
transaction in this one special case.

> I would just execute the steps I outlined above with COPY and not
> worry about ANALYZEs and VACUUMs in a script. People often run VACUUM
> ANALYZE in a 1/day cron job and VACUUM FULL ANALYZE in a 1/week cron
> job. Then, of course, there's REINDEX...

Doing the VACUUM FULL ANALYZE doesn't take long if I do it once a day,
so I just do that.
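
To make the drop/COPY/recreate approach concrete, here's a minimal
sketch of what I think the script would look like. The table, column,
and index names (big_table, val, flag, big_table_val_idx) and the file
path are all made up for illustration:

BEGIN;

-- 1) Drop the indexes so the bulk load doesn't maintain them row by row
DROP INDEX big_table_val_idx;

-- 2) Bulk-load with COPY instead of individual INSERTs
--    (server-side file path, made up for this example)
COPY big_table FROM '/tmp/data.copy';

-- 3) Recreate the indexes in a single pass over the loaded data
CREATE INDEX big_table_val_idx ON big_table (val);

-- 4) Do the UPDATE, which can now use the fresh index
UPDATE big_table SET flag = true WHERE val = 42;

COMMIT;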
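
On the enable_seqscan suggestion: as I understand it, the setting can
be scoped to just the one query and then restored, something like this
(same hypothetical table as above):

SET ENABLE_SEQSCAN TO OFF;
-- the planner now strongly prefers an index scan where one is possible
SELECT * FROM big_table WHERE val = 42;
SET ENABLE_SEQSCAN TO ON;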
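
And for the archives, here's what I understand the
ANALYZE-in-the-transaction version to look like (again with made-up
names):

BEGIN;

-- load the data with the index still in place
INSERT INTO big_table (val) VALUES (1);
-- ... many more INSERTs ...

-- ANALYZE, unlike VACUUM, is allowed inside a transaction block,
-- and refreshes the planner statistics for the table
ANALYZE big_table;

-- this UPDATE is now planned against current statistics
UPDATE big_table SET flag = true WHERE val = 1;

COMMIT;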
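
The maintenance my cron jobs boil down to is roughly:

-- daily: mark dead tuples reusable and keep statistics current
VACUUM ANALYZE;

-- weekly (or, in my case, also daily): reclaim the dead space as well;
-- note VACUUM FULL takes an exclusive lock on each table it processes
VACUUM FULL ANALYZE;

-- occasionally, rebuild a bloated index (hypothetical name again)
REINDEX INDEX big_table_val_idx;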