On Wednesday 23 July 2003 01:40, Reece Hart wrote:
> I'm trying to update a table but it's taking a very long time. I would
> appreciate any tips folks may have about ways to speed it up.
[snip]
> paprospect2 contains ~40M rows. The goal now is to migrate the data to
> the supertable-inherited column with
>
> update paprospect2 set run_id_new=run_id;
>
>
> The update's been running for 5 hours (unloaded dual 2.4 GHz Xeon w/2GB
> RAM, SCSI160 10K drive). There are no other jobs running. Load is ~1.2
> and the update's using ~3-5% of the CPU.
[snip]
> This suggests that the update is I/O bound (duh) and vmstat supports
> this:
[snip]
> Presumably the large number of blocks written (bo) versus blocks read
> (bi) reflects an enormous amount of bookkeeping that has to be done for
> MVCC, logging, perhaps rewriting a row for the new definition (a guess
> -- I don't know how this is handled), indicies, etc. There's no swapping
> and no processes are waiting. In short, it seems that this is ENTIRELY
> an I/O issue. Obviously, faster drives will help (but probably only by
> small factor).
>
> Any ideas how I might speed this up? Presumably this is all getting
> wrapped in a transaction -- does that hurt me for such a large update?
Well, it needs to keep enought bookkeeping to be able to rollback the whole
transaction if it encounters a problem, or 40M rows in your case. Looks like
you're right and it's an I/O issue. I must admit, I'm a bit puzzled that your
CPU is quite so low, but I suppose you've got two fast CPUs so it shouldn't
be high.
[note the following is more speculation than experience]
What might be happening is that the drive is spending all its time seeking
between the WAL, index and table as it updates. I would also tend to be
suspicious of the foreign keys - PG might be re-checking these, and obviously
that would take time too.
What you might want to try in future:
1. begin transaction
2. drop indexes, foreign keys
3. update table
4. vacuum it
5. recreate indexes, foreign keys etc
6. commit
Now that's just moving the index updating/fk stuff to the end of the task, but
it does seem to help sometimes.
HTH
--
Richard Huxton
Archonet Ltd