Thread: slow table updates

slow table updates

From
Reece Hart
Date:
I'm trying to update a table but it's taking a very long time. I would appreciate any tips folks may have about ways to speed it up.

The table is paprospect2, as below:
\d paprospect2
   Column    |  Type   |                             Modifiers
-------------+---------+-------------------------------------------------------------------
 pfeature_id | integer | not null default nextval('unison.pfeature_pfeature_id_seq'::text)pseq_id     | integer | not nullpftype_id   | integer | not nullstart       | integer |stop        | integer |confidence  | real    |run_id      | integer | not null[snip 13 integer and real columns]
 run_id_new  | integer |

Indexes: paprospect2_redundant_alignment unique btree (pseq_id, "start", stop, run_id, pmodel_id),
         p2thread_p2params_id btree (run_id),
         p2thread_pmodel_id btree (pmodel_id)
Foreign Key constraints: pftype_id_exists FOREIGN KEY (pftype_id) REFERENCES pftype(pftype_id) ON UPDATE CASCADE ON DELETE CASCADE,
                         p2thread_pmodel_id_exists FOREIGN KEY (pmodel_id) REFERENCES pmprospect2(pmodel_id) ON UPDATE CASCADE ON DELETE CASCADE,
                         pseq_id_exists FOREIGN KEY (pseq_id) REFERENCES pseq(pseq_id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers: p2thread_i_trigger
The columns pfeature_id..confidence and run_id_new (in red) are from an inherited table. Although the inheritance itself is probably not relevant here (correction welcome), I suspect it may be relevant that all existing rows were written before the table definition included run_id_new. p2thread_i_trigger is defined fires on insert only (not update).

paprospect2 contains ~40M rows. The goal now is to migrate the data to the supertable-inherited column with
update paprospect2 set run_id_new=run_id;

The update's been running for 5 hours (unloaded dual 2.4 GHz Xeon w/2GB RAM, SCSI160 10K drive). There are no other jobs running. Load is ~1.2 and the update's using ~3-5% of the CPU.
$ ps -ostime,time,pcpu,cmd 28701
STIME     TIME %CPU CMD
12:18 00:07:19  2.3 postgres: admin csb 128.137.116.213 UPDATE
This suggests that the update is I/O bound (duh) and vmstat supports this:
$ vmstat 1
   procs                      memory    swap          io     system         cpur  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id0  1  0      0  11288  94632 3558960   0   0    14     6   12    21   1   0   60  1  0      0  12044  94632 3558956   0   0     0   972  332    16   0   1  990  1  0      0  11092  94632 3558932   0   0    16  4420  309    25   0   2  970  1  0      0  11456  94636 3558928   0   0     0   980  326    23   0   1  991  0  0      0  12340  94636 3558924   0   0    16   532  329    14   0   0 1000  1  0      0  12300  94636 3558916   0   0     0  1376  324    16   1   0  990  1  0      0  12252  94636 3558904   0   0    16  1888  325    18   0   0  990  1  0      0  11452  94636 3558888   0   0    16  2864  324    23   1   1  980  1  0      0  12172  94636 3558884   0   0     0   940  320    12   0   1  990  1  0      0  12180  94636 3558872   0   0    16  1840  318    22   0   1  990  1  0      0  11588  94636 3558856   0   0     0  2752  312    16   1   2  97

Presumably the large number of blocks written (bo) versus blocks read (bi) reflects an enormous amount of bookkeeping that has to be done for MVCC, logging, perhaps rewriting a row for the new definition (a guess -- I don't know how this is handled), indicies, etc. There's no swapping and no processes are waiting. In short, it seems that this is ENTIRELY an I/O issue. Obviously, faster drives will help (but probably only by small factor).

Any ideas how I might speed this up? Presumably this is all getting wrapped in a transaction -- does that hurt me for such a large update?

Thanks,
Reece


Bonus diversionary topic: In case it's not obvious, the motivation for this is that the subtable (paprospect2) contains a column (run_id) whose definition I would like to migrate to the inherited table (i.e., the 'super-table'). Although postgresql permits adding a column to a supertable with the same name as an extant column in a subtable, it appears that such "merged definition" columns do not have the same properties as a typical inherited column. In particular, dropping the column from the supertable does not drop it from the subtable (but renaming it does change both names). Hmm.

-- 
Reece Hart, Ph.D.                       rkh@gene.com, http://www.gene.com/
Genentech, Inc.                         650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93                        http://www.in-machina.com/~reece/
South San Francisco, CA  94080-4990     reece@in-machina.com, GPG: 0x25EC91A0

Re: [PERFORM] slow table updates

From
Reece Hart
Date:
Richard-

Thanks for the suggestions. I too had thought about the FK checks, even though the columns aren't getting updated.

I'm flabbergasted that the update is still running (~22 hours elapsed). By comparison, the database takes only 4 hours to recreate from backup! Something funny is happening here. I just interrupted the update and will find another way.

But hang on, what's this:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
30164 compbio   25   0  6056 6056  3568 R    98.8  0.1  29:55 postgres: admin csb [local] COPY

I am the only user and I'm not doing a copy... this must be part of the update process. Does anyone out there know whether updates do a table copy instead of in-table udpating (perhaps as a special case for whole-table updates)?

Of course, I can't help but wonder whether I just killed it when it was nearly done...

Thanks,
Reece

-- 
Reece Hart, Ph.D.                       rkh@gene.com, http://www.gene.com/
Genentech, Inc.                         650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93                        http://www.in-machina.com/~reece/
South San Francisco, CA  94080-4990     reece@in-machina.com, GPG: 0x25EC91A0