Re: Massive table (500M rows) update nightmare - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Re: Massive table (500M rows) update nightmare
Date
Msg-id hi5kpr$184c$1@news.hub.org
Whole thread Raw
In response to Re: Massive table (500M rows) update nightmare  (marcin mank <marcin.mank@gmail.com>)
Responses Re: Massive table (500M rows) update nightmare
Re: Massive table (500M rows) update nightmare
List pgsql-performance
> What is the rationale behind this? How about doing 10k rows in 1
> update, and committing every time?

When we did 10K updates, the application would sometimes appear to have
frozen, and we were concerned that there was a deadlock condition because of
the number of locked rows. While we may have the patience to sit around and
wait five minutes to see if the update would continue, we couldn't risk
having other applications appear frozen if that was the case. In fact, there
is no reason for any user or application to write to the same records we are
writing to - but the audit table is read frequently. We are not explicitly
locking anything, or writing any additional code to arbitrate the lcoking
model anywhere -it's all default UPDATE and SELECT syntax.

Doing the updates in smaller chunks resolved these apparent freezes - or,
more specifically, when the application DID freeze, it didn't do it for more
than 30 seconds. In all likelyhood, this is the OS and the DB thrashing.

We have since modified the updates to process 1000 rows at a time with a
commit every 10 pages. Just this morning, though, the IS manager asked me to
stop the backfill because of the load affect on other processes.

> You could try making the condition on the ctid column, to not have to
> use the index on ID, and process the rows in physical order.

An interesting idea, if I can confirm that the performance problem is
because of the WHERE clause, not the UPDATE.

>'where new_column is null' to the conditions.

Already being done, albeit with a coalesce(val, '') = '' - it's quite
possible that this is hurting the WHERE clause; the EXPLAIN shows the table
using the pkey and then filtering on the COALESCE as one would expect.

Carlo


pgsql-performance by date:

Previous
From: marcin mank
Date:
Subject: Re: Massive table (500M rows) update nightmare
Next
From: Greg Smith
Date:
Subject: Re: Air-traffic benchmark