Re: batch update query performance - Mailing list pgsql-performance

From Hans Drexler
Subject Re: batch update query performance
Date
Msg-id 1396963239.13361.27.camel@hans-think
Whole thread Raw
Responses Re: batch update query performance  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Dear Jeff, Albe and Heikki,

Let me start by thanking you for your time. It is really nice to have a
real supportive community. Thank you.

After reading the answers, we decided to do an experiment with a
fillfactor of 40% and dropping the index on the is_grc_002 field (but
retaining the other indexes.) The experiment showed a reduction in
run-time to ~125 seconds. That is almost 25 times faster than it was. We
are now doing more tests to verify this fix. We will send a SOLVED
message when the fix is verified (unless you state to not bother...)

We think we understand why the improvement works. Let me state our
understanding here. Please comment if we got it wrong.

Index entries point to record pages. An update on a row results in a new
row instance. If the new instance can be written in the same page as the
old instance, then no indexes need to be updated because the index still
points to the correct page. (Unless the update itself modifies an
indexed value). By specifying a fillfactor of 40%, there will be room
for an updated version of each row in the page.
We assume (sorry) that vacuuming the table will release the space of the
old rows, so that we can again do an update query and reuse the freed up
space in the pages.


Jeff, answering your question: The update is done after each cycle. It
will actually also update rows that were already updated before. We
realize this is actually wasteful.


So we might change

update t67cdi_nl_cmp_descr set is_grc_002='Y'

to

update t67cdi_nl_cmp_descr set is_grc_002='Y' where is_grc_002 is null

This will avoid creating new records for records that where already
changed before. This might give us additional speed improvement.


Albe, answering your question: Yes, the update was indeed finished in ~3
minutes when all indexes were dropped. Fifteen indexes is indeed a big
number. These indexes are configured by consultants on a
project-by-project basis. They are not hard coded in the software. I
will however advice the consultant to have a critical look at the big
number of indexes used in this case.

Heikki, Thank you for the nifty techniques. I especially like the
possible solution with partitioning the table and using a view. We don't
think we can do this at this point. Let me elaborate a bit to explain
why.

The application works in cycles. Each cycles adds more records to the
table. New records have a NULL value in field is_grc_002. At the end of
the cycle, the value of all existing records is changed to 'Y'. During
the cycle, some records have is_grc_002 NULL and other have value 'Y'.
This is used for the processing. A view can only work if all rows have a
single value. We could use two tables (one with records from old cycles,
one with records from the new cycle) and a view. But that means copying
all records from the "new" to the "older" table in each cycle.

Kind regards,

Hans Drexler





pgsql-performance by date:

Previous
From: Ryan Johnson
Date:
Subject: Re: SSI slows down over time
Next
From: Cora Ma
Date:
Subject: Re: performance degradation after launching postgres cluster using pgpool-II