Re: Batch update query performance - Mailing list pgsql-performance

From Albe Laurenz
Subject Re: Batch update query performance
Date
Msg-id A737B7A37273E048B164557ADEF4A58B17CEBFC1@ntex2010i.host.magwien.gv.at
Whole thread Raw
In response to Batch update query performance  (Hans Drexler <Hans.Drexler@HumanInference.com>)
Responses Re: Batch update query performance  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-performance
Hans Drexler wrote:
> We are porting an application to PostgreSQL. The appplication already
> runs with DB2 (LUW version) and Oracle. One query in particular executes
> slower on Postgres than it does on other Database platforms, notably DB2
> LUW and Oracle. (Please understand, we are not comparing databases here,
> we are porting an application. We do not mean to start a flame war
> here).

[...]

> Postgres needs close to 50
> minutes to process the same query on the same data. Sometimes, Postgres
> needs more than 2 hours.
> 
> The application performs an update query on every row
> of the table. The exact SQL of this query is:
> 
> update t67cdi_nl_cmp_descr set is_grc_002='Y'

[...]

> We tried removing all indexes. That reduces the runtime to ~3 minutes.
> When we start to put indexes back, the run time of the query increases
> again with each index added.

Do I read that right that the duration of the update is reduced from
50 or 120 minutes to 3 when you drop all the indexes?

[...]

> Hypothesis
> we have tried many things to solve this problem ourselves, but to no
> avail so far. Our hypothesis is that
> the Postgres creates new records for all rows and then needs to update
> all 15 indexes to make them point to the new rows. There does not seem
> to be a way to avoid that.
> 
> Question:
> - Is our hypothesis correct?
> - Can the forum please advise us on possible ways to make the query
> faster?

Your hypothesis may be correct.
What you could do is to create the table will a fillfactor of 50 or less
before populating it.  Then 50% of the space will be left empty and
could be used to put the updated data on the same page as the original
data.  That way you could take advantage of HOT (heap only tuples)
which will avoid the need to update indexes that do not reference the
updated column.

If I count right, you have got 15 indexes on this table.
Maybe you could check if you need them all.

Yours,
Laurenz Albe

pgsql-performance by date:

Previous
From: Hans Drexler
Date:
Subject: Batch update query performance
Next
From: Nicolas Paris
Date:
Subject: Re: PGSQL 9.3 - Materialized View - multithreading