Home > mailing lists

Re: Batch update query performance - Mailing list pgsql-performance

From	Albe Laurenz
Subject	Re: Batch update query performance
Date	April 7, 2014 12:06:42
Msg-id	A737B7A37273E048B164557ADEF4A58B17CEBFC1@ntex2010i.host.magwien.gv.at Whole thread
In response to	Batch update query performance (Hans Drexler <Hans.Drexler@HumanInference.com>)
Responses	Re: Batch update query performance
List	pgsql-performance

Tree view

Hans Drexler wrote:
> We are porting an application to PostgreSQL. The appplication already
> runs with DB2 (LUW version) and Oracle. One query in particular executes
> slower on Postgres than it does on other Database platforms, notably DB2
> LUW and Oracle. (Please understand, we are not comparing databases here,
> we are porting an application. We do not mean to start a flame war
> here).

[...]

> Postgres needs close to 50
> minutes to process the same query on the same data. Sometimes, Postgres
> needs more than 2 hours.
> 
> The application performs an update query on every row
> of the table. The exact SQL of this query is:
> 
> update t67cdi_nl_cmp_descr set is_grc_002='Y'

[...]

> We tried removing all indexes. That reduces the runtime to ~3 minutes.
> When we start to put indexes back, the run time of the query increases
> again with each index added.

Do I read that right that the duration of the update is reduced from
50 or 120 minutes to 3 when you drop all the indexes?

[...]

> Hypothesis
> we have tried many things to solve this problem ourselves, but to no
> avail so far. Our hypothesis is that
> the Postgres creates new records for all rows and then needs to update
> all 15 indexes to make them point to the new rows. There does not seem
> to be a way to avoid that.
> 
> Question:
> - Is our hypothesis correct?
> - Can the forum please advise us on possible ways to make the query
> faster?

Your hypothesis may be correct.
What you could do is to create the table will a fillfactor of 50 or less
before populating it.  Then 50% of the space will be left empty and
could be used to put the updated data on the same page as the original
data.  That way you could take advantage of HOT (heap only tuples)
which will avoid the need to update indexes that do not reference the
updated column.

If I count right, you have got 15 indexes on this table.
Maybe you could check if you need them all.

Yours,
Laurenz Albe

pgsql-performance by date:

From: Hans Drexler
Date: 07 April 2014, 11:03:57
Subject: Batch update query performance

From: Nicolas Paris
Date: 07 April 2014, 12:49:23
Subject: Re: PGSQL 9.3 - Materialized View - multithreading

Re: Batch update query performance - Mailing list pgsql-performance

Previous

Next