Thread: The fastest way to update thousands of rows in moderately sized table

The fastest way to update thousands of rows in moderately sized table

From
twoflower
Date:
Hello, I have a table with 30 million records in which I need to update a single column for a couple of thousand rows, say 10,000. The new column value is identical for all matching rows. Doing

update "TRANSLATION" set fk_assignment where fk_job = 1000;

takes 45 seconds. I understand that an UPDATE is basically an INSERT followed by a DELETE, but I was hoping I could do better than that. I found a suggestion to use a temporary table to speed things up, so now I have this:

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "temp_table" set fk_assignment = null;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;

drop table "temp_table";


This got me to about 37 seconds. Still pretty slow. The TRANSLATION table has an index and a foreign key constraint on fk_assignment. Removing the constraint brought very little benefit. Removing the index is probably out of the question, as these kinds of operations are very frequent and the table itself is used heavily, including the index. Execution plan:

Update on "TRANSLATION" _target (cost=0.56..116987.76 rows=13983 width=405) (actual time=43262.266..43262.266 rows=0 loops=1)
 -> Nested Loop (cost=0.56..116987.76 rows=13983 width=405) (actual time=0.566..146.084 rows=8920 loops=1)
   -> Seq Scan on temp_segs _source (cost=0.00..218.83 rows=13983 width=22) (actual time=0.457..13.994 rows=8920 loops=1)
   -> Index Scan using "TRANSLATION_pkey" on "TRANSLATION" _target (cost=0.56..8.34 rows=1 width=391) (actual time=0.009..0.011 rows=1 loops=8920)
      Index Cond: (id = _source.id)

Planning time: 1.167 ms
Execution time: 43262.577 ms

Is there anything else worth trying? Are these numbers to be expected, in your experience?

I have Postgres 9.4 and the database is on an SSD.

Thank you very much for any suggestions.

Standa


Re: The fastest way to update thousands of rows in moderately sized table

From
Adrian Klaver
Date:
On 07/23/2015 01:17 PM, twoflower wrote:
> Hello, I have a table with 30 million records in which I need to update
> a single column for a couple of thousand rows, say 10,000. The new
> column value is identical for all matching rows. Doing
>
> update "TRANSLATION" set fk_assignment = null where fk_job = 1000;
>
> takes 45 seconds. I understand that an UPDATE is basically an INSERT
> followed by a DELETE, but I was hoping I could do better than that. I
> found a suggestion to use a temporary table to speed things up, so now I
> have this:
>
> create unlogged table "temp_table" as
> select id, fk_assignment
> from "TRANSLATION"
> where fk_job = 1000;
>
> update "temp_table" set fk_assignment = null;
>
> update "TRANSLATION" _target
> set fk_assignment = _source.fk_assignment
> from "temp_table" _source
> where _target.id = _source.id;
>
> drop table "temp_table";

Have you tried wrapping the above in a BEGIN/COMMIT block?

So:

BEGIN;

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "temp_table" set fk_assignment = null;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;

drop table "temp_table";

COMMIT;



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: The fastest way to update thousands of rows in moderately sized table

From
twoflower
Date:
Adrian Klaver-4 wrote
> Have you tried wrapping the above in a BEGIN/COMMIT block?

Yes, I am running the tests inside a BEGIN TRANSACTION / ROLLBACK block.

Re: The fastest way to update thousands of rows in moderately sized table

From
林士博
Date:
Try creating an index on TRANSLATION (fk_job).

From the execution plan you posted:
-> (actual time=43262.266..43262.266 rows=0 loops=1)
it looks like there are actually no rows to be updated.

So, if you have an index on TRANSLATION (fk_job), the update SQL below should finish within several seconds.
-> update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

Re: The fastest way to update thousands of rows in moderately sized table

From
twoflower
Date:
林士博 wrote
> Try creating an index on TRANSLATION (fk_job).

The index is already there.

Re: The fastest way to update thousands of rows in moderately sized table

From
林士博
Date:
Sorry, my mistake.
The execution plan of an UPDATE always seems to report rows=0.

Can you post the execution plan of the original UPDATE?
EXPLAIN (ANALYZE ON, BUFFERS ON) update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

Re: The fastest way to update thousands of rows in moderately sized table

From
twoflower
Date:
林士博 wrote
> Can you post the execution plan of the original UPDATE?
> EXPLAIN (ANALYZE ON, BUFFERS ON) update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

Here it is:

Update on "TRANSLATION" (cost=0.56..9645.13 rows=3113 width=391) (actual time=35091.036..35091.036 rows=0 loops=1)
   Buffers: shared hit=74842343 read=7242 dirtied=7513
     -> Index Scan using "TRANSLATION_idx_composite_job_last_revision" on "TRANSLATION" (cost=0.56..9645.13 rows=3113 width=391) (actual time=0.042..24.147 rows=8920 loops=1)
        Index Cond: (fk_job = 59004)
        Buffers: shared hit=626

Planning time: 0.362 ms
Execution time: 35091.192 ms



Re: The fastest way to update thousands of rows in moderately sized table

From
林士博
Date:
OK.
In your case, I cannot see any reason why using a temp table to do a
join-update would be faster than updating directly.

And from the execution plan, the index scan takes very little time.
Most of the time is spent doing the insert/delete of the new and old row versions.

As you mentioned, fk_assignment is updated frequently
and disabling the indexes before updating is not an option you can choose,
so try setting the fillfactor of this table to a smaller value.
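
For illustration, lowering the fillfactor could look like this (70 is just an example value; it only applies to newly written pages, so existing pages would need a rewrite such as VACUUM FULL to benefit):

alter table "TRANSLATION" set (fillfactor = 70);

-- A lower fillfactor leaves free space in each heap page so that HOT updates
-- can put the new row version on the same page and skip index maintenance;
-- note that HOT only applies when no indexed column is modified.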

And maybe you can also check the time it takes to insert those 8,920 rows into this table.
If it is far less than 35 s,
you can consider doing the update this way (a sketch follows the list below):
1. Create a temp table with all the columns of "TRANSLATION", like this:
   create unlogged table "temp_table" as
   select id, ..... , 1000 as fk_job
   from "TRANSLATION"
   where fk_job = 1000;
2. Delete the matching rows from the original table.
3. Insert the rows from the temp table back into the original table.
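
A minimal sketch of that delete-and-insert approach, assuming it is acceptable to copy whole rows and that no other tables have foreign keys referencing "TRANSLATION" (which would otherwise restrict the DELETE):

begin;

-- Copy the affected rows and apply the new value to the copies.
create unlogged table "temp_table" as
select * from "TRANSLATION" where fk_job = 1000;

update "temp_table" set fk_assignment = null;

-- Replace the old rows with the modified copies in bulk.
delete from "TRANSLATION" where fk_job = 1000;

insert into "TRANSLATION"
select * from "temp_table";

drop table "temp_table";

commit;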

Finally, maybe you can also check whether PostgreSQL can insert starting from the position of the HWM (High Water Mark).







Re: The fastest way to update thousands of rows in moderately sized table

From
twoflower
Date:
Thank you, I will look into those suggestions.

Meanwhile, I started experimenting with partitioning the table into smaller
tables, each holding rows with IDs spanning 1 million values, and with this
approach the UPDATE takes 300 ms. I have to check that all the SELECTs I am
issuing against the original table keep their performance, but so far it
seems they do, as long as the appropriate indexes are present on the child
tables. I was worried about the overhead of each query having to go through
all (currently) 58 partition tables, but it seems it's not that big of a
deal.
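
For reference, a rough sketch of what range partitioning looks like on 9.4, where it is built on table inheritance (declarative partitioning only arrived in Postgres 10); the child table names, bounds, and indexes below are illustrative and would have to match the real schema:

create table "TRANSLATION_p000" (
    check (id >= 0 and id < 1000000)
) inherits ("TRANSLATION");

create table "TRANSLATION_p001" (
    check (id >= 1000000 and id < 2000000)
) inherits ("TRANSLATION");

-- Indexes, primary keys and foreign keys are not inherited, so they must be
-- created on every child table.
create index on "TRANSLATION_p000" (fk_job);
create index on "TRANSLATION_p001" (fk_job);

-- Rows inserted through the parent need a BEFORE INSERT trigger that routes
-- them to the matching child, or the application has to insert into the
-- children directly.

-- With constraint_exclusion = partition (the default), a query that filters
-- on id is planned only against the children whose CHECK constraint can match.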





Re: The fastest way to update thousands of rows in moderately sized table

From
林士博
Date:
Yes, partitioning by fk_job can significantly improve the performance of this update,
and all the SELECTs that filter on fk_job can be faster as well.

What you should check carefully are the SELECTs that do not filter on fk_job.
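
As a quick sketch of how to check this, run EXPLAIN on a representative query and count the child tables that appear in the plan (the filter below is just an example):

explain
select * from "TRANSLATION" where fk_job = 1000;

-- With the table partitioned by id, a filter on fk_job alone cannot be pruned
-- by the CHECK constraints, so the plan will show an Append over every child
-- (each ideally served by its own fk_job index).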








Re: The fastest way to update thousands of rows in moderately sized table

From
twoflower
Date:
In fact, I did not partition by fk_job but by the id (primary key) instead, though partitioning by fk_job was my first idea.

I use various columns when querying the table; fk_job is not always among them.

Re: The fastest way to update thousands of rows in moderately sized table

Thanks for sharing the results of your experiment!