Home > mailing lists

Re: How to avoid UPDATE performance degradation in a transaction - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: How to avoid UPDATE performance degradation in a transaction
Date	February 14, 2020 00:16:14
Msg-id	7624.1581628574@sss.pgh.pa.us Whole thread Raw
In response to	How to avoid UPDATE performance degradation in a transaction (Karl Düüna <karl.dyyna@gmail.com>)
Responses	Re: How to avoid UPDATE performance degradation in a transaction (Karl Düüna <karl.dyyna@gmail.com>) Re: How to avoid UPDATE performance degradation in a transaction (Andres Freund <andres@anarazel.de>)
List	pgsql-performance

Tree view

=?UTF-8?B?S2FybCBEw7zDvG5h?= <karl.dyyna@gmail.com> writes:
> -- TL; DR;
> UPDATE on a row takes relatively constant amount of time outside a
> transaction block, but running UPDATE on a single row over and over inside
> a transaction gets slower and slower as the number of UPDATE operations
> increases.

Yeah, that's unsurprising.  Each new update creates a new version of
its row.  When you do them in separate transactions, then as soon as
transaction N+1 commits the system can recognize that the row version
created by transaction N is dead (no longer visible to anybody) and
recycle it, allowing the number of row versions present on-disk to
stay more or less constant.  However, there's not equivalently good
housekeeping for row versions created by a transaction that's still
running.  So when you do N updates in one transaction, there are going
to be N doomed-but-not-yet-recyclable row versions on disk.

Aside from the disk-space bloat, this is bad because the later updates
have to scan through all the row versions created by earlier updates,
looking for the version they're supposed to update.  So you have an O(N^2)
cost associated with that, which no doubt is what you're observing.

There isn't any really good fix for this, other than "don't do that".
David's nearby suggestion of using a temp table won't help, because
this behavior is the same whether the table is temp or regular.

In principle perhaps we could improve the granularity of dead-row
detection, so that if a row version is both created and deleted by
the current transaction, and we have no live snapshots that could
see it, we could go ahead and mark the row dead.  But it's not clear
that that'd be worth the extra cost to do.  Certainly no existing PG
release tries to do it.

            regards, tom lane

pgsql-performance by date:

From: "David G. Johnston"
Date: 13 February 2020, 23:50:38
Subject: Re: How to avoid UPDATE performance degradation in a transaction

From: Karl Düüna
Date: 14 February 2020, 10:14:57
Subject: Re: How to avoid UPDATE performance degradation in a transaction

Re: How to avoid UPDATE performance degradation in a transaction - Mailing list pgsql-performance

Previous

Next