Thread: Upsert functions spinlock when doing insert first.

Upsert functions spinlock when doing insert first.

From
"Porte, Alexander"
Date:
PostgreSQL version:    9.4.1
Operating system:       CentOS 6.6

We have simple upsert functions part of our ETL process. These upsert funct=
ions execute an insert and update from staging to datawarehouse tables and =
finally mark all these staged rows as processed, all within the same transa=
ction.

In our tests and production we have encountered problems where these functi=
ons get stuck at full CPU with zero disk I/O. We were able to repro the pro=
blem on small dataset (<100'000 rows) with a single thread. We fiddled with=
 settings following best practices, adding large amount of memory, increasi=
ng WAL segments, etc... to no avail.

We identified the problem to be the update statement even though our tests =
insert tens of thousands of rows (millions in production) but only update a=
 handful of rows each time. We thought it might have been a bad query plan =
due to outdated statistics but a full analyze doesn't fix the issue, someti=
me a postgresql restart fixes it but not always.

We might have identified a work around by having the update be done first f=
ollowed by the insert which is semantically equivalent as they do not opera=
te on the same rows. This has been the only reliable fix to avoid this prob=
lem. However this is not a fix and it is concerning to us as it is such a s=
imple use case.


-          Upsert function:                       http://pgsql.privatepaste=
.com/a547f35aa0

-          Staging Table:                            http://pgsql.privatepa=
ste.com/faeafdeb3d

-          Datawarehouse Table:          http://pgsql.privatepaste.com/07a5=
3910e8
Thanks,
Alexander Porte
Sr. Director of Engineering, Analytics
Agilysys, Inc.<http://www.agilysys.com/>
Alexander.Porte@agilysys.com<mailto:Alexander.Porte@agilysys.com>
(o) 425.378.2295 * (m) 425.638.2220
Facebook<https://www.facebook.com/Agilysys?fref=3Dts> | Twitter<https://twi=
tter.com/Agilysys> | LinkedIn<http://www.linkedin.com/company/6142?trk=3Dty=
ah> | Agilysys Blog<http://news.agilysys.com/>

Re: Upsert functions spinlock when doing insert first.

From
Tom Lane
Date:
"Porte, Alexander" <Alexander.Porte@agilysys.com> writes:
> We have simple upsert functions part of our ETL process. These upsert functions execute an insert and update from
stagingto datawarehouse tables and finally mark all these staged rows as processed, all within the same transaction. 

> In our tests and production we have encountered problems where these functions get stuck at full CPU with zero disk
I/O.We were able to repro the problem on small dataset (<100'000 rows) with a single thread. We fiddled with settings
followingbest practices, adding large amount of memory, increasing WAL segments, etc... to no avail. 

This is an interesting report, but you have not really provided enough
detail to let anyone else reproduce the problem.  Without sample data
and a script for how to call the function, we're unlikely to duplicate
your results.

Also, as a general rule, we'd rather that supporting files were attached
to the bug report, so that they get into the pgsql-bugs archives.  A month
from now when those privatepaste entries are gone, this report will mean
nothing to anyone.

            regards, tom lane