Re: SubtransControlLock and performance problems - Mailing list pgsql-performance

From Lars Aksel Opsahl
Subject Re: SubtransControlLock and performance problems
Date
Msg-id HE1P189MB026604DB026FA6353982574B9D130@HE1P189MB0266.EURP189.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: SubtransControlLock and performance problems  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-performance
Hi


>From: Alvaro Herrera <alvherre@2ndquadrant.com>

>Sent: Wednesday, February 19, 2020 4:23 PM

>To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>

>Cc: Laurenz Albe <laurenz.albe@cybertec.at>; Pavel Stehule <pavel.stehule@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: SubtransControlLock and performance problems

> 

>On 2020-Feb-19, Lars Aksel Opsahl wrote:

>

>> With the values above I did see same performance problems and we ended

>> with a lot of subtransControlLock.

>> 

>> So I started to change the code based on your feedbacks.

>> 

>> - What seems to work very good in combination with a catch exception

>> and retry pattern is to insert the data in to separate table for each

>> job. (I the current testcase we reduced the number of

>> subtransControlLock from many hundreds to almost none.)

>

>I think at this point your only recourse is to start taking profiles to

>see where the time is going.  Without that, you're just flying blind and

>whatever you do will not necessarily move your needle at all.


Hi

Yes I totally agree with you and yes I have tried to do some profiling and testing while developing.

From the worst case to best case the time is reduced 15 times (from 300 minutes to 20 minutes) when testing a small dataset for with 619230 surface (25909671 total line points) with the test below “resolve_overlap_gap_run('org_jm.jm_ukomm_flate','figurid','geo',4258,false,'test_topo_jm',0.000001,31,3000); “

The reason for this seems to be related to the problems described by Laurenz Albe related to how Postgres handles try and catch and sub transactions, which I did not know about. If we don't have this is mind and we start to get subtranslocks it seems to kill the performance in some cases.

In this test I ran with 31 parallel threads which is very high on a server with only 32 cores and maybe not realistic. I just did this now see what happens when I try to push a server to it’s limits and maximise the performance increase. If I reduce this to 1 single thread, there should be now difference and if run on 16 threads the difference would much much smaller. 

I will now start to run on datasets which are 10 times bigger to check how thing scales, but then run with around maybe 28 parallel jobs. 

The two branches I have tested on now which should show the main difference are here.

https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_TopoGeo_addLinestringwhich is the faster one.

https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology which is slower one, but here I have now added a check on number of subtranslocks before I kick of new jobs and that reduced time form 9 hours  to 3 hours.


Thanks.


Lars





 

pgsql-performance by date:

Previous
From: Andres Freund
Date:
Subject: Re: How to avoid UPDATE performance degradation in a transaction
Next
From: Jeff Janes
Date:
Subject: Re: tablespace to benefit from ssd ?