Thread: Guideline To Resolve LWLock:SubtransControlLock

Guideline To Resolve LWLock:SubtransControlLock

From

Fred Habash

Date:

16 August 2018, 21:19:11

One of our database API's is run concurrently by near 40 sessions. We see all of them waiting back and forth on this wait state.

There is one scenario described in some forum where sessions connected a read-only replica are affected. This does not apply to our use case.

Why is it called Subtrans Control Lock?

What are the common user session scenarios causing this wait?

- I have read some describe the use of SQL savepoints or PL/pgSQL exception handling.

What are known resolution measures?

----------------------------------------
Thank you

Re: Guideline To Resolve LWLock:SubtransControlLock

From

Alvaro Herrera

Date:

16 August 2018, 21:36:32

On 2018-Aug-16, Fred Habash wrote:

> One of our database API's is run concurrently by near 40 sessions. We see
> all of them waiting back and forth on this wait state.

What version are you running?

> Why is it called Subtrans Control Lock?

It controls access to the pg_subtrans structure, which is used to record
parent/child transaction relationships (as you say, savepoints and
EXCEPTIONs in plpgsql are the most common uses, but not the only ones).
Normally lookup of these is optimized away, but once you cross a
threshold it cannot any longer.

> What are the common user session scenarios causing this wait?
>   - I have read some describe the use of SQL savepoints or PL/pgSQL
> exception handling.
> What are known resolution measures?

Are you in a position to recompile Postgres?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Guideline To Resolve LWLock:SubtransControlLock

From

Fred Habash

Date:

17 August 2018, 21:07:12

Aurora Postgres 9.6.3

So, no chance to recompile (AFAIK).

Is there a design anti-pattern at the schema or data access level that we should look for and correct?

And as for the recompile, are you thinking 'NUM_SUBTRANS_BUFFERS'?

Thanks

On Thu, Aug 16, 2018 at 2:36 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2018-Aug-16, Fred Habash wrote:

> One of our database API's is run concurrently by near 40 sessions. We see
> all of them waiting back and forth on this wait state.

What version are you running?

> Why is it called Subtrans Control Lock?

It controls access to the pg_subtrans structure, which is used to record
parent/child transaction relationships (as you say, savepoints and
EXCEPTIONs in plpgsql are the most common uses, but not the only ones).
Normally lookup of these is optimized away, but once you cross a
threshold it cannot any longer.

> What are the common user session scenarios causing this wait?
> - I have read some describe the use of SQL savepoints or PL/pgSQL
> exception handling.
> What are known resolution measures?

Are you in a position to recompile Postgres?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

----------------------------------------
Thank you

Re: Guideline To Resolve LWLock:SubtransControlLock

From

Alvaro Herrera

Date:

17 August 2018, 21:26:45

On 2018-Aug-17, Fred Habash wrote:

> Aurora Postgres 9.6.3

Oh, okay, I don't know this one.  Did you contact Amazon support?

> So, no chance to recompile (AFAIK).
> Is there a design anti-pattern at the schema or data access level that we
> should look for and correct?

Maybe ...

> And as for the recompile, are you thinking 'NUM_SUBTRANS_BUFFERS'?

Yes, that's one option, but there's also TOTAL_MAX_CACHED_SUBXIDS.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Guideline To Resolve LWLock:SubtransControlLock

From

Fred Habash

Date:

20 August 2018, 23:52:47

Thanks.

How do we go about calculating appropriate values for these two parameters ...

'NUM_SUBTRANS_BUFFERS'?

TOTAL_MAX_CACHED_SUBXIDS

And do both require a recompile?

————-

Thank you.

On Aug 17, 2018, at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

And as for the recompile, are you thinking 'NUM_SUBTRANS_BUFFERS'?

Yes, that's one option, but there's also TOTAL_MAX_CACHED_SUBXIDS.

Re: Guideline To Resolve LWLock:SubtransControlLock

From

Alvaro Herrera

Date:

21 August 2018, 00:00:27

On 2018-Aug-20, Fred Habash wrote:

> How do we go about calculating appropriate values for these two parameters ...

I don't know a lot about your system, so don't have anything to go on.
Also, Aurora is mostly unknown to me.  What did Amazon say?

> > 'NUM_SUBTRANS_BUFFERS'?
> TOTAL_MAX_CACHED_SUBXIDS
> 
> And do both require a recompile?

Yes.  But maybe they'll just move the contention point a little bit
backwards without actually fixing anything.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Guideline To Resolve LWLock:SubtransControlLock

From

Jeremy Schneider

Date:

21 August 2018, 01:14:49

On 8/17/18 11:07, Fred Habash wrote:
> Aurora Postgres 9.6.3

Hi Fred! The Amazon team does watch the AWS forums and that's the place
to raise questions that are specific to PostgreSQL on RDS or questions
specific to Aurora. In fact we would love to see this question over
there since it might be something other people see as well.

https://forums.aws.amazon.com/forum.jspa?forumID=227

That said... FWIW, Aurora PostgreSQL version 9.6.3 uses parent/child
transaction relationships pretty much the same way that community
PostgreSQL 9.6.3 does. The uses you pointed out (savepoints and
exceptions in plpgsql) are the most common causes of contention I've
seen - similar to what Alvaro said his experience is. I have seen
applications grind to a halt on SubtransControlLock when they make heavy
use of exception blocks in plpgsql code; in fact it's pretty
straightforward to demonstrate this behavior with pgbench on community
PostgreSQL.

On 8/20/18 14:00, Alvaro Herrera wrote:
>> And do both require a recompile?
>
> Yes.  But maybe they'll just move the contention point a little bit
> backwards without actually fixing anything.

When it comes to resolution, I agree with Alvaro's assessment here;
unfortunately, I don't know of a great solution on community PostgreSQL
outside of trying to reduce the use of exception blocks in your plpgsql
code. Increasing the cache size can give a little more head room but
doesn't move the contention point significantly. That single global
control lock is hard to get around when you try to use subtransactions
at scale.

-Jeremy

P.S. This applies on the Aurora PostgreSQL 9.6.3 build too but I'm
discussing here in the context of community PostgreSQL code and we can
put further Aurora-specific discussion on the AWS forums.

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services

RE: Guideline To Resolve LWLock:SubtransControlLock

From

Fd Habash

Date:

22 August 2018, 18:48:01

Thanks, Jeremy …

“ That said... FWIW, Aurora PostgreSQL version 9.6.3 uses parent/child transaction relationships pretty much the same way that community PostgreSQL 9.6.3 does …”

This is why I posted here first. This particular wait state did not appear to be Aurora specific and was not listed as part of https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/AuroraPostgreSQL.Reference.html#AuroraPostgreSQL.Reference.Waitevents

I go back and forth posting issues between the two forums depending on the nature of it.

----------------
Thank you

From: Jeremy Schneider
Sent: Monday, August 20, 2018 6:19 PM
To: Fred Habash
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Guideline To Resolve LWLock:SubtransControlLock

On 8/17/18 11:07, Fred Habash wrote:

> Aurora Postgres 9.6.3

Hi Fred! The Amazon team does watch the AWS forums and that's the place

to raise questions that are specific to PostgreSQL on RDS or questions

specific to Aurora. In fact we would love to see this question over

there since it might be something other people see as well.

https://forums.aws.amazon.com/forum.jspa?forumID=227

That said... FWIW, Aurora PostgreSQL version 9.6.3 uses parent/child

transaction relationships pretty much the same way that community

PostgreSQL 9.6.3 does. The uses you pointed out (savepoints and

exceptions in plpgsql) are the most common causes of contention I've

seen - similar to what Alvaro said his experience is. I have seen

applications grind to a halt on SubtransControlLock when they make heavy

use of exception blocks in plpgsql code; in fact it's pretty

straightforward to demonstrate this behavior with pgbench on community

PostgreSQL.

On 8/20/18 14:00, Alvaro Herrera wrote:

>> And do both require a recompile?

> Yes. But maybe they'll just move the contention point a little bit

> backwards without actually fixing anything.

When it comes to resolution, I agree with Alvaro's assessment here;

unfortunately, I don't know of a great solution on community PostgreSQL

outside of trying to reduce the use of exception blocks in your plpgsql

code. Increasing the cache size can give a little more head room but

doesn't move the contention point significantly. That single global

control lock is hard to get around when you try to use subtransactions

at scale.

-Jeremy

P.S. This applies on the Aurora PostgreSQL 9.6.3 build too but I'm

discussing here in the context of community PostgreSQL code and we can

put further Aurora-specific discussion on the AWS forums.

Jeremy Schneider

Database Engineer

Amazon Web Services

RE: Guideline To Resolve LWLock:SubtransControlLock

From

Fd Habash

Date:

22 August 2018, 23:07:53

Jeremy …

In your statement, what constitutes ‘heavy use of exception blocks’?

Thanks

I have seen