Help troubleshooting SubtransControlLock problems - Mailing list pgsql-general

From Scott Frazer
Subject Help troubleshooting SubtransControlLock problems
Date
Msg-id CA+ey=amBhfD4Ascc4yyoKRbh+FUx6wwMkujsZ7Ou+xOY2AwsSg@mail.gmail.com
Responses Re: Help troubleshooting SubtransControlLock problems  (Rene Romero Benavides <rene.romero.b@gmail.com>)
Re: Help troubleshooting SubtransControlLock problems  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-general
Hi, we have a Postgres 9.6 setup using replication, and the read replicas have recently started showing a lot of processes stuck with "SubtransControlLock" as their wait_event. It looks like this, except there are usually about 300-800 of them:

 179706 | LWLockNamed     | SubtransControlLock
 186602 | LWLockNamed     | SubtransControlLock
 186606 | LWLockNamed     | SubtransControlLock
 180947 | LWLockNamed     | SubtransControlLock
 186621 | LWLockNamed     | SubtransControlLock
The server then begins to crawl, with some queries just never finishing until I finally shut the server down.


Searching for that particular combination of wait_event_type and wait_event only turns up the documentation page on the statistics collector, with no helpful information on troubleshooting this lock.

Restarting the replica server clears the locks and allows us to start working again, but it's happened twice now in 12 hours and I'm worried it will happen again.
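To catch a pile-up like this before the replica stalls, one option is to tally waiting backends periodically. Below is a minimal sketch, assuming the listing above came from pg_stat_activity (its wait_event_type and wait_event columns were added in 9.6) and was captured as plain psql text. The query, the function names count_wait_events and should_alert, and the default threshold of 100 are illustrative assumptions, not part of the original setup.

```python
from collections import Counter

# Assumed query behind the listing above; wait_event is NULL for backends
# that are not waiting, so this returns only waiting processes.
WAIT_QUERY = """
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;
"""


def count_wait_events(psql_output: str) -> Counter:
    """Tally (wait_event_type, wait_event) pairs from 'pid | type | event' rows."""
    counts: Counter = Counter()
    for line in psql_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip blank lines, the header row, "---+---" separators and the
        # "(N rows)" footer: only rows with three fields and a numeric pid count.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        counts[(parts[1], parts[2])] += 1
    return counts


def should_alert(counts: Counter, threshold: int = 100) -> bool:
    """Hypothetical alert rule: flag once SubtransControlLock waiters reach threshold."""
    return counts[("LWLockNamed", "SubtransControlLock")] >= threshold
```

Fed the five rows shown above, count_wait_events reports five SubtransControlLock waiters. Tracking that number over time would show whether the build-up is gradual or sudden before a restart becomes necessary.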

Does anyone have any advice on where to start looking?

Thanks,
Scott
