Re: High COMMIT times - Mailing list pgsql-performance

From Craig Jackson
Subject Re: High COMMIT times
Date
Msg-id CA+R1LV7+tC4KVcVg610_cLTBeZ_ue9U8w_3vwti2cXxPmStyYg@mail.gmail.com
Whole thread Raw
In response to Re: High COMMIT times  (Don Seiler <don@seiler.us>)
Responses Re: High COMMIT times
List pgsql-performance
How far apart are the min/max connection settings on your application connection pool? We had a similar issue with connection storms in the past on Oracle. One thing we did to minimize the storms was make sure there was not a wide gap between the min/max, say no more than a 5-10 connection difference between min/max.

Regards,

Craig 

On Sun, Jan 10, 2021 at 5:42 PM Don Seiler <don@seiler.us> wrote:
On Sat, Jan 9, 2021 at 2:07 PM Jeff Janes <jeff.janes@gmail.com> wrote:

How are you monitoring the COMMIT times?  What do you generally see in pg_stat_activity.wait_event during the spikes/stalls?

Right now we just observe the COMMIT duration posted in the postgresql log (we log anything over 100ms).

One other thing that I shamefully forgot to mention. When we see these slow COMMITs in the log, they coincide with a connection storm (Cat 5 hurricane) from our apps where connections will go from ~200 to ~1200. This will probably disgust many, but our PG server's max_connections is set to 2000. We have a set of pgbouncers in front of this with a total max_db_connections of 1600. I know many of you think this defeats the whole purpose of having pgbouncer and I agree. I've been trying to explain as much and that even with 32 CPUs on this DB host, we probably shouldn't expect to be able to support more than 100-200 active connections, let alone 1600. I'm still pushing to have our app server instances (which also use their own JDBC (Hikari) connection pool and *then* go through pgbouncer) to lower their min/max connection settings but obviously it's sort of counterintuitive at first glance but hopefully everyone sees the bigger picture.

One nagging question I have is if the slow COMMIT is triggering the connection storm (eg app sees slow response or timeout from a current connection and fires off a new connection in its place), or vice-versa. We're planning to deploy new performant cloud storage (Azure Ultra disk) just for WAL logs but I'm hesitant to say it'll be a silver bullet when we still have this insane connection management strategy in place.

Curious to know what others think (please pull no punches) and if others have been in a similar scenario with anecdotes to share.

Thanks,
Don.

--
Don Seiler
www.seiler.us


--
Craig 

This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
Attachment

pgsql-performance by date:

Previous
From: Don Seiler
Date:
Subject: Re: High COMMIT times
Next
From: Don Seiler
Date:
Subject: Re: High COMMIT times