Re: heavily contended lwlocks with long wait queues scale badly - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: heavily contended lwlocks with long wait queues scale badly
Msg-id 725d5089-11e6-93c8-b962-67c40240451f@postgresql.org
In response to Re: heavily contended lwlocks with long wait queues scale badly  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: heavily contended lwlocks with long wait queues scale badly
List pgsql-hackers
On 11/1/22 8:37 AM, Robert Haas wrote:
> On Tue, Nov 1, 2022 at 3:17 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> Below are test results with v3 patch. +1 for back-patching it.

First, awesome find and proposed solution!

> The problem with back-patching stuff like this is that it can have
> unanticipated consequences. I think that the chances of something like
> this backfiring are less than for a patch that changes plans, but I
> don't think that they're nil, either. It could turn out that this
> patch, which has really promising results on the workloads we've
> tested, harms some other workload due to some other contention pattern
> we can't foresee. It could also turn out that improving performance at
> the database level actually has negative consequences for some
> application using the database, because the application could be
> unknowingly relying on the database to throttle its activity.

If someone is using the database to throttle activity for their app, I 
have a bunch of follow up questions to understand why.

> It's hard for me to estimate exactly what the risk of a patch like
> this is. I think that if we back-patched this, and only this, perhaps
> the chances of something bad happening aren't incredibly high. But if
> we get into the habit of back-patching seemingly-innocuous performance
> improvements, it's only a matter of time before one of them turns out
> not to be so innocuous as we thought. I would guess that the number of
> times we have to back-patch something like this before somebody starts
> complaining about a regression is likely to be somewhere between 3 and
> 5.

Having the privilege of reading through the release notes for every
update release, I see on average 1-2 "performance improvements" in each
release. I believe they tend to be more negligible, though.

I do understand the concerns. Say you discover your workload does have a
regression with this patch and then there's a CVE fix that you want to
accept -- what do you do? Reading the thread / patch, it seems as if
this is a lower-risk "performance fix", but the risk is still nonzero.

While this does affect all supported versions, we could also consider
backpatching only for PG15. That at least 1/ limits impact on users
running older versions (opting into a major version upgrade) and 2/
we're still early enough in the major upgrade cycle for PG15 that the
risk is lower if there are issues.

Users are generally happy when they can perform a simple upgrade and get 
a performance boost, particularly the set of users that this patch 
affects most (high throughput, high connection count). This is the type 
of fix that would make headlines in a major release announcement (10x 
TPS improvement w/4096 connections?!). Part of the tradeoff of
backpatching this is that we may lose some of the higher visibility
marketing opportunities to discuss it (though I'm sure there will be
plenty of blog posts, etc.).

Andres: when you suggested backpatching, were you thinking of the Nov 
2022 release or the Feb 2023 release?

Thanks,

Jonathan
