Re: scalability bottlenecks with (many) partitions (and more) - Mailing list pgsql-hackers
From: Jakub Wartak
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date:
Msg-id: CAKZiRmyML5tE67n8SLbiU=kKbVROqb33NGT-=kNv1Vcv1dTuAA@mail.gmail.com
In response to: Re: scalability bottlenecks with (many) partitions (and more) (Tomas Vondra <tomas@vondra.me>)
Responses: Re: scalability bottlenecks with (many) partitions (and more)
List: pgsql-hackers
Hi Tomas!

On Tue, Sep 3, 2024 at 6:20 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> On 9/3/24 17:06, Robert Haas wrote:
> > On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:
> >> The one argument to not tie this to max_locks_per_transaction is the
> >> vastly different "per element" memory requirements. If you add one entry
> >> to max_locks_per_transaction, that adds LOCK which is a whopping 152B.
> >> OTOH one fast-path entry is ~5B, give or take. That's a pretty big
> >> difference, and it if the locks fit into the shared lock table, but
> >> you'd like to allow more fast-path locks, having to increase
> >> max_locks_per_transaction is not great - pretty wastefull.
> >>
> >> OTOH I'd really hate to just add another GUC and hope the users will
> >> magically know how to set it correctly. That's pretty unlikely, IMO. I
> >> myself wouldn't know what a good value is, I think.
> >>
> >> But say we add a GUC and set it to -1 by default, in which case it just
> >> inherits the max_locks_per_transaction value. And then also provide some
> >> basic metric about this fast-path cache, so that people can tune this?
> >
> > All things being equal, I would prefer not to add another GUC for
> > this, but we might need it.
> >
>
> Agreed.
>
> [..]
>
> So I think I'm OK with just tying this to max_locks_per_transaction.

If that matters: the SLRU configurability effort added 7 GUCs (with 3 of
them scaling up based on shared_buffers) just to give high-end users some
relief, so one new GUC here shouldn't be such a big deal. We could extend
the LWLock/lock_manager wait event docs to recommend known-to-be-good
values from this $thread (or ask users to benchmark it themselves).

> >> I think just knowing the "hit ratio" would be enough, i.e. counters for
> >> how often it fits into the fast-path array, and how often we had to
> >> promote it to the shared lock table would be enough, no?
> >
> > Yeah, probably. I mean, that won't tell you how big it needs to be,
> > but it will tell you whether it's big enough.
> >
>
> True, but that applies to all "cache hit ratio" metrics (like for our
> shared buffers). It'd be great to have something better, enough to tell
> you how large the cache needs to be. But we don't :-(

My $0.02: the originating case that triggered these patches actually
started with LWLock/lock_manager waits being the #1 wait event. The
operator can cross-check (join) that against pg_locks, grouping by
fastpath and counting rows. So, IMHO, we have good observability in this
case (a rare thing to say!).
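To make this concrete, here is a rough (untested) sketch of the kind of
cross-check I mean, joining pg_stat_activity and pg_locks (the LWLock wait
event is spelled 'LockManager' on recent releases; adjust for your version):

  -- For backends currently waiting on the lock-manager LWLock, count their
  -- locks split by whether they were taken via the per-backend fast-path
  -- array (fastpath = true) or had to go into the shared lock table.
  SELECT a.pid, l.fastpath, count(*) AS locks
    FROM pg_stat_activity a
    JOIN pg_locks l ON l.pid = a.pid
   WHERE a.wait_event_type = 'LWLock'
     AND a.wait_event = 'LockManager'
   GROUP BY a.pid, l.fastpath
   ORDER BY a.pid, l.fastpath;

If such backends hold mostly fastpath = false relation locks, that is a
strong hint the fast-path slots are exhausted and the shared lock table
(and its LWLocks) is taking the load.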
> > I wonder if we should be looking at further improvements in the lock
> > manager of some kind. [..]
>
> Perhaps. I agree we'll probably need something more radical soon, not
> just changes that aim to fix some rare exceptional case (which may be
> annoying, but not particularly harmful for the complete workload).
>
> For example, if we did what you propose, that might help when very few
> transactions need a lot of locks. I don't mind saving memory in that
> case, ofc. but is it a problem if those rare cases are a bit slower?
> Shouldn't we focus more on cases where many locks are common? Because
> people are simply going to use partitioning, a lot of indexes, etc?
>
> So yeah, I agree we probably need a more fundamental rethink. I don't
> think we can just keep optimizing the current approach, there's a limit
> of fast it can be.

Please help me understand: are you both discussing potential far-future
improvements here, rather than this patchset? My question really is: is
the patchset good enough as it stands, or are you considering some other
new effort instead?

BTW, some other random questions:

Q1. I've been looking at
https://github.com/tvondra/pg-lock-scalability-results. Those results
shouldn't be used for further discussion anymore, since they were produced
with earlier patches (including
0003-Add-a-memory-pool-with-adaptive-rebalancing.patch) and have been
replaced by the benchmark data in this $thread, right?

Q2. Earlier attempts did contain a mempool patch to get those nice numbers
(or was that jemalloc or glibc tuning?). So were the recent results in [1]
still collected with 0003, or have you switched completely to
glibc/jemalloc tuning?

-J.

[1] - https://www.postgresql.org/message-id/b8c43eda-0c3f-4cb4-809b-841fa5c40ada%40vondra.me