Re: scalability bottlenecks with (many) partitions (and more) - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date:
Msg-id: CA+TgmoabtPvTffKQiqKg9qOmyt-sYzjQW+TyFWf1CBN7M4hggQ@mail.gmail.com
In response to: scalability bottlenecks with (many) partitions (and more) (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: scalability bottlenecks with (many) partitions (and more)
List: pgsql-hackers
On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The
> LWLock table has 16 partitions by default - it's quite possible that on
> machine with many cores and/or many partitions, we can easily hit this.
> So I bumped this 4x to 64 partitions.

I think this probably makes sense. I'm a little worried that we're just
kicking the can down the road here where maybe we should be solving the
problem in some more fundamental way, and I'm also a little worried that
we might be reducing single-core performance. But it's probably fine.

> What I ended up doing is having a hash table of 16-element arrays. There
> are 64 "pieces", each essentially the (16 x OID + UINT64 bitmap) that we
> have now. Each OID is mapped to exactly one of these parts as if in a
> hash table, and in each of those 16-element parts we do exactly the same
> thing we do now (linear search, removal, etc.). This works great, the
> locality is great, etc. The one disadvantage is this makes PGPROC
> larger, but I did a lot of benchmarks and I haven't seen any regression
> that I could attribute to this. (More about this later.)

I think this is a reasonable approach. Some comments:

- FastPathLocalUseInitialized seems unnecessary to me; the contents of
an uninitialized local variable are undefined, but an uninitialized
global variable always starts out zeroed.

- You need comments in various places, including here, where someone is
certain to have questions about the algorithm and choice of constants:

+#define FAST_PATH_LOCK_REL_GROUP(rel) (((uint64) (rel) * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND)

When I originally coded up the fast-path locking stuff, I supposed that
we couldn't make the number of slots too big because the algorithm
requires a linear search of the whole array. But with this one trick (a
partially-associative cache), there's no real reason that I can think of
why you can't make the number of slots almost arbitrarily large. At some
point you're going to use too much memory, and probably before that
point you're going to make the cache big enough that it doesn't fit in
the CPU cache of an individual core, at which point possibly it will
stop working as well. But honestly ... I don't quite see why this
approach couldn't be scaled quite far.

Like, if we raised FP_LOCK_GROUPS_PER_BACKEND from your proposed value
of 64 to say 65536, would that still perform well? I'm not saying we
should do that, because that's probably a silly amount of memory to use
for this, but the point is that when you start to have enough partitions
that you run out of lock slots, performance is going to degrade, so you
can imagine wanting to try to have enough lock groups to make that
unlikely. Which leads me to wonder if there's any particular number of
lock groups that is clearly "too many" or whether it's just about how
much memory we want to use.

--
Robert Haas
EDB: http://www.enterprisedb.com
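[Editor's note: the following is a minimal, self-contained C sketch of the "partially-associative" lookup discussed above, not code from the patch itself. Only the FAST_PATH_LOCK_REL_GROUP macro mirrors the line quoted in the review; the struct layout, function names, and the FP_LOCK_SLOTS_PER_GROUP constant are invented here to illustrate how an OID is hashed to one group and only that 16-slot group is scanned linearly.]

/*
 * Illustrative sketch: per-backend fast-path cache split into
 * FP_LOCK_GROUPS_PER_BACKEND groups of 16 slots. A relation OID is
 * mapped to exactly one group, and only that group is searched, so the
 * total number of slots can grow without lengthening the linear scan.
 */
#include <stdint.h>
#include <stdio.h>

#define FP_LOCK_GROUPS_PER_BACKEND  64
#define FP_LOCK_SLOTS_PER_GROUP     16

/* Same shape as the macro quoted in the review comment above. */
#define FAST_PATH_LOCK_REL_GROUP(rel) \
    (((uint64_t) (rel) * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND)

typedef struct FastPathGroup
{
    uint32_t relid[FP_LOCK_SLOTS_PER_GROUP];  /* 0 means "slot unused" */
    uint64_t lockbits;                        /* per-slot lock-mode bitmap */
} FastPathGroup;

typedef struct FastPathCache
{
    FastPathGroup groups[FP_LOCK_GROUPS_PER_BACKEND];
} FastPathCache;

/*
 * Return the slot index holding 'relid' within its group, or -1 if the
 * relation is not cached. Only one 16-entry group is scanned, no matter
 * how many groups the cache has.
 */
static int
fast_path_find_slot(const FastPathCache *cache, uint32_t relid)
{
    const FastPathGroup *group =
        &cache->groups[FAST_PATH_LOCK_REL_GROUP(relid)];

    for (int i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
    {
        if (group->relid[i] == relid)
            return i;
    }
    return -1;
}

int
main(void)
{
    FastPathCache cache = {0};

    /* Pretend relation OID 16384 holds a fast-path lock in slot 0 of its group. */
    uint32_t relid = 16384;
    FastPathGroup *group = &cache.groups[FAST_PATH_LOCK_REL_GROUP(relid)];

    group->relid[0] = relid;
    group->lockbits |= UINT64_C(1) << 0;

    printf("relid %u -> group %llu, slot %d\n",
           relid,
           (unsigned long long) FAST_PATH_LOCK_REL_GROUP(relid),
           fast_path_find_slot(&cache, relid));
    return 0;
}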