Re: scalability bottlenecks with (many) partitions (and more) - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Msg-id: 0f27b64b-5bf3-4140-98b7-635e312e1796@vondra.me
In response to: Re: scalability bottlenecks with (many) partitions (and more) (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses: Re: scalability bottlenecks with (many) partitions (and more)
List: pgsql-hackers

On 9/16/24 15:11, Jakub Wartak wrote:
> On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <tomas@vondra.me> wrote:
> 
>> [..]
> 
>> Anyway, at this point I'm quite happy with this improvement. I didn't
>> have any clear plan when to commit this, but I'm considering doing so
>> sometime next week, unless someone objects or asks for some additional
>> benchmarks etc.
> 
> Thank you very much for working on this :)
> 
> The only thing that comes to mind is that we could blow up the L2
> caches. Fun fact: if we are growing PGPROC by ~6.3x, that's going to
> be one or two 2MB huge pages more at the common max_connections=1000
> on x86_64 (830 kB -> ~5.1 MB), and indeed:
> 
> # without patch:
> postgres@hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
> shared_memory_size_in_huge_pages
> 177
> 
> # with patch:
> postgres@hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
> shared_memory_size_in_huge_pages
> 178
> 
> So, playing Devil's advocate, the worst situation that could possibly
> hurt (?) would be:
> * memory size of PGPROC working set >> L2_cache (thus very high
> max_connections),
> * insane number of working sessions on CPU (sessions >> VCPU) - sadly,
> this happens to some,
> * those sessions wouldn't have to be competing for the same OIDs -
> just fetching this new, bigger fpLockBits[] structure - so probing a
> lot, for lots of OIDs, but *NOT* having to use the futex() syscall [so
> not paying that syscall price]
> * no huge pages (to cause dTLB misses)
> 
> then maybe(?) one could observe a further degradation in the dTLB-miss
> perf-stat counters under some microbenchmark, but measuring that
> requires isolated physical hardware. Maybe it would actually just be
> noise from the overhead of the context switches themselves. I'm just
> trying to think out loud about what a bigger PGPROC could cause here.
> But this is already an unhealthy, non-steady state of the system, so
> IMHO we are good, unless someone comes up with a better (more evil)
> idea.
> 
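
As a quick back-of-the-envelope check of the numbers quoted above (the
830 kB baseline and the ~6.3x growth factor are the approximate figures
from the mail, not values measured from the patch), the arithmetic works
out roughly like this:

#include <stdio.h>

int
main(void)
{
    /* Approximate PGPROC footprint @ max_connections=1000, per the mail. */
    double  before_kb = 830.0;
    double  after_kb = before_kb * 6.3;     /* ~5.1 MB with bigger fast-path arrays */
    double  extra_mb = (after_kb - before_kb) / 1024.0;

    /* ~4.3 MB extra, i.e. roughly two 2MB huge pages worth of PGPROC. */
    printf("PGPROC: %.0f kB -> %.0f kB (+%.1f MB, ~%.0f extra 2MB pages)\n",
           before_kb, after_kb, extra_mb, extra_mb / 2.0);
    return 0;
}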

I've been thinking about such cases too, but I don't think they can
really happen in practice, because:

- How likely is it that the sessions will need a lot of OIDs, but not
the same ones? And why would it matter that the OIDs are not the same?
I don't think it does, unless one of the sessions needs an exclusive
lock, at which point the optimization doesn't really matter anyway.

- If having more fast-path slots means the data doesn't fit into the L2
cache, would we fit into L2 without them? I don't think so - if there
really are that many locks, we'd have to add them to the shared lock
table instead, and there's a lot of other stuff to keep in memory
(relcaches, ...).

This is pretty much one of the cases I focused on in my benchmarking,
and I have yet to see any regression.
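
To make the cache-footprint question a bit more concrete, here is a
minimal sketch of the general shape of the per-backend fast-path state
being discussed (the names, the group count and the slot count are
illustrative assumptions, not the actual PGPROC layout from the patch).
The point is that each lock acquisition only touches the one group the
relation OID hashes to, so the hot working set stays a few cache lines
even if the whole array outgrows L2:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FP_SLOTS_PER_GROUP  16      /* slots per group (assumed) */
#define FP_GROUPS           64      /* groups per backend (assumed) */

typedef struct FastPathGroup
{
    uint32_t    lockbits[FP_SLOTS_PER_GROUP];   /* lock modes held per slot */
    uint32_t    relid[FP_SLOTS_PER_GROUP];      /* relation OID, 0 = free */
} FastPathGroup;

typedef struct BackendFastPath
{
    FastPathGroup   groups[FP_GROUPS];
} BackendFastPath;

/* Map a relation OID to one group (illustrative hash). */
static inline int
fp_group_for(uint32_t relid)
{
    return (int) ((relid * 2654435761u) % FP_GROUPS);
}

/*
 * Try to record a weak lock in the backend's fast-path array; returns
 * false when the group is full, meaning the shared lock table must be
 * used instead.
 */
static bool
fp_try_acquire(BackendFastPath *fp, uint32_t relid, uint32_t lockmode)
{
    FastPathGroup *g = &fp->groups[fp_group_for(relid)];

    for (int i = 0; i < FP_SLOTS_PER_GROUP; i++)
    {
        if (g->relid[i] == 0 || g->relid[i] == relid)
        {
            g->relid[i] = relid;
            g->lockbits[i] |= lockmode;
            return true;
        }
    }
    return false;
}

int
main(void)
{
    BackendFastPath fp;

    memset(&fp, 0, sizeof(fp));
    printf("per-backend fast-path size: %zu bytes\n", sizeof(fp));
    printf("lock on OID 16384 via fast path: %s\n",
           fp_try_acquire(&fp, 16384, 1) ? "yes" : "no");
    return 0;
}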


>> I did look at the docs to see if anything needs updating, but I don't
>> think so. The SGML docs only talk about fast-path locking at a fairly
>> high level, not about how many we have etc.
> 
> Well, the only thing I could think of was to add to the
> "max_locks_per_transaction" GUC in doc/src/sgml/config.sgml that "it
> is also used as an advisory value for the number of groups in the
> lock manager's fast-path implementation" (that is, without going into
> further discussion, as even the pg_locks discussion in
> doc/src/sgml/system-views.sgml simply uses that term).
> 

Thanks, I'll consider mentioning this in the max_locks_per_transaction
docs. Also, I think there's a place in the docs that calculates the
amount of per-connection memory, so maybe that needs to be updated too.
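
For the config.sgml wording, the relationship between the GUC and the
fast-path sizing is roughly of the following shape (an illustrative
sketch only -- the constants, the cap and the function name here are
assumptions, not the values used by the patch):

#include <stdio.h>

#define FP_SLOTS_PER_GROUP          16      /* slots per group (assumed) */
#define FP_MAX_GROUPS_PER_BACKEND   1024    /* cap on groups (assumed) */

/* Pick a power-of-two group count so the fast-path slots roughly cover
 * max_locks_per_transaction. */
static int
fast_path_groups(int max_locks_per_transaction)
{
    int     groups = 1;

    while (groups < FP_MAX_GROUPS_PER_BACKEND &&
           groups * FP_SLOTS_PER_GROUP < max_locks_per_transaction)
        groups *= 2;

    return groups;
}

int
main(void)
{
    int     gucs[] = {64, 256, 1024, 4096};

    for (int i = 0; i < 4; i++)
        printf("max_locks_per_transaction=%d -> %d groups (%d fast-path slots)\n",
               gucs[i], fast_path_groups(gucs[i]),
               fast_path_groups(gucs[i]) * FP_SLOTS_PER_GROUP);
    return 0;
}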


regards

-- 
Tomas Vondra


