Re: scalability bottlenecks with (many) partitions (and more) - Mailing list pgsql-hackers

From: Jakub Wartak
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Msg-id: CAKZiRmze8L6FeAKNQhZkn4NmUH2WZZyQ46GUm-Xuky5qr5DUoA@mail.gmail.com
In response to: scalability bottlenecks with (many) partitions (and more) (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: scalability bottlenecks with (many) partitions (and more)
List: pgsql-hackers
On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <tomas@vondra.me> wrote:

> [..]

> Anyway, at this point I'm quite happy with this improvement. I didn't
> have any clear plan when to commit this, but I'm considering doing so
> sometime next week, unless someone objects or asks for some additional
> benchmarks etc.

Thank you very much for working on this :)

The only concern that comes to my mind is that we could blow up L2
caches. Fun fact: if we are growing PGPROC by 6.3x, that's going to be
like one or two extra 2MB huge pages at the common
max_connections=1000 on x86_64 (830kB × 6.3 -> ~5.1MB), and indeed:

# without patch:
postgres@hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
shared_memory_size_in_huge_pages
177

# with patch:
postgres@hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
shared_memory_size_in_huge_pages
178
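
Out of curiosity, a quick way to eyeball where the extra shared
memory actually goes would be pg_shmem_allocations - just a sketch,
since the exact allocation names on the patched build are my
assumption here:

-- list the largest shared memory allocations; the PGPROC-related
-- entries should grow visibly with the patch applied
postgres=# SELECT name, size FROM pg_shmem_allocations
           ORDER BY size DESC LIMIT 10;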

So, playing Devil's advocate, the worst situation that could possibly
hurt (?) would be:
* memory size of the PGPROC working set >> L2_cache (thus a very high
max_connections),
* an insane number of working sessions on CPU (sessions >> VCPUs) -
sadly, this happens to some,
* those sessions wouldn't even have to be competing for the same Oids
- just fetching this new big fpLockBits[] structure - so probing a lot
for lots of Oids, but *NOT* having to use the futex() syscall [so not
paying that syscall price],
* no huge pages (to cause dTLB misses),

then maybe(?) one could observe further degradation in the dTLB-miss
perf-stat counters under some microbenchmark, but measuring that
requires isolated physical hardware. Maybe it would actually turn out
to be noise from the overhead of the context switches themselves. I'm
just trying to think out loud about what a big PGPROC could cause
here. But this is already an unhealthy and non-steady state of the
system, so IMHO we are good, unless someone comes up with a better
(more evil) idea.
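
For completeness, if someone did get such hardware, I'd expect the
measurement to look roughly like this (a sketch only - the dTLB event
names vary per CPU/kernel, and <backend_pid> is of course a
placeholder):

# sample dTLB and context-switch counters of one backend for 30s
$ perf stat -e dTLB-loads,dTLB-load-misses,context-switches \
      -p <backend_pid> -- sleep 30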

> I did look at docs if anything needs updating, but I don't think so. The
> SGML docs only talk about fast-path locking at fairly high level, not
> about how many we have etc.

Well, the only thing I could think of was adding a note to the
"max_locks_per_transaction" GUC in doc/src/sgml/config.sgml saying
that "it is also used as an advisory value for the number of groups
used in the lock manager's fast-path implementation" (that is,
without going into further discussion, as even the pg_locks
discussion in doc/src/sgml/system-views.sgml simply uses that term).
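
Just as a sketch of the wording I have in mind (the exact placement
inside the max_locks_per_transaction entry in config.sgml is of
course up for debate):

    <para>
     This value also serves as an advisory value for the number of
     groups used in the lock manager's fast-path implementation.
    </para>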

-J.


