Re: Slow standby snapshot - Mailing list pgsql-hackers
From | Michail Nikolaev |
---|---|
Subject | Re: Slow standby snapshot |
Date | |
Msg-id | CANtu0oh_ytfAgRYOSfQP49eFZv7qRFH+zdDB9=Bz0e7DQj5VUA@mail.gmail.com |
In response to | Re: Slow standby snapshot (Kirill Reshke <reshkekirill@gmail.com>) |
Responses |
Re: Slow standby snapshot
|
List | pgsql-hackers |
Hello.

> I recently ran into a problem in one of our production postgresql cluster.
> I had noticed lock contention on procarray lock on standby, which causes WAL
> replay lag growth.

Yes, I saw the same issue on my production cluster.

> 1) set max_connections to big number, like 100000

I ran the tests with a more realistic value: 5000. It is a valid value for
Amazon RDS, for example (the default is
LEAST({DBInstanceClassMemory/9531392}, 5000)).

The test looks like this:

    pgbench -i -s 10 -U postgres -d postgres
    pgbench -b select-only -p 6543 -j 1 -c 50 -n -P 1 -T 18000 -U postgres postgres
    pgbench -b simple-update -j 1 -c 50 -n -P 1 -T 18000 -U postgres postgres
    long transaction on primary - begin;select txid_current();
    perf top -p <pid of some standby>

On postgres 14 (master), the non-patched version looks like this:

     5.13%  postgres  [.] KnownAssignedXidsGetAndSetXmin
     4.61%  postgres  [.] pg_checksum_block
     2.54%  postgres  [.] AllocSetAlloc
     2.44%  postgres  [.] base_yyparse

It is too much to spend 5-6% of CPU running through an array :) I think it
should be fixed for both the 13 and 14 versions.

The patched version looks like this (I was unable to find
KnownAssignedXidsGetAndSetXmin in the profile):

     3.08%  postgres  [.] pg_checksum_block
     2.89%  postgres  [.] AllocSetAlloc
     2.66%  postgres  [.] base_yyparse
     2.00%  postgres  [.] MemoryContextAllocZeroAligned

On postgres 13, the non-patched version looks even worse (it definitely needs
to be fixed, in my opinion):

    26.44%  postgres  [.] KnownAssignedXidsGetAndSetXmin
     2.17%  postgres  [.] base_yyparse
     2.01%  postgres  [.] AllocSetAlloc
     1.55%  postgres  [.] MemoryContextAllocZeroAligned

But your patch does not apply to REL_13_STABLE. Could you please provide two
versions?

Also, there are warnings when building with the patch:

    procarray.c:4595:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     4595 |     int prv = -1;
          |     ^~~
    procarray.c: In function ‘KnownAssignedXidsGetOldestXmin’:
    procarray.c:5056:5: warning: variable ‘tail’ set but not used [-Wunused-but-set-variable]
     5056 |     tail;
          |     ^~~~
    procarray.c:5067:38: warning: ‘i’ is used uninitialized in this function [-Wuninitialized]
     5067 |     i = KnownAssignedXidsValidDLL[i].nxt;

Some of them are clear errors, so please recheck the code.

Also, maybe it is better to make the patch less invasive by using a simpler
approach. For example, use the first bit to mark an xid as valid and the last
7 bits (128 values) as an optimistic offset to the next valid xid (jumping by
127 steps in the worst case). What do you think?

Also, it is a good idea to register the patch in the commitfest app
(https://commitfest.postgresql.org/).

Thanks,
Michail.
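
To make the one-byte-per-slot suggestion above more concrete, here is a minimal standalone sketch. It is not PostgreSQL code and not the patch under discussion: the names (xids, meta, xid_add, xid_remove_at, xid_collect), the fixed CAPACITY and the lazy hint update during the scan are all assumptions made for illustration. The high bit of each byte marks the slot as valid, and the low 7 bits hold a hint saying how far to jump to reach the next slot that may be valid. The sketch relies on the property that a removed slot never becomes valid again, and it ignores locking and compaction entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY   1024     /* hypothetical array size for the sketch */
#define SLOT_VALID 0x80u    /* first bit: slot holds a live xid */
#define SKIP_MASK  0x7Fu    /* last 7 bits: distance to the next candidate */

static uint32_t xids[CAPACITY];
static uint8_t  meta[CAPACITY];
static size_t   head = 0;   /* next free slot; the tail stays at 0 here */

/* Append an xid; its hint initially points at the direct neighbour. */
static void xid_add(uint32_t xid)
{
    assert(head < CAPACITY);
    xids[head] = xid;
    meta[head] = SLOT_VALID | 1;
    head++;
}

/* Mark slot i as removed: clear the valid bit, keep the skip hint. */
static void xid_remove_at(size_t i)
{
    assert(i < head);
    meta[i] &= SKIP_MASK;
}

/*
 * Copy all valid xids into out[] and return their count.
 * While scanning, each visited slot's hint is extended over the run of
 * invalid slots that follows it, capped at 127, so later scans can jump.
 */
static size_t xid_collect(uint32_t *out)
{
    size_t count = 0;
    size_t i = 0;

    while (i < head)
    {
        size_t skip = meta[i] & SKIP_MASK;
        size_t next = i + skip;

        if (meta[i] & SLOT_VALID)
            out[count++] = xids[i];

        /* Chain the hints of invalid slots onto ours while they still fit. */
        while (next < head && !(meta[next] & SLOT_VALID))
        {
            size_t step = meta[next] & SKIP_MASK;

            if (skip + step > SKIP_MASK)
                break;
            skip += step;
            next += step;
        }

        meta[i] = (uint8_t) ((meta[i] & SLOT_VALID) | skip);
        i = next;
    }
    return count;
}
```

With 300 slots of which only the first and the last are still valid, the first call to xid_collect walks every slot once and records the hints; a second call then visits only four slots. In a real standby the scan typically runs concurrently with other readers, so updating hints during the scan would need extra care that this sketch does not attempt.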