Hi,
I recently ran into a problem in one of our production PostgreSQL clusters. I noticed lock contention on ProcArrayLock on a standby, which caused WAL replay lag to grow.
To reproduce this, you can do the following:
1) set max_connections to a large number, e.g. 100000
2) begin a transaction on the primary and leave it open
3) start a pgbench workload on the primary and on the standby
After a while, KnownAssignedXidsGetAndSetXmin shows up in perf top consuming about 75% of CPU.
%%
PerfTop: 1060 irqs/sec kernel: 0.0% exact: 0.0% [4000Hz cycles:u], (target_pid: 273361)
-------------------------------------------------------------------------------
73.92% postgres [.] KnownAssignedXidsGetAndSetXmin
1.40% postgres [.] base_yyparse
0.96% postgres [.] LWLockAttemptLock
0.84% postgres [.] hash_search_with_hash_value
0.84% postgres [.] AtEOXact_GUC
0.72% postgres [.] ResetAllOptions
0.70% postgres [.] AllocSetAlloc
0.60% postgres [.] _bt_compare
0.55% postgres [.] core_yylex
0.42% libc-2.27.so [.] __strlen_avx2
0.23% postgres [.] LWLockRelease
0.19% postgres [.] MemoryContextAllocZeroAligned
0.18% postgres [.] expression_tree_walker.part.3
0.18% libc-2.27.so [.] __memmove_avx_unaligned_erms
0.17% postgres [.] PostgresMain
0.17% postgres [.] palloc
0.17% libc-2.27.so [.] _int_malloc
0.17% postgres [.] set_config_option
0.17% postgres [.] ScanKeywordLookup
0.16% postgres [.] _bt_checkpage
%%
We tried to fix this by using a Bitmapset instead of the boolean array KnownAssignedXidsValid, but this did not help much.
Using a doubly linked list instead helps somewhat more: we got +1000 tps on the pgbench workload with patched PostgreSQL. The general idea of the patch is that, instead of recording which elements of KnownAssignedXids are valid, we maintain a doubly linked list of them. This works in exactly the same way, except that taking a snapshot on the replica is now O(running transactions) instead of O(head - tail), which is significantly faster under some workloads. The patch reduces the CPU usage of KnownAssignedXidsGetAndSetXmin from ~74% to ~48%, but does not eliminate it from perf top.
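
To make the complexity difference concrete, here is a minimal standalone sketch of the idea. It is not the patch itself: the names XidArray, XidSlot, xa_add, xa_remove, xa_snapshot_scan and xa_snapshot_list, as well as the fixed CAPACITY, are hypothetical and chosen only for illustration, and locking, wraparound and array compression are left out. It assumes xids are appended at head and removed by marking a slot invalid, and it counts how many slots each snapshot method visits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY    1024    /* hypothetical fixed size, enough for the sketch */
#define INVALID_IDX (-1)

typedef uint32_t TransactionId;

typedef struct XidSlot
{
    TransactionId xid;
    bool          valid;    /* plays the role of the per-slot "valid" flag */
    int           prev;     /* previous valid slot, or INVALID_IDX */
    int           next;     /* next valid slot, or INVALID_IDX */
} XidSlot;

typedef struct XidArray
{
    XidSlot slots[CAPACITY];
    int     tail;           /* first slot that may still be valid */
    int     head;           /* one past the last used slot */
    int     first_valid;    /* list head, or INVALID_IDX if empty */
    int     last_valid;     /* list tail, or INVALID_IDX if empty */
} XidArray;

static void
xa_init(XidArray *xa)
{
    xa->tail = xa->head = 0;
    xa->first_valid = xa->last_valid = INVALID_IDX;
}

/* Append an xid at head and link it at the end of the valid list. */
static int
xa_add(XidArray *xa, TransactionId xid)
{
    assert(xa->head < CAPACITY);

    int      idx = xa->head++;
    XidSlot *s = &xa->slots[idx];

    s->xid = xid;
    s->valid = true;
    s->prev = xa->last_valid;
    s->next = INVALID_IDX;
    if (xa->last_valid != INVALID_IDX)
        xa->slots[xa->last_valid].next = idx;
    else
        xa->first_valid = idx;
    xa->last_valid = idx;
    return idx;
}

/* Mark a slot invalid and unlink it from the valid list in O(1). */
static void
xa_remove(XidArray *xa, int idx)
{
    XidSlot *s = &xa->slots[idx];

    if (!s->valid)
        return;
    s->valid = false;
    if (s->prev != INVALID_IDX)
        xa->slots[s->prev].next = s->next;
    else
        xa->first_valid = s->next;
    if (s->next != INVALID_IDX)
        xa->slots[s->next].prev = s->prev;
    else
        xa->last_valid = s->prev;
}

/* Snapshot by scanning every slot in [tail, head): O(head - tail). */
static int
xa_snapshot_scan(const XidArray *xa, TransactionId *out, int *visited)
{
    int n = 0;

    *visited = 0;
    for (int i = xa->tail; i < xa->head; i++)
    {
        (*visited)++;
        if (xa->slots[i].valid)
            out[n++] = xa->slots[i].xid;
    }
    return n;
}

/* Snapshot by following the valid list: O(number of valid xids). */
static int
xa_snapshot_list(const XidArray *xa, TransactionId *out, int *visited)
{
    int n = 0;

    *visited = 0;
    for (int i = xa->first_valid; i != INVALID_IDX; i = xa->slots[i].next)
    {
        (*visited)++;
        out[n++] = xa->slots[i].xid;
    }
    return n;
}
```

With one old xid still open and many later ones already finished, which roughly corresponds to the reproduction steps above, the scan visits every slot between tail and head while the list walk visits only the slots that are still valid. Both return the same xids in the same order.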
The problem is easier to reproduce on PG13, since PG14 has some snapshot optimizations.
Thanks!
Best regards, reshke