Slow standby snapshot - Mailing list pgsql-hackers

From Кирилл Решке
Subject Slow standby snapshot
Date
Msg-id CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
Responses Re: Slow standby snapshot
List pgsql-hackers
Hi,
I recently ran into a problem in one of our production PostgreSQL clusters. I noticed lock contention on ProcArrayLock on the standby, which causes WAL replay lag to grow.
To reproduce this, you can do the following:

1) set max_connections to a large number, like 100000
2) begin a transaction on the primary and leave it open
3) start a pgbench workload on the primary and on the standby

After a while, KnownAssignedXidsGetAndSetXmin shows up in perf top consuming about 75% of CPU.

%%
  PerfTop:    1060 irqs/sec  kernel: 0.0%  exact:  0.0% [4000Hz cycles:u],  (target_pid: 273361)
-------------------------------------------------------------------------------

    73.92%  postgres       [.] KnownAssignedXidsGetAndSetXmin
     1.40%  postgres       [.] base_yyparse
     0.96%  postgres       [.] LWLockAttemptLock
     0.84%  postgres       [.] hash_search_with_hash_value
     0.84%  postgres       [.] AtEOXact_GUC
     0.72%  postgres       [.] ResetAllOptions
     0.70%  postgres       [.] AllocSetAlloc
     0.60%  postgres       [.] _bt_compare
     0.55%  postgres       [.] core_yylex
     0.42%  libc-2.27.so   [.] __strlen_avx2
     0.23%  postgres       [.] LWLockRelease
     0.19%  postgres       [.] MemoryContextAllocZeroAligned
     0.18%  postgres       [.] expression_tree_walker.part.3
     0.18%  libc-2.27.so   [.] __memmove_avx_unaligned_erms
     0.17%  postgres       [.] PostgresMain
     0.17%  postgres       [.] palloc
     0.17%  libc-2.27.so   [.] _int_malloc
     0.17%  postgres       [.] set_config_option
     0.17%  postgres       [.] ScanKeywordLookup
     0.16%  postgres       [.] _bt_checkpage

%%


We tried to fix this by using a Bitmapset instead of the boolean array KnownAssignedXidsValid, but that did not help much.

Using a doubly linked list helps a little more: we got +1000 tps on a pgbench workload with patched PostgreSQL. The general idea of the patch is that, instead of remembering which elements of KnownAssignedXids are valid, we maintain a doubly linked list of the valid ones. This solution works in exactly the same way, except that taking a snapshot on the replica is now O(running transactions) instead of O(head - tail), which is significantly faster under some workloads. The patch reduces the CPU usage of KnownAssignedXidsGetAndSetXmin from ~74% to ~48%, but does not eliminate it from perf top.
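
To illustrate the difference in snapshot cost, here is a minimal standalone sketch. It is not the patch and not PostgreSQL code: every name (SketchXidArray, sketch_add and so on), the fixed capacity, and the fact that removal takes a slot index directly are assumptions made for the example. It keeps an xid array with a validity flag per slot, plus prev/next indexes linking only the valid slots, and compares a full scan of [tail, head) with a walk of the linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAPACITY 1024    /* hypothetical fixed size */
#define SKETCH_NIL      (-1)    /* "no slot" marker for list links */

typedef uint32_t SketchXid;

typedef struct SketchXidArray
{
    SketchXid xids[SKETCH_CAPACITY];
    bool      valid[SKETCH_CAPACITY];   /* baseline: per-slot validity flag */
    int       prev[SKETCH_CAPACITY];    /* links between valid slots only */
    int       next[SKETCH_CAPACITY];
    int       tail;                     /* first slot of the used window */
    int       head;                     /* one past the last used slot */
    int       first_valid;              /* list head, or SKETCH_NIL */
    int       last_valid;               /* list tail, or SKETCH_NIL */
} SketchXidArray;

static void
sketch_init(SketchXidArray *a)
{
    a->tail = a->head = 0;
    a->first_valid = a->last_valid = SKETCH_NIL;
}

/* Append an xid at head and link it at the end of the valid list. */
static int
sketch_add(SketchXidArray *a, SketchXid xid)
{
    int slot = a->head;

    assert(slot < SKETCH_CAPACITY);
    a->head++;
    a->xids[slot] = xid;
    a->valid[slot] = true;
    a->prev[slot] = a->last_valid;
    a->next[slot] = SKETCH_NIL;
    if (a->last_valid != SKETCH_NIL)
        a->next[a->last_valid] = slot;
    else
        a->first_valid = slot;
    a->last_valid = slot;
    return slot;
}

/* Mark a slot invalid and unlink it from the valid list in O(1). */
static void
sketch_remove(SketchXidArray *a, int slot)
{
    int p, n;

    assert(slot >= a->tail && slot < a->head && a->valid[slot]);
    a->valid[slot] = false;
    p = a->prev[slot];
    n = a->next[slot];
    if (p != SKETCH_NIL)
        a->next[p] = n;
    else
        a->first_valid = n;
    if (n != SKETCH_NIL)
        a->prev[n] = p;
    else
        a->last_valid = p;
}

/* Baseline snapshot: visits every slot in [tail, head). */
static int
sketch_collect_scan(const SketchXidArray *a, SketchXid *out, int *visited)
{
    int count = 0;

    *visited = 0;
    for (int i = a->tail; i < a->head; i++)
    {
        (*visited)++;
        if (a->valid[i])
            out[count++] = a->xids[i];
    }
    return count;
}

/* List-based snapshot: visits only the slots that are still valid. */
static int
sketch_collect_list(const SketchXidArray *a, SketchXid *out, int *visited)
{
    int count = 0;

    *visited = 0;
    for (int i = a->first_valid; i != SKETCH_NIL; i = a->next[i])
    {
        (*visited)++;
        out[count++] = a->xids[i];
    }
    return count;
}
```

With one old transaction still open and many later ones already finished, the scan keeps visiting every dead slot between tail and head, while the list walk touches only the live entries. The sketch ignores locking, compaction of the array, and the lookup needed to find the slot of an xid being removed.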

The problem is easier to reproduce on PG13, since PG14 has some snapshot optimizations.

Thanks!

Best regards, reshke
