Hi,
During a short 100-second pg_create_logical_replication_slot benchmark
on a standby, I compared HEAD with patch v7. v7 removes the
XactLockTableWait polling hot-spot (it no longer shows up in the flame
graph), yet the overall perf-counter numbers are only modestly lower
(cycles drop by about 28%), which suggests that another source of
overhead remains.
HEAD
cycles: 2,930,606,156
instructions: 1,003,179,713 (0.34 IPC)
cache-misses: 144,808,110
context-switches: 77,278
elapsed: 100 s
v7
cycles: 2,121,614,632
instructions: 802,200,231 (0.38 IPC)
cache-misses: 100,615,485
context-switches: 78,120
elapsed: 100 s
Profiling shows a second hot-spot in read_local_xlog_page_guts(),
which still relies on a check/sleep loop.
A TODO comment in the code already suggests an improvement:
/*
* Loop waiting for xlog to be available if necessary
*
* TODO: The walsender has its own version of this function, which uses a
* condition variable to wake up whenever WAL is flushed. We could use the
* same infrastructure here, instead of the check/sleep/repeat style of
* loop.
*/
To test the idea, I implemented an experimental patch. With both v7
and this change applied, the polling disappears from the flame graph
and the counters drop by two to four orders of magnitude relative to v7:
v7 + CV in read_local_xlog_page_guts
cycles: 6,284,633
instructions: 3,990,034 (0.63 IPC)
cache-misses: 163,394
context-switches: 6
elapsed: 100 s
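To illustrate the difference between the two wait styles, here is a
minimal standalone sketch using POSIX threads. It is not the
experimental patch and it does not use PostgreSQL's condition variable
infrastructure; FlushState, wait_polling, wait_cv, advance_flush and
run_demo are hypothetical names invented for this example, and the 1 ms
poll interval and 20 ms writer delay are arbitrary assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a shared "WAL flushed up to" position. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  advanced;     /* signalled whenever flushed_lsn moves */
    uint64_t        flushed_lsn;
} FlushState;

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Check/sleep/repeat: wakes up every 1 ms even if nothing has changed.
 * Returns how many times the position was checked. */
static long wait_polling(FlushState *st, uint64_t target) {
    long checks = 0;
    for (;;) {
        pthread_mutex_lock(&st->lock);
        uint64_t cur = st->flushed_lsn;
        pthread_mutex_unlock(&st->lock);
        checks++;
        if (cur >= target)
            return checks;
        sleep_ms(1);
    }
}

/* Condition-variable wait: blocks until a writer signals progress.
 * Returns how many times pthread_cond_wait returned. */
static long wait_cv(FlushState *st, uint64_t target) {
    long wakeups = 0;
    pthread_mutex_lock(&st->lock);
    while (st->flushed_lsn < target) {   /* re-check: wakeups may be spurious */
        pthread_cond_wait(&st->advanced, &st->lock);
        wakeups++;
    }
    pthread_mutex_unlock(&st->lock);
    return wakeups;
}

/* Writer side: advance the position and wake all waiters. */
static void advance_flush(FlushState *st, uint64_t lsn) {
    pthread_mutex_lock(&st->lock);
    if (lsn > st->flushed_lsn)
        st->flushed_lsn = lsn;
    pthread_cond_broadcast(&st->advanced);
    pthread_mutex_unlock(&st->lock);
}

static void *delayed_writer(void *arg) {
    sleep_ms(20);                        /* simulate WAL arriving later */
    advance_flush((FlushState *) arg, 100);
    return NULL;
}

/* Runs one waiter against one delayed writer; returns the waiter's
 * check/wakeup count, or -1 if the writer thread could not be started. */
long run_demo(int use_cv) {
    FlushState st;
    pthread_t  writer;

    pthread_mutex_init(&st.lock, NULL);
    pthread_cond_init(&st.advanced, NULL);
    st.flushed_lsn = 0;

    if (pthread_create(&writer, NULL, delayed_writer, &st) != 0)
        return -1;
    long n = use_cv ? wait_cv(&st, 100) : wait_polling(&st, 100);
    pthread_join(writer, NULL);

    assert(st.flushed_lsn >= 100);       /* waiter returned only after the flush */
    pthread_cond_destroy(&st.advanced);
    pthread_mutex_destroy(&st.lock);
    return n;
}
```

Calling run_demo(0) and run_demo(1) from a main() (built with
-pthread) shows the pattern: the polling waiter checks repeatedly
while nothing has changed, whereas the CV waiter normally wakes only
when the writer signals. The exact counts depend on scheduling.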
I plan to post a new patch to fix this as well after further
refinements and tests.
Best,
Xuneng