Hi Hackers,
This patch addresses a performance issue pointed out by Andres Freund. The changes are:
Benchmark buffer pinning: The usual benchmark code. I implemented a few functions that can be used in postgres queries, and a Python script that runs them and produces CSV files and SVG plots for the current build.
Refactoring reference counting: Before starting to change code and potentially breaking things, I considered it prudent to isolate the reference counting code to limit the damage. This code was part of an 8k+ LOC file.
Using simplehash: Simply replacing the HTAB with a simplehash, and adding a new set of macros, SH_ENTRY_EMPTY, SH_MAKE_EMPTY and SH_MAKE_IN_USE, to allow using the InvalidBuffer special value instead of allocating extra space for a validity flag. Here I assume that the buffer sequence is independent enough from the array size, so I use the buffer directly as the hash value, omitting a hash function call.
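To illustrate the idea (this is not the patch itself), here is a minimal standalone sketch of an open-addressing table that marks empty slots with the key value 0, mirroring InvalidBuffer, and uses the buffer number directly as the hash. All Sketch* names, the fixed table size and the linear probing are assumptions made for the example; the real code goes through simplehash.h and the new SH_* macros, and deletion is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_BUFFER 0		/* assumption: mirrors InvalidBuffer */
#define SKETCH_TABLE_SIZE 16		/* hypothetical size, must be a power of two */

typedef struct SketchEntry
{
	int32_t		buffer;			/* key; SKETCH_INVALID_BUFFER means "empty" */
	uint32_t	data;			/* payload, e.g. a packed reference count */
} SketchEntry;

typedef struct SketchTable
{
	SketchEntry slots[SKETCH_TABLE_SIZE];	/* zero-initialized = all empty */
} SketchTable;

/* No separate status flag: the key itself says whether the slot is used. */
static inline int
sketch_entry_empty(const SketchEntry *e)
{
	return e->buffer == SKETCH_INVALID_BUFFER;
}

/* Find the entry for a buffer, or NULL if it is not in the table. */
static SketchEntry *
sketch_lookup(SketchTable *t, int32_t buffer)
{
	/* The buffer number is the hash; only a mask is applied. */
	uint32_t	i = (uint32_t) buffer & (SKETCH_TABLE_SIZE - 1);

	for (uint32_t n = 0; n < SKETCH_TABLE_SIZE; n++)
	{
		SketchEntry *e = &t->slots[i];

		if (sketch_entry_empty(e))
			return NULL;
		if (e->buffer == buffer)
			return e;
		i = (i + 1) & (SKETCH_TABLE_SIZE - 1);	/* linear probing */
	}
	return NULL;
}

/* Find or create the entry for a buffer; NULL if the table is full. */
static SketchEntry *
sketch_insert(SketchTable *t, int32_t buffer)
{
	uint32_t	i = (uint32_t) buffer & (SKETCH_TABLE_SIZE - 1);

	assert(buffer != SKETCH_INVALID_BUFFER);
	for (uint32_t n = 0; n < SKETCH_TABLE_SIZE; n++)
	{
		SketchEntry *e = &t->slots[i];

		if (sketch_entry_empty(e))
		{
			e->buffer = buffer;	/* storing a valid key marks the slot in use */
			e->data = 0;
			return e;
		}
		if (e->buffer == buffer)
			return e;
		i = (i + 1) & (SKETCH_TABLE_SIZE - 1);
	}
	return NULL;
}
```

Because the sentinel lives in the key, each slot stays at 8 bytes, and a table cleared with memset to zero is already in the all-empty state.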
Compact PrivateRefCountEntry: The original implementation used a 4-byte key and an 8-byte value. The reference count uses 32 bits, but it is unreasonable to expect one backend to pin the same buffer 1 billion times. The lock mode also uses 32 bits but can only take 4 values. So I packed them into a single uint32, giving 30 bits for the count and 2 bits for the lock mode. This makes the entries 8 bytes long; on 64-bit CPUs this represents more than a 1/3 reduction in memory. It also aligns every entry with a 64-bit word, so copying one entry can be completed in one instruction.
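The packing can be sketched roughly as follows. This is an illustration under assumptions, not the patch code: the PackedRefCountEntry name, the helper functions, and the choice to put the lock mode in the low 2 bits are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 2 bits = lock mode, high 30 bits = reference count. */
#define PACKED_LOCKMODE_BITS	2
#define PACKED_LOCKMODE_MASK	((UINT32_C(1) << PACKED_LOCKMODE_BITS) - 1)
#define PACKED_REFCOUNT_MAX		(UINT32_MAX >> PACKED_LOCKMODE_BITS)

typedef struct PackedRefCountEntry
{
	int32_t		buffer;			/* 4-byte key */
	uint32_t	data;			/* 30-bit count and 2-bit lock mode */
} PackedRefCountEntry;

/* The whole entry fits in one 64-bit word. */
_Static_assert(sizeof(PackedRefCountEntry) == 8, "entry must be 8 bytes");

static inline uint32_t
packed_refcount(const PackedRefCountEntry *e)
{
	return e->data >> PACKED_LOCKMODE_BITS;
}

static inline uint32_t
packed_lockmode(const PackedRefCountEntry *e)
{
	return e->data & PACKED_LOCKMODE_MASK;
}

static inline void
packed_set(PackedRefCountEntry *e, uint32_t count, uint32_t lockmode)
{
	assert(count <= PACKED_REFCOUNT_MAX);
	assert(lockmode <= PACKED_LOCKMODE_MASK);
	e->data = (count << PACKED_LOCKMODE_BITS) | lockmode;
}

/* Incrementing the count adds one unit above the lock mode bits. */
static inline void
packed_incr(PackedRefCountEntry *e)
{
	assert(packed_refcount(e) < PACKED_REFCOUNT_MAX);
	e->data += UINT32_C(1) << PACKED_LOCKMODE_BITS;
}
```

With this layout, pin and unpin touch only the upper bits, so the lock mode is preserved without any extra masking on the increment path.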
REFCOUNT_ARRAY_ENTRIES=0: Since the simplehash is basically an array lookup, it is worth trying to remove the array completely and keep only the hash. For small values we would be trading a few branches for a buffer % SIZE. For the prefetch use case, where buffers are pinned and unpinned in FIFO fashion, it saves an 8-entry array lookup and some extra data moves.
In addition to the patch I am including:
- A bash script to apply and benchmark the patches sequentially. You might have to adjust REPO_ROOT; in my case it is derived relative to the script path, which is under $REPO_ROOT/.patches/pins/.
- A compare-patches.py script that can be copied to src/test/modules/test_buffer_pin to turn the benchmark CSV into figures showing one metric for different patches, instead of different metrics for one patch as benchmark.py produces.
- A nicely formatted post about this [2]
Regards,
Alexandre