On Sat, Dec 10, 2022 at 11:02 AM David Rowley <dgrowleyml@gmail.com> wrote:
> [v4]
Thanks for working on this!
I ran an in-situ benchmark using the v13 radix tree patchset ([1]; WIP, but it should be useful enough for testing allocation speed). I applied only the first five patches, which are local-memory only. The benchmark is not meant to represent a realistic workload. It mainly stresses traversal and allocation of the smallest node type. Timings are the minimum of five runs, with turbo boost off, on recent Intel laptop hardware:
v13-0001 to 0005:
# select * from bench_load_random_int(500 * 1000);
mem_allocated | load_ms
---------------+---------
151123432 | 222
47.06% postgres postgres [.] rt_set
22.89% postgres postgres [.] SlabAlloc
9.65% postgres postgres [.] rt_node_insert_inner.isra.0
5.94% postgres [unknown] [k] 0xffffffffb5e011b7
3.62% postgres postgres [.] MemoryContextAlloc
2.70% postgres libc.so.6 [.] __memmove_avx_unaligned_erms
2.60% postgres postgres [.] SlabFree
+ v4 slab:
# select * from bench_load_random_int(500 * 1000);
mem_allocated | load_ms
---------------+---------
152463112 | 213
52.42% postgres postgres [.] rt_set
12.80% postgres postgres [.] SlabAlloc
9.38% postgres postgres [.] rt_node_insert_inner.isra.0
7.87% postgres [unknown] [k] 0xffffffffb5e011b7
4.98% postgres postgres [.] SlabFree
While allocation is markedly improved, freeing looks worse here. The proportion is surprising. Only about 2% of nodes are freed during the load, yet SlabFree takes 10-40% as much time as SlabAlloc. Without v4 it is 2.60% vs. 22.89% of samples, and with v4 it is 4.98% vs. 12.80%.
num_keys = 500000, height = 7
n4 = 2501016, n15 = 56932, n32 = 270, n125 = 0, n256 = 257
Sidenote: I don't recall ever seeing vsyscall (I think that's what the 0xffffffffb5e011b7 address refers to) in a profile, so I'm not sure what is happening there.
[1] https://www.postgresql.org/message-id/CAFBsxsHNE621mGuPhd7kxaGc22vMkoSu7R4JW9Zan1jjorGy3g%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com