Re: Sample rate added to pg_stat_statements - Mailing list pgsql-hackers
From | Ilia Evdokimov |
---|---|
Subject | Re: Sample rate added to pg_stat_statements |
Date | |
Msg-id | 18631d46-1741-4edc-b116-8d9631cdf919@tantorlabs.com |
In response to | Re: Sample rate added to pg_stat_statements (Sami Imseih <samimseih@gmail.com>) |
Responses |
Re: Sample rate added to pg_stat_statements
|
List | pgsql-hackers |
On 28.01.2025 23:50, Ilia Evdokimov wrote:
>
>>
>>> If anyone has the capability to run this benchmark on machines with
>>> more CPUs or with different queries, it would be nice. I’d appreciate
>>> any suggestions or feedback.
>> I wanted to share some additional benchmarks I ran as well
>> on a r8g.48xlarge ( 192 vCPUs, 1,536 GiB of memory) configured
>> with 16GB of shared_buffers. I also attached the benchmark.sh
>> script used to generate the output.
>> The benchmark is running the select-only pgbench workload,
>> so we have a single heavily contentious entry, which is the
>> worst case.
>>
>> The test shows that the spinlock (SpinDelay waits)
>> becomes an issue at high connection counts and will
>> become worse on larger machines. A sample_rate going from
>> 1 to .75 shows a 60% improvement; but this is on a single
>> contentious entry. Most workloads will likely not see this type
>> of improvement. I also could not really observe
>> this type of difference on smaller machines ( i.e. 32 vCPUs),
>> as expected.
>>
>> ## init
>> pgbench -i -s500
>>
>> ### 192 connections
>> pgbench -c192 -j20 -S -Mprepared -T120 --progress 10
>>
>> sample_rate = 1
>> tps = 484338.769799 (without initial connection time)
>> waits
>> -----
>> 11107 SpinDelay
>> 9568 CPU
>> 929 ClientRead
>> 13 DataFileRead
>> 3 BufferMapping
>>
>> sample_rate = .75
>> tps = 909547.562124 (without initial connection time)
>> waits
>> -----
>> 12079 CPU
>> 4781 SpinDelay
>> 2100 ClientRead
>>
>> sample_rate = .5
>> tps = 1028594.555273 (without initial connection time)
>> waits
>> -----
>> 13253 CPU
>> 3378 ClientRead
>> 174 SpinDelay
>>
>> sample_rate = .25
>> tps = 1019507.126313 (without initial connection time)
>> waits
>> -----
>> 13397 CPU
>> 3423 ClientRead
>>
>> sample_rate = 0
>> tps = 1015425.288538 (without initial connection time)
>> waits
>> -----
>> 13106 CPU
>> 3502 ClientRead
>>
>> ### 32 connections
>> pgbench -c32 -j20 -S -Mprepared -T120 --progress 10
>>
>> sample_rate = 1
>> tps = 620667.049565 (without initial connection time)
>> waits
>> -----
>> 1782 CPU
>> 560 ClientRead
>>
>> sample_rate = .75
>> tps = 620663.131347 (without initial connection time)
>> waits
>> -----
>> 1736 CPU
>> 554 ClientRead
>>
>> sample_rate = .5
>> tps = 624094.688239 (without initial connection time)
>> waits
>> -----
>> 1741 CPU
>> 648 ClientRead
>>
>> sample_rate = .25
>> tps = 628638.538204 (without initial connection time)
>> waits
>> -----
>> 1702 CPU
>> 576 ClientRead
>>
>> sample_rate = 0
>> tps = 630483.464912 (without initial connection time)
>> waits
>> -----
>> 1638 CPU
>> 574 ClientRead
>>
>> Regards,
>>
>> Sami
>
>
> Thank you so much for benchmarking this on a pretty large machine with
> a large number of CPUs. The results look fantastic, and I truly
> appreciate your effort.
>
> BTW, I realized that the 'sampling' test needs to be added not only to
> the Makefile but also to meson.build. I've included that in the v14
> patch.
>
> --
> Best regards,
> Ilia Evdokimov,
> Tantor Labs LLC.
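For reference, the relative throughput change in the quoted results can be computed directly from the tps lines. The snippet below is only a small sketch: the tps values are copied from the quoted output, while the names `TPS` and `relative_gain` are made up for illustration and are not part of the patch or of benchmark.sh.

```python
# tps values copied from the quoted pgbench output, keyed by
# connection count and then by sample_rate.
TPS = {
    192: {
        1.0: 484338.769799,
        0.75: 909547.562124,
        0.5: 1028594.555273,
        0.25: 1019507.126313,
        0.0: 1015425.288538,
    },
    32: {
        1.0: 620667.049565,
        0.75: 620663.131347,
        0.5: 624094.688239,
        0.25: 628638.538204,
        0.0: 630483.464912,
    },
}


def relative_gain(tps_by_rate, baseline_rate=1.0):
    """Return the percent change in tps for each sample_rate,
    relative to the tps measured at baseline_rate."""
    base = tps_by_rate[baseline_rate]
    return {
        rate: (tps / base - 1.0) * 100.0
        for rate, tps in tps_by_rate.items()
    }


if __name__ == "__main__":
    for connections, by_rate in TPS.items():
        print(f"{connections} connections")
        for rate, pct in relative_gain(by_rate).items():
            print(f"  sample_rate = {rate:<4}: {pct:+.1f}% vs sample_rate = 1")
```

Running it prints one line per sample_rate for each connection count, which makes the gap between the 192-connection and 32-connection runs easy to see side by side.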
In my opinion, if we can't observe the spinlock bottleneck on 32 CPUs, we should determine the CPU count at which it appears. This will help us understand the scale of the problem. Does this make sense, or are there really no real workloads where the same query runs on more than 32 CPUs, and have we been trying to solve a non-existent problem?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
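One way to narrow down that CPU count could be sketched as follows: repeat the select-only run at several connection counts between 32 and 192, once with sample_rate = 1 and once with sample_rate = 0, and look for the first count where the sample_rate = 1 throughput falls clearly below the sample_rate = 0 throughput. This is only a rough sketch under stated assumptions: the intermediate connection counts, the 0.9 threshold, and the helper names `pgbench_command` and `first_bottleneck` are all hypothetical, and changing sample_rate between runs is left out. The pgbench flags are the ones from the quoted benchmark.

```python
import shlex


def pgbench_command(connections, threads=20, duration_s=120):
    """Build the select-only pgbench invocation from the quoted benchmark
    for a given number of client connections."""
    args = [
        "pgbench",
        f"-c{connections}",
        f"-j{threads}",
        "-S",
        "-Mprepared",
        f"-T{duration_s}",
        "--progress",
        "10",
    ]
    return shlex.join(args)


def first_bottleneck(tps_full, tps_off, threshold=0.9):
    """Return the smallest connection count at which tps with
    sample_rate = 1 (tps_full) drops below threshold times the tps with
    sample_rate = 0 (tps_off), or None if that never happens.

    Both arguments map connection count -> measured tps. The threshold
    is an arbitrary illustrative cut-off, not a value from the thread.
    """
    for connections in sorted(tps_full):
        if connections not in tps_off:
            continue
        if tps_full[connections] < threshold * tps_off[connections]:
            return connections
    return None


if __name__ == "__main__":
    # Hypothetical sweep steps between the two measured points.
    for connections in (32, 64, 96, 128, 160, 192):
        print(pgbench_command(connections))

    # Only the two data points already reported in this thread.
    measured_full = {32: 620667.049565, 192: 484338.769799}
    measured_off = {32: 630483.464912, 192: 1015425.288538}
    print(first_bottleneck(measured_full, measured_off))
```

With only the two reported data points, the helper can say no more than that the drop appears somewhere between 32 and 192 connections. Filling in the intermediate counts with real measurements is what would narrow the range.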