Hi,
On 2022-10-28 19:54:20 -0700, Andres Freund wrote:
> I've done a fair bit of benchmarking of this patchset. For COPY it comes out
> ahead everywhere. It's possible that there's a very small regression for
> extremely IO miss heavy workloads, more below.
>
>
> server "base" configuration:
>
> max_wal_size=150GB
> shared_buffers=24GB
> huge_pages=on
> autovacuum=0
> backend_flush_after=2MB
> max_connections=5000
> wal_buffers=128MB
> wal_segment_size=1GB
>
> benchmark: pgbench running COPY into a single table. pgbench -t is set
> according to the client count, so that the same amount of data is inserted.
> This is done both using small files ([1], ringbuffer not effective, no dirty
> data to write out within the benchmark window) and a bit larger files ([2],
> lots of data to write out due to ringbuffer).
>
> To make it a fair comparison HEAD includes the lwlock-waitqueue fix as well.
>
> s_b=24GB
>
> test: unlogged_small_files, format: text, files: 1024, 9015MB total
>          HEAD             patch            no_fsm
> clients  seconds tbl-MBs  seconds tbl-MBs  seconds tbl-MBs
> 1 58.63 207 50.22 242 54.35 224
> 2 32.67 372 25.82 472 27.30 446
> 4 22.53 540 13.30 916 14.33 851
> 8 15.14 804 7.43 1640 7.48 1632
> 16 14.69 829 4.79 2544 4.50 2718
> 32 15.28 797 4.41 2763 3.32 3710
> 64 15.34 794 5.22 2334 3.06 4061
> 128 15.49 786 4.97 2452 3.13 3926
> 256 15.85 768 5.02 2427 3.26 3769
> 512 16.02 760 5.29 2303 3.54 3471
I just spent a few hours trying to reproduce these benchmark results. For the
longest time I could not get the numbers for *HEAD* to even get close to the
above, while the numbers for the patch were very close.
I was worried it was a performance regression in HEAD, etc. But no, the same
git commit as back then produces the same issue.
As it turns out, I somehow screwed up my benchmark tooling and did not set
the CPU "scaling_governor" and "energy_performance_preference" to
"performance". In a crazy turn of events, that makes approximately no
difference with the patch applied, but a ~2x difference for HEAD.
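
For reference, a rough sketch of what pinning those knobs looks like on Linux
with an intel_pstate / amd_pstate style cpufreq driver (needs root; the exact
sysfs paths are an assumption about the driver in use, nothing specific to
the patchset or my tooling):

  # force maximum-performance frequency scaling on all cores
  for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      echo performance > "$f"
  done
  # the EPP knob only exists if the driver exposes energy/performance hints
  for f in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do
      echo performance > "$f"
  done

cpupower frequency-set -g performance covers the governor part as well on
most distros.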
I suspect this is some pathological interaction with heavy lock contention
(likely the CPU dropping to a lower frequency / deeper idle state while
waiting, which then takes longer to come back out of once the lock is
released). As the lock contention is drastically reduced with the patch, that
effect is not visible anymore.
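
If somebody wants to poke at that theory, watching the effective core clocks
while the benchmark runs should show it; e.g. something along these lines
(plain Linux sysfs, nothing postgres specific, just a sketch):

  # crude view of per-core frequencies during the run; turbostat shows the
  # same plus idle-state residency in much more detail
  watch -n1 'cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq'

With the governor left at its default one would expect to see cores clocked
well below their maximum while most backends sit waiting on the lock.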
After fixing the CPU frequency scaling settings, the results are again quite
close to the numbers above...
Aargh, I want my afternoon back.
Greetings,
Andres Freund