Re: COPY performance on Windows - Mailing list pgsql-hackers

From Vladlen Popolitov
Subject Re: COPY performance on Windows
Msg-id c0a91623b39cd57dc8c3c0e20180ff54@postgrespro.ru
In response to RE: COPY performance on Windows  ("Ryohei Takahashi (Fujitsu)" <r.takahashi_2@fujitsu.com>)
Responses RE: COPY performance on Windows
List pgsql-hackers
Ryohei Takahashi (Fujitsu) wrote on 2024-12-16 15:10:
Hi
> Please use the "test.sh" in the following e-mail.
>
> https://www.postgresql.org/message-id/flat/TY3PR01MB11891C0FD066F069B113A2376823E2%40TY3PR01MB11891.jpnprd01.prod.outlook.com#8455c9f7b66780a356511f5cfe029d57
I cannot reproduce your results. In all of my runs the final result depends
on the run order: the benchmark for the first version run takes the longest,
and later runs take less time. For example, my last run (listed in order of
start time, times in seconds):
PG164: nclients = 1, time  = 251
PG164: nclients = 2, time  = 210
PG164: nclients = 4, time  = 126
PG164: nclients = 8, time  = 107
PG164: nclients = 16, time  = 99
PG164: nclients = 32, time  = 109
PG164: nclients = 64, time  = 112
PG164: nclients = 128, time  = 113
PG164: nclients = 256, time  = 120
PG166: nclients = 1, time  = 244
PG166: nclients = 2, time  = 222
PG166: nclients = 4, time  = 131
PG166: nclients = 8, time  = 109
PG166: nclients = 16, time  = 101
PG166: nclients = 32, time  = 110
PG166: nclients = 64, time  = 115
PG166: nclients = 128, time  = 116
PG166: nclients = 256, time  = 123
PG170: nclients = 1, time  = 240
PG170: nclients = 2, time  = 213
PG170: nclients = 4, time  = 129
PG170: nclients = 8, time  = 110
PG170: nclients = 16, time  = 101
PG170: nclients = 32, time  = 112
PG170: nclients = 64, time  = 115
PG170: nclients = 128, time  = 116
PG170: nclients = 256, time  = 122
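
To compare the versions side by side, the timings above can be expressed relative to the version that ran first. This is only a minimal sketch: the numbers are copied from the listing above, while the function names and the choice of PG164 as baseline are illustrative assumptions.

```python
import re
from collections import defaultdict

# Timings copied verbatim from the run listed above (seconds).
RAW = """\
PG164: nclients = 1, time  = 251
PG164: nclients = 2, time  = 210
PG164: nclients = 4, time  = 126
PG164: nclients = 8, time  = 107
PG164: nclients = 16, time  = 99
PG164: nclients = 32, time  = 109
PG164: nclients = 64, time  = 112
PG164: nclients = 128, time  = 113
PG164: nclients = 256, time  = 120
PG166: nclients = 1, time  = 244
PG166: nclients = 2, time  = 222
PG166: nclients = 4, time  = 131
PG166: nclients = 8, time  = 109
PG166: nclients = 16, time  = 101
PG166: nclients = 32, time  = 110
PG166: nclients = 64, time  = 115
PG166: nclients = 128, time  = 116
PG166: nclients = 256, time  = 123
PG170: nclients = 1, time  = 240
PG170: nclients = 2, time  = 213
PG170: nclients = 4, time  = 129
PG170: nclients = 8, time  = 110
PG170: nclients = 16, time  = 101
PG170: nclients = 32, time  = 112
PG170: nclients = 64, time  = 115
PG170: nclients = 128, time  = 116
PG170: nclients = 256, time  = 122
"""

LINE_RE = re.compile(r"^(PG\d+): nclients = (\d+), time\s+= (\d+)$")


def parse_results(text):
    """Return {version: {nclients: seconds}} for lines in the format above."""
    results = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            version = match.group(1)
            results[version][int(match.group(2))] = int(match.group(3))
    return dict(results)


def relative_to(results, baseline):
    """Percentage difference of each version against `baseline`, per client count."""
    base = results[baseline]
    return {
        version: {n: 100.0 * (t - base[n]) / base[n] for n, t in timings.items()}
        for version, timings in results.items()
    }


if __name__ == "__main__":
    parsed = parse_results(RAW)
    # PG164 ran first in this run, so it is used as the baseline here.
    for version, diffs in relative_to(parsed, "PG164").items():
        row = ", ".join(f"{n}: {d:+.1f}%" for n, d in sorted(diffs.items()))
        print(f"{version}: {row}")
```

Repeating the same tabulation for runs with a different start order would show whether the differences follow the version or the position in the run.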


I slightly modified your script:
1) I moved creation of the input files into a separate step to reduce the
influence of the system disk cache.
2) I ran the PostgreSQL servers on one PC (Windows 10, 11th Gen Intel(R)
Core(TM) i5-1135G7 @ 2.40GHz, 16 GB RAM) and the clients on a separate PC.
3) I added a CHECKPOINT at the end of every COPY FROM to flush the WAL.
4) I used the EDB builds for Windows from their site. Unfortunately, like
other distributions, they ship the files without debug symbols, which does
not help with profiling.
5) I think it would be better to make shared_buffers as small as possible
so that all I/O time is measured, but I used 25% of RAM.
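
As an illustration of modification 3), one client step could look roughly like the sketch below. It is not the actual test.sh: the host, port, database, table and file path are hypothetical placeholders, COPY is assumed to read a server-side file with default options, and including the CHECKPOINT in the timed region is also an assumption.

```python
import subprocess
import time


def build_psql_argv(host, port, dbname, table, path):
    """Build a psql command line that runs COPY FROM and then CHECKPOINT.

    All arguments are hypothetical placeholders. `path` is a file on the
    server, because plain COPY FROM 'file' is executed by the backend.
    """
    return [
        "psql", "-h", host, "-p", str(port), "-d", dbname,
        "-c", f"COPY {table} FROM '{path}'",
        "-c", "CHECKPOINT",  # flush WAL and dirty buffers after the load
    ]


def timed_run(argv):
    """Run the command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Only prints the command; call timed_run(argv) against a real server.
    argv = build_psql_argv("dbhost", 5432, "postgres", "t", "C:/data/input.dat")
    print(" ".join(argv))
```

Both COPY from a server-side file and CHECKPOINT need sufficient privileges, so the sketch assumes a superuser connection.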


My observations:
1) For 1-2 clients the read time decreases with every run (independent of
the Postgres version). This looks like the Windows disk cache (I think NTFS
system information such as the btree of file locations, not the input file
itself). It contradicts your main point that version 17.0 is slower.

2) With 1 client the Postgres backend takes only 12% of CPU; the rest of the
time it waits on kernel operations.

3) For 16-256 clients I have not analyzed what makes the time increase with
more processes: the OS process implementation, waiting on PostgreSQL locks
or spinlocks, parallel access to one input file, or other factors.

Could you confirm that you get your results for all execution orders
(17.0 first and 17.0 last)?
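
One way to check this is to repeat the benchmark in several orders so that every version runs first and last at least once. A minimal sketch, assuming the three version labels from the listing above; the function names are illustrative:

```python
from itertools import permutations

VERSIONS = ["PG164", "PG166", "PG170"]


def all_orders(versions):
    """Every possible execution order of the given versions."""
    return [list(order) for order in permutations(versions)]


def rotations(versions):
    """Cheaper alternative: each version takes every position exactly once."""
    return [versions[i:] + versions[:i] for i in range(len(versions))]


if __name__ == "__main__":
    for order in rotations(VERSIONS):
        print(" -> ".join(order))
```

If the slowest result always belongs to whichever version runs first, the difference comes from the run order rather than from the version.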


-- 
Best regards,

Vladlen Popolitov.


