Robins Tharakan <tharakan@gmail.com> writes:
> Applying all 4 patches, I also see good performance improvement.
> With more Large Objects, although pg_dump improved significantly,
> pg_restore is now comfortably an order of magnitude faster.
Yeah. The key thing here is that pg_dump can only parallelize the data transfer, while (with 0004) pg_restore can parallelize large object creation and owner-setting as well as data transfer. I don't see any simple way to improve that on the dump side, but I'm not sure we need to. Zillions of empty objects is not really the use case to worry about. I suspect that a more realistic case with moderate amounts of data in the blobs would make pg_dump look better.
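For reference, this kind of comparison uses the standard parallel dump/restore flags; a minimal sketch, assuming a source database "sourcedb" and a scratch target "targetdb" (both names are placeholders):

    # Parallel pg_dump requires the directory format (-Fd);
    # -j sets the number of worker processes for both tools.
    pg_dump -Fd -j 32 -f /tmp/lo_dump sourcedb
    createdb targetdb
    pg_restore -j 32 -d targetdb /tmp/lo_dump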
Thanks for elaborating, and yes, the pg_dump times do reflect that expectation.
The first test involved a fixed number (32k) of Large Objects (LOs) with varying sizes - I chose that number intentionally, since this was being tested on a 32-vCPU instance and the patch employs 1k batches.
We again see that pg_restore is an order of magnitude faster.
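Such a varying-size population can be built entirely server-side; a sketch of one way to do it (the size distribution below is illustrative only, not the exact one used):

    # Sketch: 32k LOs whose sizes cycle from 1kB to 32kB
    # (decode(repeat('ff', n), 'hex') yields a bytea of n bytes).
    psql -d testdb -c "SELECT lo_from_bytea(0, decode(repeat('ff', (s % 32 + 1) * 1024), 'hex')) FROM generate_series(1, 32768) AS s;"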
To test pg_restore scaling, 1 million LOs (100kB each) were created and pg_restore times were measured at increasing concurrency (on a 192-vCPU instance). We see a major speedup up to -j64, and the best time was at -j96, after which performance decreases slowly - see attached image.
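A sketch of the concurrency sweep itself (the database name and dump path are placeholders):

    # Sketch: restore the same dump at increasing parallelism,
    # recreating the target database before each run.
    for j in 8 16 32 64 96 128 192; do
        dropdb --if-exists restore_test
        createdb restore_test
        echo "-j $j:"
        time pg_restore -j "$j" -d restore_test /tmp/lo_dump
    done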
Test details:
- Command used to generate the SQL - create 1k LOs of 1kB each (a scaled-up variant for the 1M-LO case is sketched below):
  echo "SELECT lo_from_bytea(0, '\x`printf 'ff%.0s' {1..1000}`') FROM generate_series(1,1000);" > /tmp/tempdel
- Verify the LO size: select pg_column_size(lo_get(oid)) from pg_largeobject_metadata;
- Only GUC changed: max_connections=1000 (for the last test)
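For the 1M-LO case, the same pattern scales up; a reconstructed sketch (the 100kB payload size and the batch loop are assumptions, not the verbatim commands):

    # Sketch: 1000 batches x 1000 LOs of 100kB each = 1M LOs total.
    # printf emits 'ff' 100000 times, i.e. a 100kB bytea literal.
    echo "SELECT lo_from_bytea(0, '\x`printf 'ff%.0s' {1..100000}`') FROM generate_series(1,1000);" > /tmp/tempdel
    for i in $(seq 1 1000); do
        psql -d testdb -f /tmp/tempdel
    done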