Re: Use COPY for populating all pgbench tables - Mailing list pgsql-hackers
From | Tristan Partin |
---|---|
Subject | Re: Use COPY for populating all pgbench tables |
Date | |
Msg-id | CT7F011D8VXQ.A8O613ZKYS8I@gonk Whole thread Raw |
In response to | Re: Use COPY for populating all pgbench tables (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Use COPY for populating all pgbench tables
|
List | pgsql-hackers |
On Thu Jun 8, 2023 at 12:33 AM CDT, David Rowley wrote: > On Thu, 8 Jun 2023 at 07:16, Tristan Partin <tristan@neon.tech> wrote: > > > > master: > > > > 50000000 of 50000000 tuples (100%) done (elapsed 260.93 s, remaining 0.00 s)) > > vacuuming... > > creating primary keys... > > done in 1414.26 s (drop tables 0.20 s, create tables 0.82 s, client-side generate 1280.43 s, vacuum 2.55 s, primary keys130.25 s). > > > > patchset: > > > > 50000000 of 50000000 tuples (100%) of pgbench_accounts done (elapsed 243.82 s, remaining 0.00 s)) > > vacuuming... > > creating primary keys... > > done in 375.66 s (drop tables 0.14 s, create tables 0.73 s, client-side generate 246.27 s, vacuum 2.77 s, primary keys125.75 s). > > I've also previously found pgbench -i to be slow. It was a while ago, > and IIRC, it was due to the printfPQExpBuffer() being a bottleneck > inside pgbench. > > On seeing your email, it makes me wonder if PG16's hex integer > literals might help here. These should be much faster to generate in > pgbench and also parse on the postgres side. Do you have a link to some docs? I have not heard of the feature. Definitely feels like a worthy cause. > I wrote a quick and dirty patch to try that and I'm not really getting > the same performance increases as I'd have expected. I also tested > with your patch too and it does not look that impressive either when > running pgbench on the same machine as postgres. I didn't expect my patch to increase performance in all workloads. I was mainly aiming to fix high-latency connections. Based on your results that looks like a 4% reduction in performance of client-side data generation. I had thought maybe it is worth having a flag to keep the old way too, but I am not sure a 4% hit is really that big of a deal. > pgbench copy speedup > > ** master > drowley@amd3990x:~$ pgbench -i -s 1000 postgres > 100000000 of 100000000 tuples (100%) done (elapsed 74.15 s, remaining 0.00 s) > vacuuming... > creating primary keys... > done in 95.71 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 74.45 s, vacuum 0.12 s, primary keys 21.13 s). > > ** David's Patched > drowley@amd3990x:~$ pgbench -i -s 1000 postgres > 100000000 of 100000000 tuples (100%) done (elapsed 69.64 s, remaining 0.00 s) > vacuuming... > creating primary keys... > done in 90.22 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 69.91 s, vacuum 0.12 s, primary keys 20.18 s). > > ** Tristan's patch > drowley@amd3990x:~$ pgbench -i -s 1000 postgres > 100000000 of 100000000 tuples (100%) of pgbench_accounts done (elapsed > 77.44 s, remaining 0.00 s) > vacuuming... > creating primary keys... > done in 98.64 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 77.47 s, vacuum 0.12 s, primary keys 21.04 s). > > I'm interested to see what numbers you get. You'd need to test on > PG16 however. I left the old code in place to generate the decimal > numbers for versions < 16. I will try to test this soon and follow up on the thread. I definitely see no problems with your patch as is though. I would be more than happy to rebase my patches on yours. -- Tristan Partin Neon (https://neon.tech)
pgsql-hackers by date: