Re: Use COPY for populating all pgbench tables - Mailing list pgsql-hackers

From Tristan Partin
Subject Re: Use COPY for populating all pgbench tables
Date
Msg-id CTBPLZSDD9TA.3UADVJM9NTQPQ@gonk
Whole thread Raw
In response to Re: Use COPY for populating all pgbench tables  ("Tristan Partin" <tristan@neon.tech>)
List pgsql-hackers
On Thu Jun 8, 2023 at 11:38 AM CDT, Tristan Partin wrote:
> On Thu Jun 8, 2023 at 12:33 AM CDT, David Rowley wrote:
> > On Thu, 8 Jun 2023 at 07:16, Tristan Partin <tristan@neon.tech> wrote:
> > >
> > > master:
> > >
> > > 50000000 of 50000000 tuples (100%) done (elapsed 260.93 s, remaining 0.00 s))
> > > vacuuming...
> > > creating primary keys...
> > > done in 1414.26 s (drop tables 0.20 s, create tables 0.82 s, client-side generate 1280.43 s, vacuum 2.55 s,
primarykeys 130.25 s). 
> > >
> > > patchset:
> > >
> > > 50000000 of 50000000 tuples (100%) of pgbench_accounts done (elapsed 243.82 s, remaining 0.00 s))
> > > vacuuming...
> > > creating primary keys...
> > > done in 375.66 s (drop tables 0.14 s, create tables 0.73 s, client-side generate 246.27 s, vacuum 2.77 s, primary
keys125.75 s). 
> >
> > I've also previously found pgbench -i to be slow.  It was a while ago,
> > and IIRC, it was due to the printfPQExpBuffer() being a bottleneck
> > inside pgbench.
> >
> > On seeing your email, it makes me wonder if PG16's hex integer
> > literals might help here.  These should be much faster to generate in
> > pgbench and also parse on the postgres side.
> >
> > I wrote a quick and dirty patch to try that and I'm not really getting
> > the same performance increases as I'd have expected. I also tested
> > with your patch too and it does not look that impressive either when
> > running pgbench on the same machine as postgres.
>
> I didn't expect my patch to increase performance in all workloads. I was
> mainly aiming to fix high-latency connections. Based on your results
> that looks like a 4% reduction in performance of client-side data
> generation. I had thought maybe it is worth having a flag to keep the
> old way too, but I am not sure a 4% hit is really that big of a deal.
>
> > pgbench copy speedup
> >
> > ** master
> > drowley@amd3990x:~$ pgbench -i -s 1000 postgres
> > 100000000 of 100000000 tuples (100%) done (elapsed 74.15 s, remaining 0.00 s)
> > vacuuming...
> > creating primary keys...
> > done in 95.71 s (drop tables 0.00 s, create tables 0.01 s, client-side
> > generate 74.45 s, vacuum 0.12 s, primary keys 21.13 s).
> >
> > ** David's Patched
> > drowley@amd3990x:~$ pgbench -i -s 1000 postgres
> > 100000000 of 100000000 tuples (100%) done (elapsed 69.64 s, remaining 0.00 s)
> > vacuuming...
> > creating primary keys...
> > done in 90.22 s (drop tables 0.00 s, create tables 0.01 s, client-side
> > generate 69.91 s, vacuum 0.12 s, primary keys 20.18 s).
> >
> > ** Tristan's patch
> > drowley@amd3990x:~$ pgbench -i -s 1000 postgres
> > 100000000 of 100000000 tuples (100%) of pgbench_accounts done (elapsed
> > 77.44 s, remaining 0.00 s)
> > vacuuming...
> > creating primary keys...
> > done in 98.64 s (drop tables 0.00 s, create tables 0.01 s, client-side
> > generate 77.47 s, vacuum 0.12 s, primary keys 21.04 s).
> >
> > I'm interested to see what numbers you get.  You'd need to test on
> > PG16 however. I left the old code in place to generate the decimal
> > numbers for versions < 16.
>
> I will try to test this soon and follow up on the thread. I definitely
> see no problems with your patch as is though. I would be more than happy
> to rebase my patches on yours.

Finally got around to doing more benchmarking. Using an EC2 instance
hosted in Ireland, and my client laptop in Austin, Texas.

Workload: pgbench -i -s 500

master (9aee26a491)
done in 1369.41 s (drop tables 0.21 s, create tables 0.72 s, client-side generate 1336.44 s, vacuum 1.02 s, primary
keys31.03 s). 
done in 1318.31 s (drop tables 0.21 s, create tables 0.72 s, client-side generate 1282.67 s, vacuum 1.02 s, primary
keys33.69 s). 

copy
done in 307.42 s (drop tables 0.21 s, create tables 0.82 s, client-side generate 270.95 s, vacuum 1.02 s, primary keys
34.42s). 

david
done in 1311.14 s (drop tables 0.72 s, create tables 0.72 s, client-side generate 1274.98 s, vacuum 0.94 s, primary
keys33.79 s). 
done in 1340.18 s (drop tables 0.14 s, create tables 0.59 s, client-side generate 1304.78 s, vacuum 0.92 s, primary
keys33.75 s). 

copy + david
done in 348.70 s (drop tables 0.23 s, create tables 0.72 s, client-side generate 312.94 s, vacuum 0.92 s, primary keys
33.90s). 

I ran two tests for master and your patch David. For the last test, I
adapted your patch onto mine. I am still seeing the huge performance
gains on my branch.

--
Tristan Partin
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences, take 2
Next
From: Morris de Oryx
Date:
Subject: Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function