Re: Use COPY for populating all pgbench tables - Mailing list pgsql-hackers

From: Tristan Partin
Subject: Re: Use COPY for populating all pgbench tables
Date:
Msg-id: CT7F011D8VXQ.A8O613ZKYS8I@gonk
In response to: Re: Use COPY for populating all pgbench tables (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Use COPY for populating all pgbench tables
List: pgsql-hackers
On Thu Jun 8, 2023 at 12:33 AM CDT, David Rowley wrote:
> On Thu, 8 Jun 2023 at 07:16, Tristan Partin <tristan@neon.tech> wrote:
> >
> > master:
> >
> > 50000000 of 50000000 tuples (100%) done (elapsed 260.93 s, remaining 0.00 s))
> > vacuuming...
> > creating primary keys...
> > done in 1414.26 s (drop tables 0.20 s, create tables 0.82 s, client-side generate 1280.43 s, vacuum 2.55 s, primary keys 130.25 s).
> >
> > patchset:
> >
> > 50000000 of 50000000 tuples (100%) of pgbench_accounts done (elapsed 243.82 s, remaining 0.00 s))
> > vacuuming...
> > creating primary keys...
> > done in 375.66 s (drop tables 0.14 s, create tables 0.73 s, client-side generate 246.27 s, vacuum 2.77 s, primary keys 125.75 s).
>
> I've also previously found pgbench -i to be slow.  It was a while ago,
> and IIRC, it was due to the printfPQExpBuffer() being a bottleneck
> inside pgbench.
>
> On seeing your email, it makes me wonder if PG16's hex integer
> literals might help here.  These should be much faster to generate in
> pgbench and also parse on the postgres side.

Do you have a link to some docs? I have not heard of the feature.
Definitely feels like a worthy cause.
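
From your description, I picture the generation side ending up as
something like the following (a rough sketch only, not your patch; the
names are made up, and I'm assuming the PG16 integer input routines
accept the 0x form in COPY data, which your test run suggests they do):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * Rough sketch: format one pgbench_accounts row for COPY FROM STDIN
     * using PG16-style hex integer literals (0x...) instead of decimal.
     * Producing hex digits avoids the divide-by-10 loop that decimal
     * conversion needs, which is presumably where the generation-side
     * win would come from.  Names here are illustrative only.
     */
    static void
    format_account_row(char *buf, size_t buflen, int64_t aid, int64_t bid)
    {
        snprintf(buf, buflen, "0x%" PRIx64 "\t0x%" PRIx64 "\t0\t\n",
                 (uint64_t) aid, (uint64_t) bid);
    }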

> I wrote a quick and dirty patch to try that and I'm not really getting
> the same performance increases as I'd have expected. I also tested
> with your patch too and it does not look that impressive either when
> running pgbench on the same machine as postgres.

I didn't expect my patch to increase performance in all workloads; I was
mainly aiming to help high-latency connections. Based on your results,
that looks like about a 4% slowdown in client-side data generation. I
had thought it might be worth a flag to keep the old behavior too, but I
am not sure a 4% hit is really that big of a deal.
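
To spell out why I expect COPY to matter on a slow link: every row
streams through one COPY command on the already-open connection, instead
of paying a round trip per INSERT. Roughly this shape with libpq (a
sketch with minimal error handling, not the patch itself):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libpq-fe.h>

    /*
     * Rough sketch: load pgbench_tellers through a single COPY instead
     * of one INSERT per row.  The rows are streamed over the existing
     * connection, so network latency is paid once for the whole table
     * rather than once per row.
     */
    static void
    copy_tellers(PGconn *con, int ntellers)
    {
        PGresult   *res = PQexec(con, "COPY pgbench_tellers FROM STDIN");

        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(con));
            exit(1);
        }
        PQclear(res);

        for (int tid = 1; tid <= ntellers; tid++)
        {
            char        line[64];

            /* tid, bid (10 tellers per branch), tbalance, filler (NULL) */
            snprintf(line, sizeof(line), "%d\t%d\t0\t\\N\n",
                     tid, (tid - 1) / 10 + 1);
            if (PQputCopyData(con, line, (int) strlen(line)) != 1)
            {
                fprintf(stderr, "PQputCopyData failed: %s",
                        PQerrorMessage(con));
                exit(1);
            }
        }

        if (PQputCopyEnd(con, NULL) != 1 ||
            PQresultStatus(res = PQgetResult(con)) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "COPY did not complete: %s",
                    PQerrorMessage(con));
            exit(1);
        }
        PQclear(res);
    }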

> pgbench copy speedup
>
> ** master
> drowley@amd3990x:~$ pgbench -i -s 1000 postgres
> 100000000 of 100000000 tuples (100%) done (elapsed 74.15 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 95.71 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 74.45 s, vacuum 0.12 s, primary keys 21.13 s).
>
> ** David's Patched
> drowley@amd3990x:~$ pgbench -i -s 1000 postgres
> 100000000 of 100000000 tuples (100%) done (elapsed 69.64 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 90.22 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 69.91 s, vacuum 0.12 s, primary keys 20.18 s).
>
> ** Tristan's patch
> drowley@amd3990x:~$ pgbench -i -s 1000 postgres
> 100000000 of 100000000 tuples (100%) of pgbench_accounts done (elapsed
> 77.44 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 98.64 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 77.47 s, vacuum 0.12 s, primary keys 21.04 s).
>
> I'm interested to see what numbers you get.  You'd need to test on
> PG16 however. I left the old code in place to generate the decimal
> numbers for versions < 16.

I will try to test this soon and follow up on the thread. I definitely
see no problems with your patch as is though. I would be more than happy
to rebase my patches on yours.
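
For what it's worth, I assume the fallback you mention hinges on
something like PQserverVersion(), roughly like this (a sketch, not
quoted from your patch):

    #include <inttypes.h>
    #include <libpq-fe.h>

    /*
     * Sketch of the kind of version gate I expect: only emit hex
     * integer literals when the server is v16 or newer, otherwise keep
     * the old decimal format.  PQserverVersion() reports e.g. 160000
     * for a 16.x server.
     */
    static const char *
    account_row_format(PGconn *con)
    {
        return PQserverVersion(con) >= 160000
            ? "0x%" PRIx64 "\t0x%" PRIx64 "\t0\t\n"  /* hex, PG16+ */
            : "%" PRId64 "\t%" PRId64 "\t0\t\n";     /* decimal fallback */
    }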

--
Tristan Partin
Neon (https://neon.tech)


