Re: parallel data loading for pgbench -i - Mailing list pgsql-hackers

From	lakshmi
Subject	Re: parallel data loading for pgbench -i
Date	April 13 09:14:18
Msg-id	CAEvyyTjt1_QXO_37h1hbVqWdONm+uopV74j3K2pS5VrLKmozsw@mail.gmail.com
In response to	Re: parallel data loading for pgbench -i (Mircea Cadariu <cadariu.mircea@gmail.com>)
List	pgsql-hackers

Hi Mircea, Heikki,

I tested the v3 patch on 19devel with larger scale factors.

The behavior looks much better now compared to the earlier versions. For scale 100 and 500, I see clear improvements in overall runtime, and for scale 2000, the total time is around 97s on my system.

The loading phase now runs concurrently across workers, and I don’t see the earlier serialization behavior anymore.

The VACUUM phase also remains relatively small (~6s for scale 2000), which suggests that the previous overhead has been addressed.

I also verified correctness, and the row counts match the expected values.
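For anyone repeating this check: pgbench documents fixed row counts per unit of scale factor (1 branch, 10 tellers, 100000 accounts, and an empty history table after initialization). The sketch below only illustrates how the expected values can be derived and compared; it is not the procedure used in this test. The helper names are hypothetical, and the observed counts are assumed to come from separate `SELECT count(*)` queries against each table.

```python
# Rows per unit of scale factor for the standard pgbench tables.
ROWS_PER_SCALE = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}


def expected_row_counts(scale: int) -> dict[str, int]:
    """Return the expected row count of each pgbench table after `pgbench -i -s scale`."""
    counts = {table: rows * scale for table, rows in ROWS_PER_SCALE.items()}
    counts["pgbench_history"] = 0  # history is empty right after initialization
    return counts


def find_mismatches(scale: int, observed: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {table: (expected, observed)} for every table whose count differs.

    `observed` is assumed to hold count(*) results gathered separately;
    a table missing from `observed` is reported with an observed value of -1.
    """
    expected = expected_row_counts(scale)
    return {
        table: (want, observed.get(table, -1))
        for table, want in expected.items()
        if observed.get(table, -1) != want
    }
```

An empty result from `find_mismatches` means every table has the documented number of rows for that scale factor.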

Overall, the partitioned parallel approach looks solid and scales well in my tests.

Thanks again for the work on this.

Best regards,
Lakshmi

On Sat, Apr 11, 2026 at 12:07 AM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:

Hi,

On 07/04/2026 10:00, Heikki Linnakangas wrote:
>
> This all makes more sense in the partitioned case. Perhaps we should
> parallelize only when partitions are used, and use only one thread
> per partition.
>
Thanks for having a look. I attached v3 that parallelizes only the
partitioned case, one thread per partition. Results:

patch:

pgbench -i -s 100 --partitions 10

done in 12.63 s (drop tables 0.05 s, create tables 0.01 s, client-side
generate 5.98 s, vacuum 1.63 s, primary keys 4.96 s).

master:

pgbench -i -s 100 --partitions 10

done in 29.29 s (drop tables 0.00 s, create tables 0.02 s, client-side
generate 16.31 s, vacuum 7.78 s, primary keys 5.18 s).
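To make the comparison easier to read, the per-phase speedups can be worked out directly from the two result lines above. This is a small sketch that uses only the timings reported in this message; the variable and function names are illustrative, and a single run per build says nothing about run-to-run variance.

```python
# Phase timings in seconds, copied from the two pgbench -i runs above.
PATCH = {
    "drop tables": 0.05,
    "create tables": 0.01,
    "client-side generate": 5.98,
    "vacuum": 1.63,
    "primary keys": 4.96,
}
MASTER = {
    "drop tables": 0.00,
    "create tables": 0.02,
    "client-side generate": 16.31,
    "vacuum": 7.78,
    "primary keys": 5.18,
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return before / after


total_patch = sum(PATCH.values())
total_master = sum(MASTER.values())

print(f"total: {total_master:.2f} s -> {total_patch:.2f} s "
      f"({speedup(total_master, total_patch):.2f}x)")

# Only the phases that dominate the runtime are worth comparing;
# the drop/create phases are too short to measure meaningfully.
for phase in ("client-side generate", "vacuum", "primary keys"):
    ratio = speedup(MASTER[phase], PATCH[phase])
    print(f"{phase}: {MASTER[phase]:.2f} s -> {PATCH[phase]:.2f} s ({ratio:.2f}x)")
```

The phase timings add up to the reported totals (12.63 s and 29.29 s). Most of the gain comes from the generate and vacuum phases, while primary key creation is nearly unchanged.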

--
Thanks,
Mircea Cadariu
