Hi Mircea, Heikki,
I tested the v3 patch on 19devel with larger scale factors.
The behavior is much better than with the earlier versions. For scale 100 and 500, I see clear improvements in overall runtime, and for scale 2000, the total time is around 97s on my system.
The loading phase now runs concurrently across workers, and I don’t see the earlier serialization behavior anymore.
The VACUUM phase also remains relatively small (~6s for scale 2000), which suggests that the previous overhead has been addressed.
I also verified correctness, and the row counts match the expected values.
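For anyone repeating the row-count check, here is a minimal sketch of how the expected values can be derived from the scale factor. It relies on the documented pgbench sizing (1 branch, 10 tellers and 100,000 accounts per unit of scale, with pgbench_history empty after initialization). The helper names are hypothetical and not part of pgbench or the patch, and the actual counts are assumed to come from a separate SELECT count(*) on each table.

```python
# Rows created by "pgbench -i" per unit of scale factor (documented defaults).
ROWS_PER_SCALE = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}


def expected_row_counts(scale: int) -> dict[str, int]:
    """Return the expected row count of each pgbench table for a scale factor."""
    if scale < 1:
        raise ValueError("scale factor must be at least 1")
    counts = {table: rows * scale for table, rows in ROWS_PER_SCALE.items()}
    counts["pgbench_history"] = 0  # history is empty right after initialization
    return counts


def mismatches(actual: dict[str, int], scale: int) -> dict[str, tuple[int, int]]:
    """Map each table whose count differs to (expected, actual).

    'actual' is assumed to hold counts gathered separately, for example
    from SELECT count(*) on each table; a missing table is reported as -1.
    """
    expected = expected_row_counts(scale)
    return {
        table: (want, actual.get(table, -1))
        for table, want in expected.items()
        if actual.get(table, -1) != want
    }
```

An empty result from mismatches() means every table has the expected number of rows for that scale.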
Overall, the partitioned parallel approach looks solid and scales well in my tests.
Thanks again for the work on this.
Best regards,
Lakshmi
Hi,
On 07/04/2026 10:00, Heikki Linnakangas wrote:
>
> This all makes more sense in the partitioned case. Perhaps we should
> parallelize only when partitions are used, and use only one thread
> per partition.
>
Thanks for having a look. I attached v3 that parallelizes only the
partitioned case, one thread per partition. Results:
patch:
pgbench -i -s 100 --partitions 10
done in 12.63 s (drop tables 0.05 s, create tables 0.01 s, client-side
generate 5.98 s, vacuum 1.63 s, primary keys 4.96 s).
master:
pgbench -i -s 100 --partitions 10
done in 29.29 s (drop tables 0.00 s, create tables 0.02 s, client-side
generate 16.31 s, vacuum 7.78 s, primary keys 5.18 s).
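To make the per-phase comparison easier to read, here is a small sketch that parses the two summary lines above and computes master/patch ratios. The function names and the 0.5 s noise threshold are my own choices for illustration, not part of pgbench, and the parser assumes the summary format shown in this mail.

```python
import re

PATCH = """done in 12.63 s (drop tables 0.05 s, create tables 0.01 s, client-side
generate 5.98 s, vacuum 1.63 s, primary keys 4.96 s)."""
MASTER = """done in 29.29 s (drop tables 0.00 s, create tables 0.02 s, client-side
generate 16.31 s, vacuum 7.78 s, primary keys 5.18 s)."""


def parse_init_summary(text: str) -> dict[str, float]:
    """Parse a 'done in ... s (...)' summary into {phase name: seconds}."""
    flat = " ".join(text.split())  # undo mail line wrapping
    match = re.search(r"done in ([\d.]+) s \((.*?)\)", flat)
    if match is None:
        raise ValueError("no pgbench initialization summary found")
    phases = {"total": float(match.group(1))}
    for item in match.group(2).split(", "):
        name, seconds, _unit = item.rsplit(" ", 2)
        phases[name] = float(seconds)
    return phases


def speedups(master: dict[str, float], patch: dict[str, float],
             min_seconds: float = 0.5) -> dict[str, float]:
    """Return master/patch ratios, skipping phases too short to be meaningful."""
    return {
        name: master[name] / patch[name]
        for name in patch
        if name in master
        and patch[name] >= min_seconds
        and master[name] >= min_seconds
    }


if __name__ == "__main__":
    ratios = speedups(parse_init_summary(MASTER), parse_init_summary(PATCH))
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
```

On the numbers above this gives roughly 2.3x for the total, 2.7x for client-side generate and 4.8x for vacuum, while primary keys stays about the same (about 1.04x).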
--
Thanks,
Mircea Cadariu