Re: parallel data loading for pgbench -i - Mailing list pgsql-hackers

From lakshmi
Subject Re: parallel data loading for pgbench -i
Date
Msg-id CAEvyyThGm4NHDnfCGeCCOZ1_nrB=Eqct6y55GGuW0_UpTAsu3g@mail.gmail.com
In response to RE: parallel data loading for pgbench -i  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
Hi Hayato,

Thanks for your feedback.

I tried a few runs with different partition counts. From what I saw, performance doesn’t always improve with more partitions. In fact, higher partition counts increased VACUUM time and slowed initialization down.

I also agree that having control over the number of workers (like using -j) would help balance this better.

Regarding TRUNCATE, I noticed it’s already done earlier, so it might be worth checking if the extra TRUNCATE is needed.

I didn’t see memory issues in my tests, but I understand it could become a concern with many partitions.

Thanks again for the suggestions.

Best regards,
Lakshmi

On Mon, Apr 13, 2026 at 12:53 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear Mircea,

Thanks for updating the patch. It now looks like each worker no longer creates its
child tables and just runs TRUNCATE and COPY. But I'm unclear why the TRUNCATE is
needed here. Aren't the tables already truncated in initGenerateDataClientSide()->initTruncateTables()
before launching threads?
Also, the current API is questionable. E.g., we cannot run serially if --partitions is
specified. And I'm afraid an OOM failure may be more likely to happen if the table has
many partitions.
Is it possible to have -p again for the initialization? We could require
partitions >= nthreads or partitions % nthreads == 0 at that time.
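
To make the two candidate rules concrete, here is a minimal sketch of how such a check might look. The function names are hypothetical and are not taken from pgbench or from the patch; the sketch only encodes the two conditions as stated above and treats a non-positive thread count as invalid, which is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical helpers, not part of pgbench or the patch under review.
 * They encode the two candidate rules mentioned above.
 */

/* Rule 1: at least one partition per thread (partitions >= nthreads). */
static bool
partitions_cover_threads(int partitions, int nthreads)
{
	/* Assumption: a non-positive thread count is rejected. */
	return nthreads > 0 && partitions >= nthreads;
}

/* Rule 2: partitions split evenly across threads (partitions % nthreads == 0). */
static bool
partitions_divide_evenly(int partitions, int nthreads)
{
	/* The nthreads > 0 test also guards the modulo against division by zero. */
	return nthreads > 0 && partitions % nthreads == 0;
}
```

For example, 6 partitions with 4 threads would pass the first rule but fail the second, so the two rules are not interchangeable and the choice affects which combinations users may request.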


Best regards,
Hayato Kuroda
FUJITSU LIMITED
