Re: parallel data loading for pgbench -i - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: parallel data loading for pgbench -i
Date
Msg-id eb80b601-7600-4fa1-8b99-550d40c1745b@iki.fi
Whole thread Raw
In response to Re: parallel data loading for pgbench -i  (lakshmi <lakshmigcdac@gmail.com>)
List pgsql-hackers
On 18/03/2026 12:37, lakshmi wrote:
> So overall, the benefit of parallel loading is much clearer in the 
> partitioned case.
> 
> I’ll try to look further into the VACUUM behavior.

As discussed already, the slower VACUUM is surely because of the lack of 
COPY FREEZE option. That's unfortunate...

The way this patch uses the connections and workers is a little bonkers. 
The main thread uses the first connection to execute:

begin; TRUNCATE TABLE pgbench_accounts;

That connection is handed over to the first worker thread, and new 
connections are opened for the other workers. But thanks to the 
TRUNCATE, the open transaction on the first connection holds an 
AccessExclusiveLock, preventing the other workers from starting the COPY 
until the first worker has finished! I added some debugging prints to 
show this:

$ pgbench -s500 -i -j10 postgres
dropping old tables...
creating tables...
generating data (client-side)...
loading pgbench_accounts with 10 threads...
0.00: thread 0: sending COPY command, use_freeze: 1
0.00: thread 1: sending COPY command, use_freeze: 0
0.00: thread 2: sending COPY command, use_freeze: 0
0.00: thread 0: COPY started for rows between 0 and 5000000
0.00: thread 6: sending COPY command, use_freeze: 0
0.00: thread 3: sending COPY command, use_freeze: 0
0.00: thread 9: sending COPY command, use_freeze: 0
0.00: thread 4: sending COPY command, use_freeze: 0
0.00: thread 5: sending COPY command, use_freeze: 0
0.00: thread 7: sending COPY command, use_freeze: 0
0.00: thread 8: sending COPY command, use_freeze: 0
6.19: thread 0: COPY done!
6.27: thread 9: COPY started for rows between 45000000 and 50000000
6.27: thread 1: COPY started for rows between 5000000 and 10000000
6.27: thread 5: COPY started for rows between 25000000 and 30000000
6.27: thread 2: COPY started for rows between 10000000 and 15000000
6.27: thread 6: COPY started for rows between 30000000 and 35000000
6.27: thread 3: COPY started for rows between 15000000 and 20000000
6.27: thread 8: COPY started for rows between 40000000 and 45000000
6.27: thread 4: COPY started for rows between 20000000 and 25000000
6.27: thread 7: COPY started for rows between 35000000 and 40000000
19.19: thread 1: COPY done!
19.21: thread 9: COPY done!
19.26: thread 6: COPY done!
19.27: thread 7: COPY done!
19.28: thread 3: COPY done!
19.28: thread 5: COPY done!
19.28: thread 4: COPY done!
19.29: thread 8: COPY done!
19.36: thread 2: COPY done!
vacuuming...
creating primary keys...
done in 71.58 s (drop tables 0.07 s, create tables 0.01 s, client-side 
generate 19.41 s, vacuum 26.50 s, primary keys 25.59 s).

The straightforward fix is to commit the TRUNCATE transaction, and not 
use FREEZE on any of the COPY commands.

This all makes more sense in the partitioned case. Perhaps we should 
parallelize only when partitioned are used, and use only one thread per 
partition.

- Heikki




pgsql-hackers by date:

Previous
From: Antonin Houska
Date:
Subject: Re: Adding REPACK [concurrently]
Next
From: Alberto Piai
Date:
Subject: Re: Adding a stored generated column without long-lived locks