Re: parallel data loading for pgbench -i - Mailing list pgsql-hackers
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: parallel data loading for pgbench -i |
| Date | |
| Msg-id | eb80b601-7600-4fa1-8b99-550d40c1745b@iki.fi Whole thread Raw |
| In response to | Re: parallel data loading for pgbench -i (lakshmi <lakshmigcdac@gmail.com>) |
| List | pgsql-hackers |
On 18/03/2026 12:37, lakshmi wrote: > So overall, the benefit of parallel loading is much clearer in the > partitioned case. > > I’ll try to look further into the VACUUM behavior. As discussed already, the slower VACUUM is surely because of the lack of COPY FREEZE option. That's unfortunate... The way this patch uses the connections and workers is a little bonkers. The main thread uses the first connection to execute: begin; TRUNCATE TABLE pgbench_accounts; That connection is handed over to the first worker thread, and new connections are opened for the other workers. But thanks to the TRUNCATE, the open transaction on the first connection holds an AccessExclusiveLock, preventing the other workers from starting the COPY until the first worker has finished! I added some debugging prints to show this: $ pgbench -s500 -i -j10 postgres dropping old tables... creating tables... generating data (client-side)... loading pgbench_accounts with 10 threads... 0.00: thread 0: sending COPY command, use_freeze: 1 0.00: thread 1: sending COPY command, use_freeze: 0 0.00: thread 2: sending COPY command, use_freeze: 0 0.00: thread 0: COPY started for rows between 0 and 5000000 0.00: thread 6: sending COPY command, use_freeze: 0 0.00: thread 3: sending COPY command, use_freeze: 0 0.00: thread 9: sending COPY command, use_freeze: 0 0.00: thread 4: sending COPY command, use_freeze: 0 0.00: thread 5: sending COPY command, use_freeze: 0 0.00: thread 7: sending COPY command, use_freeze: 0 0.00: thread 8: sending COPY command, use_freeze: 0 6.19: thread 0: COPY done! 6.27: thread 9: COPY started for rows between 45000000 and 50000000 6.27: thread 1: COPY started for rows between 5000000 and 10000000 6.27: thread 5: COPY started for rows between 25000000 and 30000000 6.27: thread 2: COPY started for rows between 10000000 and 15000000 6.27: thread 6: COPY started for rows between 30000000 and 35000000 6.27: thread 3: COPY started for rows between 15000000 and 20000000 6.27: thread 8: COPY started for rows between 40000000 and 45000000 6.27: thread 4: COPY started for rows between 20000000 and 25000000 6.27: thread 7: COPY started for rows between 35000000 and 40000000 19.19: thread 1: COPY done! 19.21: thread 9: COPY done! 19.26: thread 6: COPY done! 19.27: thread 7: COPY done! 19.28: thread 3: COPY done! 19.28: thread 5: COPY done! 19.28: thread 4: COPY done! 19.29: thread 8: COPY done! 19.36: thread 2: COPY done! vacuuming... creating primary keys... done in 71.58 s (drop tables 0.07 s, create tables 0.01 s, client-side generate 19.41 s, vacuum 26.50 s, primary keys 25.59 s). The straightforward fix is to commit the TRUNCATE transaction, and not use FREEZE on any of the COPY commands. This all makes more sense in the partitioned case. Perhaps we should parallelize only when partitioned are used, and use only one thread per partition. - Heikki
pgsql-hackers by date: