Re: Faster data load - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Faster data load |
Date | |
Msg-id | 1a34eef6-9b30-46fa-bcc7-d7dc1c0602bb@aklaver.com Whole thread Raw |
In response to | Faster data load (Lok P <loknath.73@gmail.com>) |
Responses |
Re: Faster data load
|
List | pgsql-general |
On 9/8/24 10:45, Peter J. Holzer wrote: > On 2024-09-06 01:44:00 +0530, Lok P wrote: >> We are having a requirement to create approx 50 billion rows in a partition >> table(~1 billion rows per partition, 200+gb size daily partitions) for a >> performance test. We are currently using ' insert into <target table_partition> >> select.. From <source_table_partition> or <some transformed query>;' method . >> We have dropped all indexes and constraints First and then doing the load. >> Still it's taking 2-3 hours to populate one partition. > > That seems quite slow. Is the table very wide or does it have a large > number of indexes? > >> Is there a faster way to achieve this? >> >> Few teammate suggesting to use copy command and use file load instead, which >> will be faster. > > I doubt that. > > I benchmarked several strategies for populating tables 5 years ago and > (for my test data and on our hardware at the time - YMMV) s simple > INSERT ... SELECT was more than twice as fast as 8 parallel COPY > operations (and about 8 times as fast as a single COPY). > > Details will have changed since then (I should rerun that benchmark on > a current system), but I'd be surprised if COPY became that much faster > relative to INSERT ... SELECT. Yeah they seem to have changed a great deal. Though you are correct in saying COPY is not faster then INSERT..SELECT select version(); version ----------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit select count(*) from nyc_taxi_duckdb ; count --------- 2846722 ll -h nyc_taxi.csv -rw-rw-r-- 1 aklaver aklaver 252M Sep 8 10:54 nyc_taxi.csv insert into nyc_duplicate select * from nyc_taxi_duckdb ; INSERT 0 2846722 Time: 7015.072 ms (00:07.015) truncate nyc_duplicate ; \copy nyc_duplicate from 'nyc_taxi.csv' with csv header COPY 2846722 Time: 8760.197 ms (00:08.760) copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header; COPY 2846722 Time: 7904.279 ms (00:07.904) Just to see what the coming attraction offers: select version(); version -------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 17rc1 (Ubuntu 17~rc1-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit insert into nyc_duplicate select * from nyc_taxi_duckdb ; INSERT 0 2846722 Time: 5315.878 ms (00:05.316) \copy nyc_duplicate from 'nyc_taxi.csv' with csv header COPY 2846722 Time: 10042.129 ms (00:10.042) copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header; COPY 2846722 Time: 8422.503 ms (00:08.423) > > hp > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: