Re: performance of loading CSV data with COPY is 50 times faster than Perl::DBI - Mailing list pgsql-general

From Steven Lembark
Subject Re: performance of loading CSV data with COPY is 50 times faster than Perl::DBI
Date
Msg-id 20200203100104.2a9d3fcd.lembark@wrkhors.com
In response to performance of loading CSV data with COPY is 50 times faster than Perl::DBI  (Matthias Apitz <guru@unixarea.de>)
Responses Re: performance of loading CSV data with COPY is 50 times faster than Perl::DBI  (Ravi Krishna <ravikrishna@vivaldi.net>)
Re: performance of loading CSV data with COPY is 50 times faster than Perl::DBI  (Matthias Apitz <guru@unixarea.de>)
List pgsql-general
On Fri, 31 Jan 2020 19:24:41 +0100
Matthias Apitz <guru@unixarea.de> wrote:

> Hello,
> 
> For ages we have transferred data between different DBMS (Informix, Sybase,
> Oracle, and now PostgreSQL) with a tool of our own, based on
> Perl::DBI, which produces a CSV-like export in a common format, i.e. an
> export from Oracle can be loaded into Sybase and vice versa. Export and
> import are done row by row, for some tables millions of rows.
> 
> We produced a special version of the tool to export the rows into a
> format that PostgreSQL's COPY command understands, and found that
> importing the same data into PostgreSQL with COPY is 50 times faster
> than with Perl::DBI: 2.5 minutes vs. 140 minutes for around 6
> million rows into an empty table without indexes.
> 
> How can COPY do this so fast?

DBI is a wonderful tool, but not intended for bulk transfer. It
is useful for post-processing queries that extract specific
data in ways that SQL cannot readily handle. 

One big slowdown is that the pull-a-row, push-a-row cycle incurs
significant latency on every round trip to the database. That limits
the throughput.
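
As a rough sketch of the difference (the connection parameters, table,
and sample rows here are made up), DBD::Pg lets you stream everything
through one COPY command instead of paying one INSERT round trip per
row:

#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# Hypothetical connection and data -- adjust for your environment.
my $dbh  = DBI->connect( 'dbi:Pg:dbname=mydb', 'me', undef,
                         { AutoCommit => 0, RaiseError => 1 } );
my @rows = ( [ 1, 'foo' ], [ 2, 'bar' ] );

# Slow path: one network round trip per row.
#
#   my $ins = $dbh->prepare( 'INSERT INTO target ( a, b ) VALUES ( ?, ? )' );
#   $ins->execute( @$_ ) for @rows;

# Fast path: stream all of the rows through a single COPY.
$dbh->do( 'COPY target ( a, b ) FROM STDIN' );
$dbh->pg_putcopydata( join( "\t", @$_ ) . "\n" ) for @rows;
$dbh->pg_putcopyend;
$dbh->commit;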

Depending on who wrote the code, they may have extracted the rows
as hashrefs rather than arrays; that can be a 10x slowdown right
there. [I have no idea why so many people are so addicted to storing
rows in hashes, but it is always a significant slowdown; and 
array slices are no more complicated than hash slices!]
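
For what it's worth, here is a minimal sketch (connection and query are
hypothetical) showing that the array-based fetch is no harder to write
than the hash-based one:

#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# Hypothetical connection and query -- substitute your own.
my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'me', undef, { RaiseError => 1 } );
my $sth = $dbh->prepare( 'SELECT id, name FROM source_table' );

# Hashref fetch: DBI builds a fresh hash for every row. Convenient, but slow.
$sth->execute;
while ( my $row = $sth->fetchrow_hashref )
{
    print "$row->{id}\t$row->{name}\n";
}

# Arrayref fetch: same loop, no per-row hash construction.
$sth->execute;
while ( my $row = $sth->fetchrow_arrayref )
{
    print join( "\t", @$row ), "\n";
}

(bind_columns with fetch is faster still if you want named variables.)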

Where DBI is really nice is in managing the copy: generating a
useful basename, determining the path, deciding whether to zip
the output, etc. Using metadata to determine which tables to
back up and where to put the results, all of the automation
you'd want for flexible backups, is easy in DBI. Bulk copies,
probably not.
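
A minimal sketch of that split (schema, output directory, and
connection details are all assumed): DBI's table_info picks the
tables, COPY moves the data.

#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# Hypothetical connection and output directory.
my $dbh     = DBI->connect( 'dbi:Pg:dbname=mydb', 'me', undef, { RaiseError => 1 } );
my $out_dir = '/var/backups/pgsql';

# DBI metadata decides which tables get dumped ...
my $tables  = $dbh->table_info( undef, 'public', '%', 'TABLE' )
                  ->fetchall_arrayref( {} );

for my $table ( @$tables )
{
    # quote_identifier() would be safer for odd table names.
    my $name = $table->{TABLE_NAME};
    open my $fh, '>', "$out_dir/$name.csv" or die "$out_dir/$name.csv: $!";

    # ... while COPY does the bulk transfer, streamed back to the client.
    $dbh->do( qq{COPY $name TO STDOUT WITH ( FORMAT csv )} );

    my $line = '';
    print {$fh} $line while $dbh->pg_getcopydata( $line ) >= 0;

    close $fh or die "close $name.csv: $!";
}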

-- 
Steven Lembark                                        3646 Flora Place
Workhorse Computing                                St. Louis, MO 63110
lembark@wrkhors.com                                    +1 888 359 3508


