Thread: performance of copy_from() vs. raw COPY command

performance of copy_from() vs. raw COPY command

From

Daniel Lenski

Date:

24 June 2011, 18:39:43

Hi,
I'm using Postgre's COPY feature to populate a large table (~500M
rows, 100 GB) with frequent updates from CSV text files.  Currently
I'm using psycopg's copy_expert():

     cur.copy_expert("COPY Table (c1, c2, ...) FROM STDIN WITH CSV",
open(filename))

This avoids any issue with client vs. server file permissions, but I'm
wondering if it carries a significant performance penalty by reading
the text file at the Python level rather than the OS level.  Instead I
could do:

     cur.execute("COPY Table (c1, c2, ...) FROM $$%s$$ WITH CSV" % fn)

Of course, this opens up a whole host of permissions and security
issues.  Is there any reason to believe that the Python file interface
significantly slows down COPY FROM?

Thanks,
Dan

Re: performance of copy_from() vs. raw COPY command

From

Federico Di Gregorio

Date:

25 June 2011, 16:54:34

On 24/06/11 20:38, Daniel Lenski wrote:
> I'm using Postgre's COPY feature to populate a large table (~500M
> rows, 100 GB) with frequent updates from CSV text files.  Currently
> I'm using psycopg's copy_expert():
>
>      cur.copy_expert("COPY Table (c1, c2, ...) FROM STDIN WITH CSV",
> open(filename))
>
> This avoids any issue with client vs. server file permissions, but I'm
> wondering if it carries a significant performance penalty by reading
> the text file at the Python level rather than the OS level.  Instead I
> could do:
>
>      cur.execute("COPY Table (c1, c2, ...) FROM $$%s$$ WITH CSV" % fn)
>
> Of course, this opens up a whole host of permissions and security
> issues.  Is there any reason to believe that the Python file interface
> significantly slows down COPY FROM?

Using copy_expert() is inherently slower than reading file on the server
because you read it into Python's memory, transfer it to the backend
going through libpq and finally parsing it on the server. Python surely
has some kind of overhead but I don't think is much. Storing the file on
the server and reading it from there is surely faster.

Hope this helps,
federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
  Those who do not study Lisp are doomed to reimplement it. Poorly.
                                     -- from Karl M. Hegbloom .signature