Re: Benchmark Data requested - Mailing list pgsql-performance

From Dimitri Fontaine
Subject Re: Benchmark Data requested
Msg-id 200802061129.44482.dfontaine@hi-media.com
In response to Re: Benchmark Data requested  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Benchmark Data requested
List pgsql-performance
On Wednesday 06 February 2008, Greg Smith wrote:
> pgloader is a great tool for a lot of things, particularly if there's any
> chance that some of your rows will get rejected.  But the way things pass
> through the Python/psycopg layer made it uncompetitive (more than 50%
> slowdown) against the straight COPY path from a rows/second perspective
> the last time (V2.1.0?)

I've yet to add in the psycopg wrapper Marko wrote for skytools: at the moment
I'm using the psycopg1 interface even when psycopg2 is used, and it seems the
new version has some great performance improvements. I just didn't bother
until now, thinking this wouldn't affect COPY.

> I did what I thought was a fair test of it (usual
> caveat of "with the type of data I was loading").  Maybe there's been some
> gigantic improvement since then, but it's hard to beat COPY when you've
> got an API layer or two in the middle.

Did you compare to COPY or \copy? I'd expect the psycopg COPY API not to be
that much more costly than the psql one, after all.
Where pgloader is really left behind (in terms of tuples inserted per second)
compared to COPY is when it has to juggle the data a lot, I'd say
(reformat, reorder, add constants, etc). But I've tried to design it so that
when not configured to arrange (massage?) the data, the code path is the
simplest possible.
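
To make the two code paths concrete, here is a minimal sketch of the idea, not
pgloader's actual code. The names prepare_rows and copy_rows and the
column_order and constants options are hypothetical, invented for this
illustration. The only real API assumed is a cursor exposing
copy_from(file, table, sep=...), as psycopg2 cursors do. When nothing is
configured, lines pass through untouched; otherwise each line is split,
reordered, and extended with constant fields before being handed to COPY.

```python
import io


def prepare_rows(lines, column_order=None, constants=None, sep="\t"):
    """Yield COPY-ready text lines (hypothetical helper, not pgloader code).

    With no reordering and no constants configured, lines pass through
    without being parsed. Otherwise each line is split and rebuilt.
    """
    if column_order is None and not constants:
        # Fast path: no per-row splitting, only ensure a trailing newline.
        for line in lines:
            yield line if line.endswith("\n") else line + "\n"
        return

    constants = list(constants or [])
    for line in lines:
        # Slow path: this per-row work is what costs tuples per second.
        fields = line.rstrip("\n").split(sep)
        if column_order is not None:
            fields = [fields[i] for i in column_order]
        yield sep.join(fields + constants) + "\n"


def copy_rows(cursor, table, lines, **options):
    """Send prepared rows through a cursor that exposes copy_from().

    Assumption: cursor.copy_from(file, table, sep=...) behaves like the
    psycopg2 method of the same name.
    """
    sep = options.get("sep", "\t")
    buffer = io.StringIO("".join(prepare_rows(lines, **options)))
    cursor.copy_from(buffer, table, sep=sep)
```

With a real connection this would be called as something like
copy_rows(conn.cursor(), "mytable", open("data.tsv"), column_order=[1, 0]),
where the table and file names are placeholders. Buffering everything in a
StringIO keeps the sketch short; a real loader would stream in batches.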

Do you want to test pgloader again with Marko's psycopgwrapper code to see if
this helps? If yes, I'll arrange to push it to CVS ASAP.

Maybe in the end the PostgreSQL backend code will be smarter than pgloader
(wrt error handling and data massaging) and we'll be able to drop the
project, but in the meantime I'll try my best to make pgloader as fast as
possible :)
--
dim
