Re: Issues with \copy from file - Mailing list pgsql-performance

From Robert Haas
Subject Re: Issues with \copy from file
Msg-id 603c8f070912180723g766e6810o2bbfb5a1f0f928f3@mail.gmail.com
In response to Re: Issues with \copy from file  (Sigurgeir Gunnarsson <sgunnars@gmail.com>)
Responses Re: Issues with \copy from file  (Sigurgeir Gunnarsson <sgunnars@gmail.com>)
List pgsql-performance
On Fri, Dec 18, 2009 at 7:46 AM, Sigurgeir Gunnarsson
<sgunnars@gmail.com> wrote:
> I hope the issue is still open though I haven't replied to it before.
>
> Euler mentioned that I did not provide any details about my system. I'm
> using version 8.3 and with most settings default on an old machine with 2 GB
> of mem. The table definition is simple, four columns; id, value, x, y where
> id is primary key and x, y are combined into an index.
>
> I'm not sure if it matters but unlike Euler's suggestion I'm using \copy
> instead of COPY. Regarding my comparison to MySQL, it is completely valid.
> This is done on the same computer, using the same disk on the same platform.
> From that I would derive that IO is not my problem, unless postgresql is
> doing IO twice while MySQL only once.
>
> I guess my tables are InnoDB since that is the default type (or so I think).
> I did not find that BEGIN/COMMIT changed much. Are there any other suggestions?

Did you read Matthew Wakeling's reply?  Arranging to skip WAL will
help a lot here.  To do that, you need to either create or truncate
the table in the same transaction that does the COPY.
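
As a rough sketch of what that looks like in psql: the table name
"points" and the file name "points.tsv" below are made up for
illustration, since the original post gives only the column names
(id, value, x, y). As far as I know, the WAL skip also applies only
when WAL archiving is not enabled, so treat this as a starting point
rather than a guaranteed fix.

```sql
-- Sketch only: "points" and 'points.tsv' are hypothetical names.
-- Note that TRUNCATE discards any rows already in the table.
BEGIN;

-- Truncating (or creating) the table inside this transaction is what
-- allows the following COPY to avoid writing the loaded rows to WAL.
TRUNCATE points;

-- \copy reads the file on the client side and runs COPY ... FROM STDIN
-- in the current session, so it is part of the same transaction.
\copy points (id, value, x, y) from 'points.tsv'

COMMIT;
```

If the table does not exist yet, a CREATE TABLE in place of the
TRUNCATE should have the same effect. Wrapping only the \copy in
BEGIN/COMMIT, without the TRUNCATE or CREATE TABLE, does not qualify.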

The problem with the MySQL comparison is that it's not really
relevant.  It isn't that the PostgreSQL code just sucks and if we
wrote it properly it would be as fast as MySQL.  If that were the
case, everyone would be up in arms, and it would have been fixed long
ago.  Rather, the problem is almost certainly that it's not an
apples-to-apples comparison.  MySQL is probably doing something
different, such as perhaps not properly arranging for recovery if the
system goes down in the middle of the copy, or just after it
completes.  But I don't know MySQL well enough to know exactly what
the difference is, and I'm not particularly interested in spending a
lot of time figuring it out.  I think you'll get that reaction from
others on this list as well, but of course that's up to them.
Everybody here is a volunteer, and generally our interest
is principally PostgreSQL.

On the other hand, we can certainly give you lots of information about
what PostgreSQL is doing and why that takes the amount of time that it
does, or give you information on how you can find out more about what
it's doing.

...Robert
