Re: Issues with \copy from file - Mailing list pgsql-performance

From Sigurgeir Gunnarsson
Subject Re: Issues with \copy from file
Date
Msg-id ebd3ad520912180751q26d770d0gad7f4a516d9e3b53@mail.gmail.com
Whole thread Raw
In response to Re: Issues with \copy from file  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Issues with \copy from file  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
The intention was never to talk down postgresql but rather trying to get some explanation of this difference so that I could do the proper changes.

After having read the link from Euler's post, which I oversaw, I have managed to shorten the import time. My problem was with the indexes. I was able to shorten the import time, of a 26 million line import, from 2 hours + (I gave up after that time) downto 12 minutes by dropping the indexes after truncate and before copy.

This is what I was expecting and I'm therefore satisfied with the result.

Regards, Sigurgeir

2009/12/18 Robert Haas <robertmhaas@gmail.com>
On Fri, Dec 18, 2009 at 7:46 AM, Sigurgeir Gunnarsson
<sgunnars@gmail.com> wrote:
> I hope the issue is still open though I haven't replied to it before.
>
> Euler mentioned that I did not provide any details about my system. I'm
> using version 8.3 and with most settings default on an old machine with 2 GB
> of mem. The table definition is simple, four columns; id, value, x, y where
> id is primary key and x, y are combined into an index.
>
> I'm not sure if it matters but unlike Euler's suggestion I'm using \copy
> instead of COPY. Regarding my comparison to MySQL, it is completely valid.
> This is done on the same computer, using the same disk on the same platform.
> From that I would derive that IO is not my problem, unless postgresql is
> doing IO twice while MySQL only once.
>
> I guess my tables are InnoDB since that is the default type (or so I think).
> BEGIN/COMMIT I did not find change much. Are there any other suggestions ?

Did you read Matthew Wakeling's reply?  Arranging to skip WAL will
help a lot here.  To do that, you need to either create or truncate
the table in the same transaction that does the COPY.

The problem with the MySQL comparison is that it's not really
relevant.   It isn't that the PostgreSQL code just sucks and if we
wrote it properly it would be as fast as MySQL.  If that were the
case, everyone would be up in arms, and it would have been fixed long
ago.  Rather, the problem is almost certainly that it's not an
apples-to-apples comparison.  MySQL is probably doing something
different, such as perhaps not properly arranging for recovery if the
system goes down in the middle of the copy, or just after it
completes.  But I don't know MySQL well enough to know exactly what
the difference is, and I'm not particularly interested in spending a
lot of time figuring it out.  I think you'll get that reaction from
others on this list as well, but of course that's up to them.
Everybody here is a volunteer, of course, and generally our interest
is principally PostgreSQL.

On the other hand, we can certainly give you lots of information about
what PostgreSQL is doing and why that takes the amount of time that it
does, or give you information on how you can find out more about what
it's doing.

...Robert

pgsql-performance by date:

Previous
From: Matthew Wakeling
Date:
Subject: Re: Idea how to get rid of Bitmap Heap Scan
Next
From: "Michael N. Mikhulya"
Date:
Subject: Re: Idea how to get rid of Bitmap Heap Scan