Thread: Optimize for insertions?

Optimize for insertions?

From: "Gurupartap Davis"

Hi,
 
I'm trying to migrate a mysql database to postgresql.  I've got the tables all set up, but inserting the data from the mysql dumps is taking forever.  I've got about 200 million rows in the big table (growing by about 1.5 million per day), but at the 100 inserts/second I'm getting, this will take over 3 weeks.  MySQL on the same machine averages about 1100-1200 inserts/second (no, I'm not running them both at the same time ;-)
 
Any thoughts on how to tweak the postmaster for quick inserts?  I've got fsync turned off, but that's about all I've tried so far. 
 
I'm running postgresql 7.1.3 on Linux (Redhat 7.2 with ext3) with a  700MHz processor, 256MB RAM, and an 80GB IDE HD
 
Thanks
-partap

Re: Optimize for insertions?

From: Jeff Davis

I think that the fastest way that PostgreSQL can load data into a table is
using COPY.

Try exporting the data from mysql in a text format, for example
tab-delimited. Then, use COPY to load it. That should go *much* faster than
inserts.
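
As a rough sketch of that (the table and file names here are just examples):
mysqldump can write tab-delimited files with its --tab option, and each file
can then be loaded from psql in one shot:

    -- COPY's default delimiter is a tab, so a tab-delimited dump loads as-is
    COPY mytable FROM '/tmp/dump/mytable.txt';

Note that COPY ... FROM a file path reads the file on the database server and
is restricted to superusers; psql's \copy command is the client-side
equivalent if that's a problem.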

If that is for some reason not a viable solution, break the inserts into
large blocks and make each one its own transaction (explicit begin/commit).
This tells postgres when to flush to disk, basically. This might not really
help you since you already turned off fsync.
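
Just as an illustration (the table and values here are made up), the batching
would look like:

    BEGIN;
    INSERT INTO mytable VALUES (1, 'first row');
    INSERT INTO mytable VALUES (2, 'second row');
    -- ... a few thousand more INSERTs ...
    COMMIT;

Without the explicit BEGIN/COMMIT, each INSERT runs in its own transaction,
so the per-transaction overhead is paid once per row.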

Hope this helps,
    Jeff Davis


Re: Optimize for insertions?

From: wsheldah@lexmark.com

Hi,

What you probably want to do is export the mysql data into delimited text files
and use PostgreSQL's COPY command. It's much faster than doing straight
INSERTs.

If you have to do inserts for some reason, it might be helpful to group them
into chunks of N insert operations each, and put each chunk in a transaction.
I'm not sure what a good value for N would be though, and this wouldn't be
nearly as fast as the COPY command in any case.

HTH,

Wes Sheldahl

Re: Optimize for insertions?

From: "Gregory Wood"

1. Look into using the COPY command. This allows fast bulk imports.
2. Consider tweaking your transactions. If you are not using transactions
(possible considering you are migrating from MySQL), understand that
PostgreSQL creates an implicit transaction for each statement, unless an
explicit transaction is created. That means that you are dealing with the
additional overhead of 200 million transactions on top of the INSERTs
themselves. Of course, doing everything inside a single transaction can be
cumbersome too. Try stuffing a thousand or so INSERTs into a single
transaction and see if that speeds things up.
3. If you created indexes, drop them during your import, and re-add them
after you've finished. Maintaining those slows things down a bit as well.
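
To make point 3 concrete (the index and column names here are invented):

    -- before the import
    DROP INDEX mytable_foo_idx;
    DROP INDEX mytable_bar_idx;

    -- ... run the bulk load ...

    -- afterwards, rebuild each index in one pass over the table
    CREATE INDEX mytable_foo_idx ON mytable (foo);
    CREATE INDEX mytable_bar_idx ON mytable (bar);

Building an index once over the loaded table is generally much cheaper than
updating it for every row as it is inserted.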

Hope this helps!

Greg



Re: Optimize for insertions?

From: "Gurupartap Davis"

Thanks for the advice, everyone...I've got my mysql dump files (about 300
50MB files) inserting at a rate of about 1 minute each now vs. 1.5 hours
each before(!)

I'm worried about creating indexes after all the data is in there,
though...I'm going to need 4-5 indexes.  Isn't that going to take a hella
long time? How are indexes created anyhow? I would assume a linear search
through every record in the table...would vacuum help at all?  how bout
vacuum analyze?



Re: Optimize for insertions?

From: Francisco Reyes

On Mon, 26 Nov 2001, Gurupartap Davis wrote:

> I'm worried about creating indexes after all the data is in there,
> though...I'm going to need 4-5 indexes.  Isn't that going to take a hella
> long time? How are indexes created anyhow? I would assume a linear search
> through every record in the table...would vacuum help at all?  how bout
> vacuum analyze?

I am somewhat of a newbie, but I was told that 'copy' doesn't use the
indexes so you could create your indexes and when you are all done with
the copy commands then run a vacuum analyze.

I do bulk uploads daily from a FoxPro system (7 million rows being my biggest
table), and I have found that a plain vacuum is helpful after each bulk upload.
When I am done with all my bulk uploads (3 files), I do one vacuum analyze.

In theory you could just do a single vacuum analyze when you are all done,
but for me a vacuum after each upload seems to help. In particular I am
still trying to decide whether to use truncate or deletes. When I use
delete, the old records seem to still "be around", so the vacuum gets
rid of them. The main advantage of truncate is that it is a fast way to
empty a table; its main drawback is that it cannot be rolled back.
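
For illustration, that daily pattern might look roughly like this (the table
and file names are made up):

    -- fast, non-transactional way to empty the table before a reload
    TRUNCATE TABLE daily_upload;
    -- (the rollback-safe alternative, which leaves dead rows behind, is
    --  DELETE FROM daily_upload;)

    -- bulk load the new data
    COPY daily_upload FROM '/tmp/daily_upload.txt';

    -- reclaim the space left behind by any deleted rows
    VACUUM daily_upload;

    -- once all of the day's files are loaded, refresh planner statistics
    VACUUM ANALYZE;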