Thread: Optimize for insertions?
Hi,
I'm trying to migrate a mysql database to postgresql. I've got the tables all set up, but inserting the data from the mysql dumps is taking forever. I've got about 200 million rows in the big table (growing by about 1.5 million per day), but at the 100 inserts/second I'm getting, this will take over 3 weeks. MySQL on the same machine averages about 1100-1200 inserts/second (no, I'm not running them both at the same time ;-)
Any thoughts on how to tweak the postmaster for quick inserts? I've got fsync turned off, but that's about all I've tried so far.
I'm running postgresql 7.1.3 on Linux (Redhat 7.2 with ext3) with a 700MHz processor, 256MB RAM, and an 80GB IDE HD
Thanks
-partap
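
For reference, the fsync setting mentioned above is a one-line change on 7.1-era servers; it can go in postgresql.conf (or be passed to the postmaster as -o -F). A minimal sketch, not a tuned configuration:

    # postgresql.conf -- skip the fsync() flush after each transaction.
    # Unsafe if the machine crashes mid-load, but tolerable for a one-off import.
    fsync = false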
I think the fastest way for PostgreSQL to load data into a table is COPY. Try exporting the data from MySQL in a text format, for example tab-delimited, and then use COPY to load it. That should go *much* faster than individual INSERTs.

If that is for some reason not a viable solution, break the inserts into large blocks and make each block its own transaction (explicit BEGIN/COMMIT). That essentially tells PostgreSQL when to flush to disk, though it may not help much in your case since you've already turned off fsync.

Hope this helps,
Jeff Davis
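
As a rough sketch of the COPY approach described above (the table name and file path are invented for the example; the MySQL side could be exported with something like mysqldump --tab or SELECT ... INTO OUTFILE to get tab-delimited files):

    -- Server-side load: the file must be readable by the backend and the
    -- command run as a database superuser. COPY's default delimiter is tab.
    COPY big_table FROM '/tmp/big_table.txt';

    -- Or, from psql as an ordinary user, reading a file on the client side:
    \copy big_table from /tmp/big_table.txt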
Hi,

What you probably want to do is export the MySQL data into delimited text files and use PostgreSQL's COPY command. It's much faster than doing straight INSERTs.

If you have to do INSERTs for some reason, it might help to group them into chunks of N insert operations each and put each chunk in a transaction. I'm not sure what a good value for N would be, though, and this wouldn't be nearly as fast as COPY in any case.

HTH,
Wes Sheldahl
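
If COPY turns out not to be an option, the chunking described here would look roughly like this (N = 1000 is an arbitrary guess, and the table and values are placeholders):

    BEGIN;
    INSERT INTO big_table VALUES (1, 'first row of this batch');
    INSERT INTO big_table VALUES (2, 'second row of this batch');
    -- ... roughly a thousand INSERTs per batch ...
    COMMIT;
    -- then start the next BEGIN/COMMIT batch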
1. Look into using the COPY command. It allows fast bulk imports.

2. Consider tweaking your transactions. If you are not using transactions (possible, considering you are migrating from MySQL), understand that PostgreSQL creates an implicit transaction for each statement unless an explicit transaction is opened. That means you are paying the overhead of 200 million transactions on top of the INSERTs themselves. Of course, doing everything inside a single transaction can be cumbersome too. Try stuffing a thousand or so INSERTs into each transaction and see if that speeds things up.

3. If you created indexes, drop them during your import and re-add them after you've finished. Maintaining them slows things down as well.

Hope this helps!

Greg
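
Point 3 in practice might look like this (the index and column names are made up for the example):

    -- Before the bulk load: drop secondary indexes so each incoming row
    -- doesn't pay the cost of maintaining them.
    DROP INDEX big_table_ts_idx;

    -- ... run the COPY (or batched INSERTs) here ...

    -- After the load: rebuild the index in a single pass.
    CREATE INDEX big_table_ts_idx ON big_table (ts);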
Thanks for the advice, everyone... I've got my MySQL dump files (about 300 50MB files) inserting at a rate of about 1 minute each now, vs. 1.5 hours each before(!)

I'm worried about creating indexes after all the data is in there, though... I'm going to need 4-5 indexes. Isn't that going to take a hella long time? How are indexes created, anyhow? I would assume a linear search through every record in the table... Would vacuum help at all? How about vacuum analyze?
On Mon, 26 Nov 2001, Gurupartap Davis wrote:

> I'm worried about creating indexes after all the data is in there,
> though... I'm going to need 4-5 indexes. Isn't that going to take a hella
> long time? How are indexes created, anyhow? I would assume a linear search
> through every record in the table... Would vacuum help at all? How about
> vacuum analyze?

I am somewhat of a newbie, but I was told that COPY doesn't use the indexes, so you could create your indexes, and when you are all done with the COPY commands, run a VACUUM ANALYZE.

I do bulk uploads daily from a FoxPro system (7 million rows in my biggest table), and I have found that a plain VACUUM is helpful after each bulk upload. When I am done with all my bulk uploads (3 files), I do one VACUUM ANALYZE. In theory you could just do a single VACUUM ANALYZE when everything is done, but for me a VACUUM after each upload seems to help.

In particular, I am still trying to decide whether to use TRUNCATE or DELETE. When I use DELETE, the old records seem to still "be around" until a VACUUM gets rid of them. The main advantage of TRUNCATE is that it is a fast way to empty a table; its main drawback is that it cannot be rolled back.
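
For reference, the daily routine described above might be sketched like this (daily_upload is a stand-in table name; pick one of the two ways of emptying it):

    -- Option 1: empty the table quickly (as noted, cannot be rolled back):
    TRUNCATE TABLE daily_upload;

    -- Option 2: DELETE leaves dead rows behind, so follow it with a vacuum:
    DELETE FROM daily_upload;
    VACUUM daily_upload;

    -- ... bulk load the new day's file with COPY ...

    -- After the last file of the day, refresh the planner statistics:
    VACUUM ANALYZE;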