Thread: psql is slow but does not use many resources
Hi all,

I've been searching around for an answer to this, but I couldn't find anything. So here we go.

I am running PostgreSQL 8.1.3 on Red Hat on an Intel server with 2 GB of RAM and plenty of free disk space.

I have a very large dump file, more than 4 GB, to recreate a database. When I run:

psql -Uuser -q -dmydatabase <mydump.sql &

it works, but it is slow: it inserts about 1,000 records every 10 seconds. The curious thing is that if I check the resources of the server, it seems that psql is barely using them. "top" says that postgres is using around 2% of the processor and psql around 1%, so the server is only 3% busy. "top" also shows roughly 48% idle and 50% waiting...

On the other hand, memory usage seems to be very high...

What could be the problem? Is it that the server is just slow reading from the file? It has a 300 GB SCSI drive.

Is there any way to make this faster? I would like to recreate the database at least once a month, but this takes too much time...

Thanks in advance for any advice.

Javier.
Javier de la Torre wrote:
> I have a very large dump file, more than 4 GB, to recreate a database.
> [...]
> Is there any way to make this faster? I would like to recreate the
> database at least once a month, but this takes too much time...

How is mydump.sql created? Is it with INSERTs, or with COPY?

--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com
It is INSERTs.

I create the inserts myself with a Python program I have written to migrate MySQL databases to PostgreSQL (by the way, if someone wants it...).

Thanks.

Javier.

On 5/3/06, Larry Rosenman <lrosenman@pervasive.com> wrote:
> How is mydump.sql created? Is it with INSERTs, or with COPY?
Javier de la Torre wrote:
> It is INSERTs.
>
> I create the inserts myself with a Python program I have written to
> migrate MySQL databases to PostgreSQL (by the way, if someone wants
> it...)

OK, that makes *EACH* insert its own transaction, with all the overhead that implies.

You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better yet, set it up as a COPY.

--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com
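[Editor's note: a minimal SQL sketch of the two approaches Larry describes; the table name, columns (mytable, lat, lon) and file path are made up for illustration only.]

-- Batching: one explicit transaction around many INSERTs instead of
-- one implicit transaction per statement.
BEGIN;
INSERT INTO mytable (lat, lon) VALUES (40.4, -3.7);
INSERT INTO mytable (lat, lon) VALUES (41.1, -3.9);
-- ... thousands more INSERTs ...
COMMIT;

-- COPY: faster still, loading rows in bulk from a file on the server
-- (or with \copy in psql when the file sits on the client machine).
COPY mytable (lat, lon) FROM '/path/to/data.tsv';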
On Wednesday 03 May 2006 16:12, Larry Rosenman wrote:
> OK, that makes *EACH* insert its own transaction, with all the overhead
> that implies.
>
> You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
> yet, set it up as a COPY.

I'm using essentially the same approach for my custom backup/restore procedure, and I also found it a very slow process. But when I wrapped each table script (i.e. 20-30k of INSERTs) in a single transaction, the time it took to populate the entire database went down from about half an hour to 50 seconds. Very impressive ;-)

However, I'm wondering if there's a practical limit to how many rows you can insert within one transaction?

--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE
Yes, thanks. I am doing this now...

It is definitely faster, but now I will also find out whether there is a limit on transaction size... I am going to try to insert 60 million records into a table in one single transaction.

In any case, I still don't understand why PostgreSQL was not using the resources before, without the transaction batching. If it has to create a transaction per insert, I understand that it has more work to do, but why is it not using all the resources of the machine? I mean, why is it only using 3% of them?

Javier.

On 5/3/06, Leif B. Kristensen <leif@solumslekt.org> wrote:
> But when I wrapped each table script (i.e. 20-30k of INSERTs) in a
> single transaction, the time it took to populate the entire database
> went down from about half an hour to 50 seconds.
>
> However, I'm wondering if there's a practical limit to how many rows
> you can insert within one transaction?
Javier de la Torre wrote:
> In any case, I still don't understand why PostgreSQL was not using the
> resources before, without the transaction batching. If it has to create
> a transaction per insert, I understand that it has more work to do, but
> why is it not using all the resources of the machine? I mean, why is it
> only using 3% of them?

I'll bet your WAL disk is mostly in I/O wait, waiting for the WAL flushes at the end of each transaction.

LER

--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com
On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> However, I'm wondering if there's a practical limit to how many rows
> you can insert within one transaction?

There's a limit of (I think) 2-4 billion commands per transaction. Each command can insert any number of tuples.

So if you're doing one tuple per command, that limits you to a few billion inserts per transaction. Of course, COPY is always faster still...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Great! Then there will be no problem.

I would use COPY, but I think I cannot. While moving from MySQL to PostgreSQL I am also transforming a pair of fields, latitude and longitude, into a geometry field, POINT, that is understood by PostGIS. I thought I would not be able to use COPY when inserting data through functions.

Thanks again all.

Javier.

On 5/3/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> There's a limit of (I think) 2-4 billion commands per transaction. Each
> command can insert any number of tuples.
>
> So if you're doing one tuple per command, that limits you to a few
> billion inserts per transaction. Of course, COPY is always faster
> still...
Martijn van Oosterhout <kleptog@svana.org> writes:
>> However, I'm wondering if there's a practical limit to how many rows
>> you can insert within one transaction?

> There's a limit of (I think) 2-4 billion commands per transaction. Each
> command can insert any number of tuples.

> So if you're doing one tuple per command, that limits you to a few
> billion inserts per transaction. Of course, COPY is always faster
> still...

If you have any deferred triggers (including foreign keys) on the table, then the practical limit is likely to be a lot less, say in the few millions. However, in a bulk data load situation you probably don't need to have such triggers; it's better to establish the constraint after you load the data.

BTW, has the OP read http://developer.postgresql.org/docs/postgres/populate.html ? Much of this thread seems to be rehashing that page ...

regards, tom lane
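[Editor's note: a sketch of what Tom suggests, dropping a foreign key before the bulk load and re-creating it afterwards; the table and constraint names below are invented for illustration.]

-- Drop the foreign key so each inserted row doesn't fire the check trigger.
ALTER TABLE observations DROP CONSTRAINT observations_species_fk;

-- ... run the bulk INSERTs or COPY here ...

-- Re-establish the constraint; PostgreSQL validates all rows in one pass.
ALTER TABLE observations
    ADD CONSTRAINT observations_species_fk
    FOREIGN KEY (species_id) REFERENCES species (id);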
Javier de la Torre wrote:
> I would use COPY, but I think I cannot. While moving from MySQL to
> PostgreSQL I am also transforming a pair of fields, latitude and
> longitude, into a geometry field, POINT, that is understood by PostGIS.
> I thought I would not be able to use COPY when inserting data through
> functions.

I definitely recommend using COPY if you are inserting a large amount of data into PostGIS. We use something like the following Python code to read from a CSV file and insert into pgsql. I can't remember the rate it works at, but it was much quicker than anything else we tried.

import os

def insertData(header, delimiter, filename, table, SRID, dbname, user, host):
    f = open(filename, 'r')
    if header:
        f.readline()  # skip the header row if the file has one
    # Open a new psql process and feed it rows on stdin (~10x faster than psycopg)
    process = os.popen('psql %s %s -c "COPY %s (geom, elevation) from stdin;" -h %s'
                       % (dbname, user, table, host), "w")
    for a in f:
        easting, northing, elevation = a[:-1].split(delimiter)
        # Write the geometry in EWKT form plus the elevation, tab-separated
        process.write("SRID=%s;POINT(%s %s)\t%s\n" % (SRID, easting, northing, elevation))
    f.close()
    process.close()

Hope that helps,
Joe
Javier de la Torre wrote:
> I would use COPY, but I think I cannot. While moving from MySQL to
> PostgreSQL I am also transforming a pair of fields, latitude and
> longitude, into a geometry field, POINT, that is understood by PostGIS.
> I thought I would not be able to use COPY when inserting data through
> functions.

I think you'd get better results COPYing into a (temporary) table (say it's called 'fish') and then inserting your data into your table using INSERT (...) SELECT ... FROM fish; with the necessary conversions. I think it'll still be faster than a couple million separate INSERTs, and it gives you the opportunity to fix mistakes if you issue the above command inside a transaction.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
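[Editor's note: a minimal sketch of that staging-table approach, assuming a hypothetical target table places (geom, elevation) and the PostGIS GeomFromText() function; the column names, SRID and file path are illustrative only.]

-- Load the raw latitude/longitude pairs into the staging table.
CREATE TEMPORARY TABLE fish (lat numeric, lon numeric, elevation numeric);
COPY fish (lat, lon, elevation) FROM '/path/to/data.tsv';

-- Convert and move everything into the real table in one statement.
BEGIN;
INSERT INTO places (geom, elevation)
SELECT GeomFromText('POINT(' || lon || ' ' || lat || ')', 4326), elevation
FROM fish;
COMMIT;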
On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> However, I'm wondering if there's a practical limit to how many rows
> you can insert within one transaction?

I believe transactions are limited to 4B commands, so the answer would be 4B rows.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Wed, May 03, 2006 at 04:43:15PM +0200, Javier de la Torre wrote:
> In any case, I still don't understand why PostgreSQL was not using the
> resources before, without the transaction batching. If it has to create
> a transaction per insert, I understand that it has more work to do, but
> why is it not using all the resources of the machine? I mean, why is it
> only using 3% of them?

Because a server has more than just CPU as a resource. In this case you were undoubtedly limited by the drives that pg_xlog is on.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Wed, May 03, 2006 at 04:11:36PM +0200, Javier de la Torre wrote:
> It is INSERTs.
>
> I create the inserts myself with a Python program I have written to
> migrate MySQL databases to PostgreSQL (by the way, if someone wants
> it...)

Have you looked at http://pgfoundry.org/projects/my2postgres/ ?

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
> On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
>> However, I'm wondering if there's a practical limit to how many rows
>> you can insert within one transaction?
>
> I believe transactions are limited to 4B commands, so the answer would
> be 4B rows.

That is definitely not the case. I routinely do around 36,000 inserts wrapped up in one transaction.

I know that there is one hard-wired limit due to the OID wrap-around problem, at 2^31 commands in one transaction. But the practical limit due to hardware resources is probably much lower.

--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE
On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
> That is definitely not the case. I routinely do around 36,000 inserts
> wrapped up in one transaction.

Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*. And as Tom mentioned, if you have foreign keys or triggers, each insert will burn through multiple CIDs.

> I know that there is one hard-wired limit due to the OID wrap-around
> problem, at 2^31 commands in one transaction. But the practical limit
> due to hardware resources is probably much lower.

This has nothing to do with OIDs, and in fact I don't believe there's any intrinsic reason why you couldn't insert more than 2B records into a table with OIDs, so long as you don't have a unique index defined on it.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Thu, 2006-05-04 at 16:06, Jim C. Nasby wrote:
> On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
>> I know that there is one hard-wired limit due to the OID wrap-around
>> problem, at 2^31 commands in one transaction. But the practical limit
>> due to hardware resources is probably much lower.
>
> This has nothing to do with OIDs, and in fact I don't believe there's
> any intrinsic reason why you couldn't insert more than 2B records into
> a table with OIDs, so long as you don't have a unique index defined on
> it.

But there is a truth there. Most people are far more likely to be limited by their machine's hardware than by the database at this point. Of course, SOMEONE will find a way to hit the limit.
On Thursday 04 May 2006 23:06, Jim C. Nasby wrote:
> On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
>> That is definitely not the case. I routinely do around 36,000 inserts
>> wrapped up in one transaction.
>
> Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.

I've wiped my glasses now :-) I'm not used to reading B for billion, and I thought I saw an 8. (Here in Scandinavia, we adhere to the French convention of 1000 millions = 1 milliard and 1000 milliards = 1 billion.) But it looks like we do agree on the order of magnitude after all.

Another interpretation leads to reading 4B as 4 bytes, and given that a byte equals 8 bits, 4B would be the width of a 32-bit integer.

--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE