Thread: psql is slow and it does not take much resources

psql is slow and it does not take much resources

From
"Javier de la Torre"
Date:
Hi all,

I've been searching around for an answer to this, but I couldn't find
anything. So here we go.

I am running PostgreSQL 8.1.3 on Red Hat on an Intel server with 2GB
of RAM and plenty of free disk space.

I have a very large dump file, more than 4GB, to recreate a database.
When I run:

psql -Uuser -q -dmydatabase <mydump.sql &

It works, but it is slow: it inserts about 1,000 records every 10
seconds. The curious thing is that if I check the server's resources,
psql does not seem to be using them much. "top" says that postgres is
using around 2% of the CPU and psql around 1%, so the server is only
3% busy. "top" also reports roughly 48% idle and 50% waiting...

On the other hand, memory usage seems to be very high...

What could be the problem? Is it that the server is just slow reading
from the file? It has a 300GB SCSI drive.

Is there any way to make this work faster? I would like to recreate
the db at least once a month, but this takes too much time...

Thanks in advance for any advice.

Javier.

Re: psql is slow and it does not take much resources

From
"Larry Rosenman"
Date:
Javier de la Torre wrote:
> [...]
> Is there any way to make this work faster? I would like to recreate
> the db at least once a month, but this takes too much time...
>
How is mydump.sql created?  Is it INSERTs, or COPYs?



--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE, INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX  78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com

Re: psql is slow and it does not take much resources

From
"Javier de la Torre"
Date:
It is INSERTs.

I generate the inserts myself with a Python program I have written to
migrate MySQL databases to PostgreSQL (by the way, if someone wants
it...)

Thanks.

Javier.



On 5/3/06, Larry Rosenman <lrosenman@pervasive.com> wrote:
> [...]
> How is mydump.sql created?  Is it INSERTs, or COPYs?

Re: psql is slow and it does not take much resources

From
"Larry Rosenman"
Date:
Javier de la Torre wrote:
> It is INSERTs.
>
> I generate the inserts myself with a Python program I have written to
> migrate MySQL databases to PostgreSQL (by the way, if someone wants
> it...)
OK, that makes *each* INSERT its own transaction, with all the
attendant overhead.

You need to batch the inserts between BEGIN;/COMMIT; pairs or, better
yet, set it up as a COPY.
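
A minimal sketch of that batching idea (the helper name and batch size
here are illustrative, not from the thread): wrap the stream of INSERT
statements in BEGIN;/COMMIT; pairs so the per-transaction WAL flush is
paid once per batch instead of once per row.

```python
def batch_statements(statements, batch_size=10000):
    """Yield SQL lines with BEGIN;/COMMIT; wrapped around each batch.

    Committing every batch_size statements keeps individual
    transactions bounded while avoiding one-transaction-per-INSERT
    overhead.
    """
    count = 0
    in_txn = False
    for stmt in statements:
        if not in_txn:
            yield "BEGIN;"
            in_txn = True
        yield stmt
        count += 1
        if count % batch_size == 0:
            yield "COMMIT;"
            in_txn = False
    if in_txn:
        yield "COMMIT;"
```

Piping the output of such a generator into psql captures most of the
win; COPY is faster still because it also avoids per-statement parsing.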



--
Larry Rosenman

Re: psql is slow and it does not take much resources

From
"Leif B. Kristensen"
Date:
On Wednesday 03 May 2006 16:12, Larry Rosenman wrote:
>Javier de la Torre wrote:
>> It is INSERTs.
>>
>> I generate the inserts myself with a Python program I have written
>> to migrate MySQL databases to PostgreSQL (by the way, if someone
>> wants it...)
>
>Ok, that makes *EACH* insert a transaction, with all the overhead.
>
>You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
>yet set it up as a COPY.

I'm using essentially the same approach for my custom backup/restore
procedure. I also found it a very slow process. But when I wrapped up
each table script (i.e. 20-30k of INSERTs) in a single transaction,
the time it took to populate the entire database went down from about
half an hour to 50 seconds. Very impressive ;-)

However, I'm wondering if there's a practical limit to how many rows you
can insert within one transaction?
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE

Re: psql is slow and it does not take much resources

From
"Javier de la Torre"
Date:
Yes,

Thanks. I am doing this now...

It is definitely faster, but I will also discover now whether there is
a limit on transaction size... I am going to try to insert 60 million
records into a table in one single transaction.

In any case I still don't understand why PostgreSQL was not using
resources before, without the transactions. If it has to create a
transaction per insert I understand it has more work to do, but why
does it not take all the resources of the machine? I mean, why is it
only taking 3% of them?

Javier.

On 5/3/06, Leif B. Kristensen <leif@solumslekt.org> wrote:
> [...]
> However, I'm wondering if there's a practical limit to how many rows
> you can insert within one transaction?

Re: psql is slow and it does not take much resources

From
"Larry Rosenman"
Date:
Javier de la Torre wrote:
> [...]
> In any case I still don't understand why PostgreSQL was not using
> resources before, without the transactions. Why is it only taking
> 3% of them?
>
I'll bet your WAL disk is mostly wait-I/O, waiting for the WAL flush
at the end of each transaction.

LER

--
Larry Rosenman

Re: psql is slow and it does not take much resources

From
Martijn van Oosterhout
Date:
On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> However, I'm wondering if there's a practical limit to how many rows you
> can insert within one transaction?

There's a limit of (I think) 2-4 billion commands per transaction. Each
command can insert any number of tuples.

So if you're doing one tuple per command, that limits you to a few
billion inserts per transaction. Of course, COPY is always faster
still...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: psql is slow and it does not take much resources

From
"Javier de la Torre"
Date:
Great! Then there will be no problems.

I would use COPY, but I think I cannot. While moving from MySQL to
PostgreSQL I am also transforming a pair of fields, latitude and
longitude, into a geometry field, POINT, that is understood by
PostGIS. I thought I would not be able to use COPY when inserting
data through functions.

Thanks again all.

Javier.

On 5/3/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> > However, I'm wondering if there's a practical limit to how many rows you
> > can insert within one transaction?
>
> There's a limit of (I think) 2-4 billion commands per transaction. Each
> command can insert any number of tuples.
>
> So if you're doing one tuple per command that limits you to a few
> billion inserts per transaction. Ofcourse, COPY is always faster
> still...
>

Re: psql is slow and it does not take much resources

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
>> However, I'm wondering if there's a practical limit to how many rows you
>> can insert within one transaction?

> There's a limit of (I think) 2-4 billion commands per transaction. Each
> command can insert any number of tuples.

> So if you're doing one tuple per command that limits you to a few
> billion inserts per transaction. Ofcourse, COPY is always faster
> still...

If you have any deferred triggers (including foreign keys) on the table
then the practical limit is likely to be a lot less, say in the few
millions.  However, in a bulk data load situation you probably don't
need to have such triggers; it's better to establish the constraint
after you load the data.
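
A sketch of that load-then-constrain pattern; the table and constraint
names below are hypothetical, not from the thread.

```python
# Hypothetical bulk-load plan: drop a foreign key before the load and
# re-create it afterwards, so the constraint is validated once at the
# end rather than checked once per inserted row.
def wrap_bulk_load(load_statements):
    """Return the statement list with the FK dropped and restored."""
    pre = ["ALTER TABLE observations "
           "DROP CONSTRAINT observations_species_fkey;"]
    post = ["ALTER TABLE observations "
            "ADD CONSTRAINT observations_species_fkey "
            "FOREIGN KEY (species_id) REFERENCES species (id);"]
    return pre + list(load_statements) + post
```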

BTW, has the OP read
http://developer.postgresql.org/docs/postgres/populate.html
?  Much of this thread seems to be rehashing that page ...

            regards, tom lane

Re: psql is slow and it does not take much resources

From
Joe Healy
Date:
Javier de la Torre wrote:
> Great! Then there will be no problems.
>
> I would use COPY, but I think I cannot. While moving from MySQL to
> PostgreSQL I am also transforming a pair of fields, latitude and
> longitude, into a geometry field, POINT, that is understood by
> PostGIS. I thought I would not be able to use COPY when inserting
> data through functions.
I definitely recommend using COPY if you are inserting a large amount
of data into PostGIS. We use something like the following Python code
to read from a CSV file and insert into PostgreSQL. I can't remember
the rate it works at, but it was much quicker than anything else we
tried.

import os

def insertData(header, delimiter, filename, table, SRID, dbname, user, host):

    f = open(filename, 'r')

    # Open a pipe to psql and stream the rows through COPY
    # (~10x faster than issuing INSERTs through psycopg)
    process = os.popen(
        'psql %s %s -c "COPY %s (geom, elevation) FROM stdin;" -h %s'
        % (dbname, user, table, host), "w")

    for a in f:
        # Each line holds easting/northing/elevation; emit it as an
        # EWKT point plus the elevation, tab-separated, for COPY
        easting, northing, elevation = a[:-1].split(delimiter)
        process.write("SRID=%s;POINT(%s %s)\t%s\n"
                      % (SRID, easting, northing, elevation))

    f.close()
    process.close()

Hope that helps,

Joe

Re: psql is slow and it does not take much resources

From
Alban Hertroys
Date:
Javier de la Torre wrote:
> Great! Then there will be no problems.
>
> I would use COPY, but I think I cannot. While moving from MySQL to
> PostgreSQL I am also transforming a pair of fields, latitude and
> longitude, into a geometry field, POINT, that is understood by
> PostGIS. I thought I would not be able to use COPY when inserting
> data through functions.

I think you'd get better results COPYing into a (temporary) table (say
it's called 'fish') and then inserting your data into your table using
INSERT (...) SELECT ... FROM fish; with the necessary conversions.

I think it'll still be faster than a couple million separate INSERTs,
and it gives you the opportunity to fix mistakes if you issue the above
command inside a transaction.
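
A sketch of that staging route, with hypothetical names (fish staging
table, places target, lat/lon columns); the PostGIS calls shown
(ST_SetSRID, ST_MakePoint) are assumptions and may be spelled
differently in older releases.

```python
# Hypothetical SQL plan for the staging-table route: COPY raw lat/lon
# rows into a temp table, then build the PostGIS points server-side in
# a single INSERT ... SELECT.
def staging_plan(srid=4326):
    return [
        "CREATE TEMP TABLE fish (lat float8, lon float8, elevation int);",
        "COPY fish FROM stdin;",  # stream tab-separated rows here
        "BEGIN;",
        "INSERT INTO places (geom, elevation) SELECT"
        " ST_SetSRID(ST_MakePoint(lon, lat), %d), elevation"
        " FROM fish;" % srid,
        "COMMIT;",
        "DROP TABLE fish;",
    ]
```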

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: psql is slow and it does not take much resources

From
"Jim C. Nasby"
Date:
On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> However, I'm wondering if there's a practical limit to how many rows you
> can insert within one transaction?

I believe transactions are limited to 4B commands, so the answer would
be 4B rows.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: psql is slow and it does not take much resources

From
"Jim C. Nasby"
Date:
On Wed, May 03, 2006 at 04:43:15PM +0200, Javier de la Torre wrote:
> [...]
> In any case I still don't understand why PostgreSQL was not using
> resources before, without the transactions. Why is it only taking
> 3% of them?

Because a server has more than just CPU as a resource. In this case you
were undoubtedly limited by the drives that pg_xlog is on.
--
Jim C. Nasby

Re: psql is slow and it does not take much resources

From
"Jim C. Nasby"
Date:
On Wed, May 03, 2006 at 04:11:36PM +0200, Javier de la Torre wrote:
> It is INSERTs.
>
> I generate the inserts myself with a Python program I have written to
> migrate MySQL databases to PostgreSQL (by the way, if someone wants
> it...)

Have you looked at http://pgfoundry.org/projects/my2postgres/ ?
--
Jim C. Nasby

Re: psql is slow and it does not take much resources

From
"Leif B. Kristensen"
Date:
On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
>On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
>> However, I'm wondering if there's a practical limit to how many rows
>> you can insert within one transaction?
>
>I believe transactions are limited to 4B commands, so the answer would
>be 4B rows.

That is definitely not the case. I routinely do around 36000 inserts
wrapped up in one transaction.

I know that there is one hard-wired limit due to the OID wrap-around
problem, at 2^31 commands in one transaction. But the practical limit
due to hardware resources is probably much lower.
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE

Re: psql is slow and it does not take much resources

From
"Jim C. Nasby"
Date:
On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
> On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
> >On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> >> However, I'm wondering if there's a practical limit to how many rows
> >> you can insert within one transaction?
> >
> >I believe transactions are limited to 4B commands, so the answer would
> >be 4B rows.
>
> That is definitely not the case. I routinely do around 36000 inserts
> wrapped up in one transaction.

Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.
And as Tom mentioned, if you have foreign keys or triggers, each insert
will burn through multiple command IDs (CIDs).

> I know that there is one hard-wired limit due to the OID wrap-around
> problem, at 2^31 commands in one transaction. But the practical limit
> due to hardware resources is probably much lower.

This has nothing to do with OIDs, and in fact I don't believe there's
any intrinsic reason why you couldn't insert more than 2B records in a
table with OIDs so long as you don't have a unique index defined on it.
--
Jim C. Nasby

Re: psql is slow and it does not take much resources

From
Scott Marlowe
Date:
On Thu, 2006-05-04 at 16:06, Jim C. Nasby wrote:
> On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:

> > I know that there is one hard-wired limit due to the OID wrap-around
> > problem, at 2^31 commands in one transaction. But the practical limit
> > due to hardware resources is probably much lower.
>
> This has nothing to do with OIDs, and in fact I don't believe there's
> any intrinsic reason why you couldn't insert more than 2B records in a
> table with OIDs so long as you don't have a unique index defined on it.

But there is a truth there.  Most people are far more likely to be
limited by their machine's hardware than by the database at this point.

Of course, SOMEONE will find a way to hit the limit.

Re: psql is slow and it does not take much resources

From
"Leif B. Kristensen"
Date:
On Thursday 04 May 2006 23:06, Jim C. Nasby wrote:
>On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
>> On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
>> >I believe transactions are limited to 4B commands, so the answer
>> > would be 4B rows.
>>
>> That is definitely not the case. I routinely do around 36000 inserts
>> wrapped up in one transaction.
>
>Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.

I've wiped my glasses now :-)

I'm not used to reading B for Billion, and I thought I saw an 8. (Here
in Scandinavia, we adhere to the French convention of 1000 millions = 1
milliard. 1000 milliards = 1 billion.) But it looks like we do agree on
the order of magnitude after all.

Another interpretation leads to reading 4B as 4 bytes, and given that a
byte equals 8 bits, 4B would be the width of a 32-bit integer.
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE