Thread: Postmaster using only 4-5% CPU

Postmaster using only 4-5% CPU

From
Edoardo Serra
Date:
Hi all,
         I'm having a very strange performance
problems on a fresh install of postgres 8.1.3
I've just installed it with default option and
--enable-thread-safety without tweaking config files yet.

The import of a small SQL files into the DB (6
tables with 166.500 total records, INSERT syntax)
took me more than 18 minutes as shown below
(output of  "time ./psql benchmarks < dump.sql")

real 18m33.062s
user 0m10.386s
sys 0m7.707s

The server is an
- Intel(R) Xeon(TM) CPU 3.60GHz - 1MB L2
- 1 GB RAM
- 2x HDD SCSI U320 RAID 1 Hardware (HP 6i controller)

The same import, tried on an another low-end
server with a fresh install of postgres 8.1.3 gave me:

real 2m4.497s
user 0m6.234s
sys 0m6.148s

During the test, the postmaster on the first
server (the slow one) uses only a 4% CPU, while
on the second one it reaches 50% cpu usage

I was thinking on a IO bandwidth saturation, but
"vmstat 1" during the import shows me small values for io/bo column

Searching the archive of the ml I found a Disk IO
test I suddenly ran on the slower server as follow

# time bash -c "dd if=/dev/zero of=bigfile bs=8k
count=200000 && sync" (write test)
# time dd if=bigfile of=/dev/null bs=8k (read test)

output of "vmstat 1" during the above test follows:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa

Write test
  0 11    540   2344  12152 863456    4    0   340 27848 1709   695  6 53  0 41
  0 11    540   2344  12180 863516    4    0    44 45500 1623   386  0  2  0 98
  0  5    540   3168  12200 862520    0    0   264 44888 1573   315  1  2  0 97

Read test
  0  2    440   2328   6076 849120    0    0 94552     0 1550   624  3 10  0 87
  0  2    440   2248   6104 848936    0    0 94508     0 1567   715  7 10  0 83
  0  3    440   2824   6148 847828    0    0
102540   448 1511   675 14 11  0 75

Values of io/(bi-bo) during the disk test are a
lot higher than during the import operation....

I really have no more clues .... :(

Do you have any ideas ?

Tnx in advance

Regards


Edoardo Serra
WeBRainstorm S.r.l.
IT, Internet services & consulting
Via Pio Foà 83/C
10126 Torino
Tel: +39 011 6966881


Re: Postmaster using only 4-5% CPU

From
"Markus Bertheau"
Date:
The low end server by chance doesn't have an IDE disk that lies about
write completion, or a battery backed disk controller? Try disabling
fsync on the new server to get comparable figures.

Markus Bertheau

2006/3/21, Edoardo Serra <osdevel@webrainstorm.it>:
> Hi all,
>          I'm having a very strange performance
> problems on a fresh install of postgres 8.1.3
> I've just installed it with default option and
> --enable-thread-safety without tweaking config files yet.
>
> The import of a small SQL files into the DB (6
> tables with 166.500 total records, INSERT syntax)
> took me more than 18 minutes as shown below
> (output of  "time ./psql benchmarks < dump.sql")
>
> real 18m33.062s
> user 0m10.386s
> sys 0m7.707s
>
> The server is an
> - Intel(R) Xeon(TM) CPU 3.60GHz - 1MB L2
> - 1 GB RAM
> - 2x HDD SCSI U320 RAID 1 Hardware (HP 6i controller)
>
> The same import, tried on an another low-end
> server with a fresh install of postgres 8.1.3 gave me:
>
> real 2m4.497s
> user 0m6.234s
> sys 0m6.148s
>
> During the test, the postmaster on the first
> server (the slow one) uses only a 4% CPU, while
> on the second one it reaches 50% cpu usage
>
> I was thinking on a IO bandwidth saturation, but
> "vmstat 1" during the import shows me small values for io/bo column
>
> Searching the archive of the ml I found a Disk IO
> test I suddenly ran on the slower server as follow
>
> # time bash -c "dd if=/dev/zero of=bigfile bs=8k
> count=200000 && sync" (write test)
> # time dd if=bigfile of=/dev/null bs=8k (read test)
>
> output of "vmstat 1" during the above test follows:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>   r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>
> Write test
>   0 11    540   2344  12152 863456    4    0   340 27848 1709   695  6 53  0 41
>   0 11    540   2344  12180 863516    4    0    44 45500 1623   386  0  2  0 98
>   0  5    540   3168  12200 862520    0    0   264 44888 1573   315  1  2  0 97
>
> Read test
>   0  2    440   2328   6076 849120    0    0 94552     0 1550   624  3 10  0 87
>   0  2    440   2248   6104 848936    0    0 94508     0 1567   715  7 10  0 83
>   0  3    440   2824   6148 847828    0    0
> 102540   448 1511   675 14 11  0 75
>
> Values of io/(bi-bo) during the disk test are a
> lot higher than during the import operation....
>
> I really have no more clues .... :(
>
> Do you have any ideas ?
>
> Tnx in advance
>
> Regards
>
>
> Edoardo Serra
> WeBRainstorm S.r.l.
> IT, Internet services & consulting
> Via Pio Foà 83/C
> 10126 Torino
> Tel: +39 011 6966881
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: Postmaster using only 4-5% CPU

From
Guillaume Cottenceau
Date:
Edoardo Serra <osdevel 'at' webrainstorm.it> writes:

> Hi all,
>          I'm having a very strange performance problems on a fresh
> install of postgres 8.1.3
> I've just installed it with default option and --enable-thread-safety
> without tweaking config files yet.
>
> The import of a small SQL files into the DB (6 tables with 166.500
> total records, INSERT syntax)
> took me more than 18 minutes as shown below (output of  "time ./psql
> benchmarks < dump.sql")
>
> real 18m33.062s
> user 0m10.386s
> sys 0m7.707s
>
> The server is an
> - Intel(R) Xeon(TM) CPU 3.60GHz - 1MB L2
> - 1 GB RAM
> - 2x HDD SCSI U320 RAID 1 Hardware (HP 6i controller)

I have seen similar very low performance for INSERTs, although
using SCSI 320 disk, controlled by LSI Logic 53C1030 (using
Fusion MPT SCSI Host driver 3.01.18 on Linux 2.6.11). Something
like tens of INSERTs per second into a small table, no more.
"iostat" reports very large figures in the "await" field compared
to other servers using raid1 controllers, that's my best guess,
but I was unable to find why and how to fix (and the vendor has
been very helpless until now). I'm wondering if we don't have an
issue with the driver but have no more clue.

--
Guillaume Cottenceau

Re: Postmaster using only 4-5% CPU

From
Scott Marlowe
Date:
On Tue, 2006-03-21 at 06:46, Edoardo Serra wrote:
> Hi all,
>          I'm having a very strange performance
> problems on a fresh install of postgres 8.1.3
> I've just installed it with default option and
> --enable-thread-safety without tweaking config files yet.
>
> The import of a small SQL files into the DB (6
> tables with 166.500 total records, INSERT syntax)
> took me more than 18 minutes as shown below
> (output of  "time ./psql benchmarks < dump.sql")
>
> real 18m33.062s
> user 0m10.386s
> sys 0m7.707s
>
> The server is an
> - Intel(R) Xeon(TM) CPU 3.60GHz - 1MB L2
> - 1 GB RAM
> - 2x HDD SCSI U320 RAID 1 Hardware (HP 6i controller)
>
> The same import, tried on an another low-end
> server with a fresh install of postgres 8.1.3 gave me:
>
> real 2m4.497s
> user 0m6.234s
> sys 0m6.148s

Here's what's happening.  On the "fast" machine, you are almost
certainly using IDE drives.  PostgreSQL uses a system call called
"fsync" when writing data out.  It writes the data to the write ahead
logs, calls fsync, and waits for it to return.

fsync() tells the drive to flush its write buffers to disk and tell the
OS when it has completed this.

SCSI drives dutifully write out those buffers, and then, only after
they're written, tell the OS that yes, the data is written out.  Since
SCSI drives can do other things while this is going on, by using command
queueing, this is no great harm to performance, since the drive and OS
can transfer other data into / out of buffers during this fsync
operation.

Meanwhile, back in the jungle...  The machine with IDE drives operates
differently.  Most, if not all, IDE drives, when told by the OS to
fsync() tell the OS immediately that the fsync() call has completed, and
the data is written to the drive.  Shortly thereafter, the drive
actually commences to write the data out.  When it gets a chance.

The reason IDE drives do this is that until very recently, the IDE
interface allowed only one operation at a time to be "in flight" on an
interface / drive.

So, if the IDE drive really did write the data out, then report that it
was done, it would be much slower than the SCSI drive listed above,
because ALL operations on it would stop, waiting in line, for the caches
to flush to the platters.

For PostgreSQL, the way IDE drives operate is dangerous.  Write data
out, call fsync(), get an immediate return, mark the data as committed,
move on the next operation, operator trips over power cord / power
conditioner explodes, power supply dies, brown out causes the machine to
reboot, et. al., and when the machine comes up, PostgreSQL politely
informs you that your database is corrupt, and you come to the
pgsql-general group asking how to get your database back online.  Very
bad.

With SCSI drives, the same scenario results in a machine that comes
right back up and keeps on trucking.

So, what's happening to you is that on the machine with SCSI drives,
PostgreSQL, the OS, and the drives are operating properly, making sure
your data is secure, and, unfortunately, taking its sweet time doing
it.  Given that your .sql file is probably individual inserts without a
transaction, this is normal.

Try wrapping the inserts in the sql file in begin; / commit; statements,
like so:

begin;
insert into table ...
(100,000 inserts here)
insert into table ...
commit;

and it should fly.  And, if there's a single bad row, the whole import
rolls back.  Which means you don't have to figure out where the import
stopped or which rows did or didn't take.  You just fix the one or two
bad rows, and run the whole import again.

When a good friend of mine first started using PostgreSQL, he was a
total MySQL bigot.  He was importing a 10,000 row dataset, and made a
smartassed remark after 10 minutes how it would have imported in minutes
on MySQL.  It was a test database, so I had him stop the import, delete
all the imported rows, and wrap the whole import inside begin; and
commit;

The import took about 20 seconds or so.

Now, for the interesting test.  Run the import on both machines, with
the begin; commit; pairs around it.  Halfway through the import, pull
the power cord, and see which one comes back up.  Don't do this to
servers with data you like, only test machines, obviously.  For an even
more interesting test, do this with MySQL, Oracle, DB2, etc...

I've been amazed that the looks of horror I get for suggesting such a
test are about the same from an Oracle DBA as they are from a MySQL
DBA.  :)

Re: Postmaster using only 4-5% CPU

From
Edoardo Serra
Date:
At 18.44 21/03/2006, Scott Marlowe wrote:
>Here's what's happening.  On the "fast" machine, you are almost
>certainly using IDE drives.

Oh yes, the fast machine has IDE drives, you got it ;)

>Meanwhile, back in the jungle...  The machine with IDE drives operates
>differently.  Most, if not all, IDE drives, when told by the OS to
>fsync() tell the OS immediately that the fsync() call has completed, and
>the data is written to the drive.  Shortly thereafter, the drive
>actually commences to write the data out.  When it gets a chance.

I really didn't know this behaviour of IDE drives.
I was stracing the postmaster while investigating the problem and noticed
many fsync syscalls (one after each INSERT).

I was investigating on it but I didn't explain me why SCSI was slower.
You helped me a lot ;) tnx

>For PostgreSQL, the way IDE drives operate is dangerous.  Write data
>out, call fsync(), get an immediate return, mark the data as committed,
>move on the next operation, operator trips over power cord / power
>conditioner explodes, power supply dies, brown out causes the machine to
>reboot, et. al., and when the machine comes up, PostgreSQL politely
>informs you that your database is corrupt, and you come to the
>pgsql-general group asking how to get your database back online.  Very
>bad.

Yes, it sounds very bad... what about SATA drives ?
I heard about command queueing in SATA but I don't know if the kernel
handles it properly

>Try wrapping the inserts in the sql file in begin; / commit; statements,
>like so:
>
>begin;
>insert into table ...
>(100,000 inserts here)
>insert into table ...
>commit;
>
>and it should fly.

Oh, yes with the insert wrapped in a transaction the import time is as follows:
- SCSI: 35 secs
- IDE: 50 secs

>When a good friend of mine first started using PostgreSQL, he was a
>total MySQL bigot.  He was importing a 10,000 row dataset, and made a
>smartassed remark after 10 minutes how it would have imported in minutes
>on MySQL.  It was a test database, so I had him stop the import, delete
>all the imported rows, and wrap the whole import inside begin; and
>commit;
>
>The import took about 20 seconds or so.

;)

>Now, for the interesting test.  Run the import on both machines, with
>the begin; commit; pairs around it.  Halfway through the import, pull
>the power cord, and see which one comes back up.  Don't do this to
>servers with data you like, only test machines, obviously.  For an even
>more interesting test, do this with MySQL, Oracle, DB2, etc...

I will surely run a test like this ;)

Tnx a lot again for help

Regards

Edoardo Serra


Re: Postmaster using only 4-5% CPU

From
"Jim C. Nasby"
Date:
On Thu, Mar 23, 2006 at 10:14:24AM +0100, Edoardo Serra wrote:
> >Now, for the interesting test.  Run the import on both machines, with
> >the begin; commit; pairs around it.  Halfway through the import, pull
> >the power cord, and see which one comes back up.  Don't do this to
> >servers with data you like, only test machines, obviously.  For an even
> >more interesting test, do this with MySQL, Oracle, DB2, etc...
>
> I will surely run a test like this ;)

If you do, I'd be *very* interested in the results. Pervasive would
probably pay for a whitepaper about this, btw (see
http://www.pervasivepostgres.com/postgresql/partners_in_publishing.asp).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Postmaster using only 4-5% CPU

From
Scott Marlowe
Date:
On Fri, 2006-03-24 at 04:16, Jim C. Nasby wrote:
> On Thu, Mar 23, 2006 at 10:14:24AM +0100, Edoardo Serra wrote:
> > >Now, for the interesting test.  Run the import on both machines, with
> > >the begin; commit; pairs around it.  Halfway through the import, pull
> > >the power cord, and see which one comes back up.  Don't do this to
> > >servers with data you like, only test machines, obviously.  For an even
> > >more interesting test, do this with MySQL, Oracle, DB2, etc...
> >
> > I will surely run a test like this ;)
>
> If you do, I'd be *very* interested in the results. Pervasive would
> probably pay for a whitepaper about this, btw (see
> http://www.pervasivepostgres.com/postgresql/partners_in_publishing.asp).

Hehe.  good luck with it.

At the last company I worked at I was the PostgreSQL DBA, and I could
not get one single Oracle, DB2, MySQL, MSSQL, Ingres, or other DBA to
agree to that kind of test.

6 months later, when all three power conditioners blew at once (amazing
what a 1/4" piece of wire can do, eh?) and we lost all power in our
hosting center, there was one, and only one, database server that came
back up without errors, and we know which one that was.  No other
database there was up in less than 2 hours.  So, I wandered the floor
watching the folks panic who were trying to bring their systems back
up.

And you know what?  They still didn't want to test their systems for
recovery from a power loss situation.