Thread: postgresql 8.3 tps rate

postgresql 8.3 tps rate

From
Ibrahim Harrani
Date:
Hi,

I am running postgresql 8.3.5 on FreeBSD with Dual core Intel(R)
Xeon(R) CPU 3065  @ 2.33GHz, 2GB RAM and Seagate Technology -
Barracuda 7200.10 SATA 3.0Gb/ (RAID 1).

I made several benchmark tests with pgbench; the TPS rate is about 40 +/- 5.
$ pgbench -i pgbench -s 50 -U pgsql

[pgsql@$ pgbench -c 200 -t 2 -U pgsql -d pgbench

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 200
number of transactions per client: 2
number of transactions actually processed: 400/400
tps = 39.044088 (including connections establishing)
tps = 41.528795 (excluding connections establishing)

[pgsql@$ pgbench -c 100 -t 5 -U pgsql -d pgbench

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 100
number of transactions per client: 5
number of transactions actually processed: 500/500
tps = 30.162271 (including connections establishing)
tps = 30.643256 (excluding connections establishing)

Is this rate normal or not? What can I do to improve tps and insert
performance?

Here are some changes made in postgresql.conf and sysctl.conf:

/etc/sysctl.conf
#1024 MB shmmax
kern.ipc.shmmax=1073741824
#shmall = shmmax / 4096 (page size)
kern.ipc.shmall=262144
kern.ipc.semmsl=512
kern.ipc.semmap=256
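The shmall comment above is easy to verify; a quick sketch of the arithmetic, assuming the 4096-byte page size stated in the comment:

```shell
# shmall is expressed in pages: shmmax (bytes) divided by the 4096-byte page size.
shmmax=1073741824
pagesize=4096
echo $((shmmax / pagesize))    # prints 262144, matching kern.ipc.shmall above
```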

postgresql.conf

shared_buffers = 800MB                  # min 128kB or max_connections*16kB
work_mem = 2MB                          # min 64kB
maintenance_work_mem = 32MB             # min 1MB
max_connections = 600                   # (change requires restart)
max_fsm_relations = 2000                # min 100, ~70 bytes each
synchronous_commit = off
wal_buffers = 1024kB                    # min 32kB
checkpoint_segments = 32                # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 is off

Re: postgresql 8.3 tps rate

From
Claus Guttesen
Date:
> I am running postgresql 8.3.5 on FreeBSD with Dual core Intel(R)
> Xeon(R) CPU 3065  @ 2.33GHz, 2GB RAM and Seagate Technology -
> Barracuda 7200.10 SATA 3.0Gb/ (RAID 1).
>
> I made several benchmark tests with pgbench; the TPS rate is about 40 +/- 5.
> $ pgbench -i pgbench -s 50 -U pgsql
>
> [pgsql@$ pgbench -c 200 -t 2 -U pgsql -d pgbench
>
> transaction type: TPC-B (sort of)
> scaling factor: 10
> number of clients: 200
> number of transactions per client: 2
> number of transactions actually processed: 400/400
> tps = 39.044088 (including connections establishing)
> tps = 41.528795 (excluding connections establishing)
>
> [pgsql@$ pgbench -c 100 -t 5 -U pgsql -d pgbench
>
> transaction type: TPC-B (sort of)
> scaling factor: 10
> number of clients: 100
> number of transactions per client: 5
> number of transactions actually processed: 500/500
> tps = 30.162271 (including connections establishing)
> tps = 30.643256 (excluding connections establishing)
>
> Is this rate normal or not? What can I do to improve tps and insert
> performance?

You add more and faster disks.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

Re: postgresql 8.3 tps rate

From
"Joshua D. Drake"
Date:
On Thu, 2009-01-22 at 17:47 +0200, Ibrahim Harrani wrote:
> Hi,
>
> I am running postgresql 8.3.5 on FreeBSD with Dual core Intel(R)
> Xeon(R) CPU 3065  @ 2.33GHz, 2GB RAM and Seagate Technology -
> Barracuda 7200.10 SATA 3.0Gb/ (RAID 1).
>
> I made several benchmark tests with pgbench; the TPS rate is about 40 +/- 5.
> $ pgbench -i pgbench -s 50 -U pgsql
>
> [pgsql@$ pgbench -c 200 -t 2 -U pgsql -d pgbench
>
> transaction type: TPC-B (sort of)
> scaling factor: 10
> number of clients: 200
> number of transactions per client: 2
> number of transactions actually processed: 400/400
> tps = 39.044088 (including connections establishing)
> tps = 41.528795 (excluding connections establishing)
>
> [pgsql@$ pgbench -c 100 -t 5 -U pgsql -d pgbench
>
> transaction type: TPC-B (sort of)
> scaling factor: 10
> number of clients: 100
> number of transactions per client: 5
> number of transactions actually processed: 500/500
> tps = 30.162271 (including connections establishing)
> tps = 30.643256 (excluding connections establishing)
>
> Is this rate normal or not? What can I do to improve tps and insert
> performance?

Run a real benchmark. Running 400/500 transactions doesn't give you any
real indication of what is going on. Run 50000 or so and see how it
looks.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: postgresql 8.3 tps rate

From
Merlin Moncure
Date:
On 1/22/09, Ibrahim Harrani <ibrahim.harrani@gmail.com> wrote:
>
>  Is this rate normal or not? What can I do to improve tps and insert
>  performance?
>
>  postgresql.conf
>
>  shared_buffers = 800MB                  # min 128kB or max_connections*16kB
>  work_mem = 2MB                          # min 64kB
>  maintenance_work_mem = 32MB             # min 1MB

I would raise maintenance_work_mem, although it's unrelated to your issue.

>  max_connections = 600                   # (change requires restart)
>  max_fsm_relations = 2000                # min 100, ~70 bytes each
>  synchronous_commit = off

Something is very wrong.  40tps is low, even for sata raid 1, but
absolutely terrible with synchronous_commit = off.  This suggests your
problems are read-related and not sync-related.  Can you rerun
pgbench using the custom sql feature, passing insert statements?
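A sketch of what Merlin is asking for, assuming the 8.3-era pgbench script syntax (`\setrandom`) and the unprefixed `history` table that pgbench of that vintage creates — the script name and values here are hypothetical, so adjust them to your schema:

```shell
# Hypothetical insert-only workload for pgbench's custom-script (-f) mode.
# \setrandom and the bare "history" table are 8.3-era conventions.
cat > insert_test.sql <<'EOF'
\setrandom aid 1 1000000
INSERT INTO history (tid, bid, aid, delta, mtime)
    VALUES (1, 1, :aid, 1, CURRENT_TIMESTAMP);
EOF
# Then run it in place of the default TPC-B transaction:
# pgbench -f insert_test.sql -c 10 -t 1000 -U pgsql -d pgbench
grep -c 'INSERT' insert_test.sql    # sanity check: the script contains one INSERT
```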

Are you sure nothing else is going on/wrong with the box? Can we see a
bonnie++ run?

merlin

Re: postgresql 8.3 tps rate

From
Alvaro Herrera
Date:
Ibrahim Harrani escribió:

> I made several benchmark tests with pgbench; the TPS rate is about 40 +/- 5.
> $ pgbench -i pgbench -s 50 -U pgsql
>
> [pgsql@$ pgbench -c 200 -t 2 -U pgsql -d pgbench

Try with 1000 transactions per client or more, instead of 2.

Also, I think you should set the "scale" in the prepare step (-i) at
least as high as the number of clients you're going to use.  (I dimly
recall some recent development in this area that might mean I'm wrong.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: postgresql 8.3 tps rate

From
Ibrahim Harrani
Date:
Hi Merlin,

Here are the bonnie++ results and a new pgbench run with a higher transaction count.


$ pgbench -i -s 30 -U pgsql pgbench
$ pgbench -c 100 -t 1000 -U pgsql -d pgbench

transaction type: TPC-B (sort of)
scaling factor: 30
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 45.145051 (including connections establishing)
tps = 45.162367 (excluding connections establishing)

$ bonnie++
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.

Version 1.93d       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
myserver 300M   391  97  9619   1  8537   2   673  99 +++++ +++  1196  16
Latency               211ms     388ms     325ms   27652us     722us    6720ms
Version 1.93d       ------Sequential Create------ --------Random Create--------
myserver    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  9004  25 +++++ +++ +++++ +++  8246  20 +++++ +++ +++++ +++
Latency               592ms     208us     102us     673ms     179us     100us

1.93c,1.93d,myserver,1,1232710758,300M,,391,97,9619,1,8537,2,673,99,+++++,+++,1196,16,16,,,,,9004,25,+++++,+++,+++++,+++,8246,20,+++++,+++,+++++,+++,211ms,388ms,325ms,27652us,722us,6720ms,592ms,208us,102us,673ms,179us,100us

When I compare my bonnie++ result with the one at
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm,
it seems that there is something wrong with the disks!?

On Thu, Jan 22, 2009 at 7:14 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 1/22/09, Ibrahim Harrani <ibrahim.harrani@gmail.com> wrote:
>>
>>  Is this rate normal or not? What can I do to improve tps and insert
>>  performance?
>>
>>  postgresql.conf
>>
>>  shared_buffers = 800MB                  # min 128kB or max_connections*16kB
>>  work_mem = 2MB                          # min 64kB
>>  maintenance_work_mem = 32MB             # min 1MB
>
> I would raise maintenance_work_mem, although it's unrelated to your issue.
>
>>  max_connections = 600                   # (change requires restart)
>>  max_fsm_relations = 2000                # min 100, ~70 bytes each
>>  synchronous_commit = off
>
> Something is very wrong.  40tps is low, even for sata raid 1, but
> absolutely terrible with synchronous_commit = off.  This suggests your
> problems are read-related and not sync-related.  Can you rerun
> pgbench using the custom sql feature, passing insert statements?
>
> Are you sure nothing else is going on/wrong with the box? Can we see a
> bonnie++ run?
>
> merlin
>

Re: postgresql 8.3 tps rate

From
Ibrahim Harrani
Date:
This is another bonnie++ test result, with version 1.03:

Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
myserver 300M 13150   7 12713   1 13067   4 72426  53 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1048   2 +++++ +++  2322   3   985   2 +++++ +++  1797   3
myserver,300M,13150,7,12713,1,13067,4,72426,53,+++++,+++,+++++,+++,16,1048,2,+++++,+++,2322,3,985,2,+++++,+++,1797,3

I also attached the bon_csv2html output for both versions of bonnie++.

Thanks in advance.


On Thu, Jan 22, 2009 at 7:14 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 1/22/09, Ibrahim Harrani <ibrahim.harrani@gmail.com> wrote:
>>
>>  Is this rate normal or not? What can I do to improve tps and insert
>>  performance?
>>
>>  postgresql.conf
>>
>>  shared_buffers = 800MB                  # min 128kB or max_connections*16kB
>>  work_mem = 2MB                          # min 64kB
>>  maintenance_work_mem = 32MB             # min 1MB
>
> I would raise maintenance_work_mem, although it's unrelated to your issue.
>
>>  max_connections = 600                   # (change requires restart)
>>  max_fsm_relations = 2000                # min 100, ~70 bytes each
>>  synchronous_commit = off
>
> Something is very wrong.  40tps is low, even for sata raid 1, but
> absolutely terrible with synchronous_commit = off.  This suggests your
> problems are read-related and not sync-related.  Can you rerun
> pgbench using the custom sql feature, passing insert statements?
>
> Are you sure nothing else is going on/wrong with the box? Can we see a
> bonnie++ run?
>
> merlin
>

Attachment

Re: postgresql 8.3 tps rate

From
David Rees
Date:
On Thu, Jan 22, 2009 at 1:27 PM, Ibrahim Harrani
<ibrahim.harrani@gmail.com> wrote:
> Version 1.93d       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> myserver 300M   391  97  9619   1  8537   2   673  99 +++++ +++  1196  16
> Latency               211ms     388ms     325ms   27652us     722us    6720ms
> Version 1.93d       ------Sequential Create------ --------Random Create--------
> myserver    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                 16  9004  25 +++++ +++ +++++ +++  8246  20 +++++ +++ +++++ +++
> Latency               592ms     208us     102us     673ms     179us     100us

You should be testing bonnie with a file size that is at least double
the amount of memory in your machine - in this case, 4GB files, not
300MB files.

> When I compare my  bonnie++ result with the one at
> http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm,
> It seems that there is something wrong with the disks!?

Yes, your machine appears to be very slow.  You should be able to
write in the order of 30-50MB/s+ and read in the order of 40-80MB/s+.
Random IO should be in the 200 tps range for a 7200rpm SATA RAID1.

-Dave

Re: postgresql 8.3 tps rate

From
Ibrahim Harrani
Date:
Hi David,

I ran the test again with the following options. I also attached the
HTML output of the result.

$ bonnie++  -u pgsql -n 128 -r 2048 -s 4096 -x 1
Using uid:70, gid:70.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
myserver,4G,8028,5,8118,1,5079,1,36055,28,32950,3,128.4,0,128,5620,11,142084,99,88739,99,12880,26,109150,99,90362,99

What about this result?
This is an Intel server with onboard RAID. I will check the RAID
configuration again tomorrow, especially the Write Cache and Read Ahead
values mentioned at
http://www.intel.com/support/motherboards/server/sb/CS-021019.htm

On Thu, Jan 22, 2009 at 11:41 PM, David Rees <drees76@gmail.com> wrote:
> On Thu, Jan 22, 2009 at 1:27 PM, Ibrahim Harrani
> <ibrahim.harrani@gmail.com> wrote:
>> Version 1.93d       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> myserver 300M   391  97  9619   1  8537   2   673  99 +++++ +++  1196  16
>> Latency               211ms     388ms     325ms   27652us     722us    6720ms
>> Version 1.93d       ------Sequential Create------ --------Random Create--------
>> myserver    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                 16  9004  25 +++++ +++ +++++ +++  8246  20 +++++ +++ +++++ +++
>> Latency               592ms     208us     102us     673ms     179us     100us
>
> You should be testing bonnie with a file size that is at least double
> the amount of memory in your machine - in this case, 4GB files, not
> 300MB files.
>
>> When I compare my  bonnie++ result with the one at
>> http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm,
>> It seems that there is something wrong with the disks!?
>
> Yes, your machine appears to be very slow.  You should be able to
> write in the order of 30-50MB/s+ and read in the order of 40-80MB/s+.
> Random IO should be in the 200 tps range for a 7200rpm SATA RAID1.
>
> -Dave
>

Attachment

Re: postgresql 8.3 tps rate

From
Scott Marlowe
Date:
On Thu, Jan 22, 2009 at 3:29 PM, Ibrahim Harrani
<ibrahim.harrani@gmail.com> wrote:

> This is an Intel server with onboard RAID. I will check the RAID
> configuration again tomorrow, especially the Write Cache and Read Ahead
> values mentioned at
> http://www.intel.com/support/motherboards/server/sb/CS-021019.htm

It would be good to use software RAID to see what kind of numbers you
get without the built in RAID.  Most built in RAID solutions are only
suitable for holding the OS and such.

Re: postgresql 8.3 tps rate

From
Craig James
Date:
David Rees wrote:
> On Thu, Jan 22, 2009 at 1:27 PM, Ibrahim Harrani
> <ibrahim.harrani@gmail.com> wrote:
>> Version 1.93d       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> myserver 300M   391  97  9619   1  8537   2   673  99 +++++ +++  1196  16
>> Latency               211ms     388ms     325ms   27652us     722us    6720ms
>> Version 1.93d       ------Sequential Create------ --------Random Create--------
>> myserver    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                 16  9004  25 +++++ +++ +++++ +++  8246  20 +++++ +++ +++++ +++
>> Latency               592ms     208us     102us     673ms     179us     100us
>
> You should be testing bonnie with a file size that is at least double
> the amount of memory in your machine - in this case, 4GB files, not
> 300MB files.
>
>> When I compare my  bonnie++ result with the one at
>> http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm,
>> It seems that there is something wrong with the disks!?
>
> Yes, your machine appears to be very slow.  You should be able to
> write in the order of 30-50MB/s+ and read in the order of 40-80MB/s+.
> Random IO should be in the 200 tps range for a 7200rpm SATA RAID1.

Have you tried the really basic speed test?

  time (dd if=/dev/zero of=bigfile bs=8192 count=1000000; sync)

  time dd if=bigfile of=/dev/null bs=8192

Divide 8.2GB by the times reported.  On a single 10K SATA drive, I get about 55MB/sec write and 61 MB/sec read.

If you can't get similar numbers, then something is wrong.

Craig

Re: postgresql 8.3 tps rate

From
Greg Smith
Date:
On Thu, 22 Jan 2009, Alvaro Herrera wrote:

> Also, I think you should set the "scale" in the prepare step (-i) at
> least as high as the number of clients you're going to use.  (I dimly
> recall some recent development in this area that might mean I'm wrong.)

The idea behind that maxim (clients>=scale) is that locking on the smaller
tables will bottleneck results if you don't follow that advice.  It's a bit
messier than that though.  Increasing the scale will also make the
database larger, and once it gets bigger than available RAM your results
are going to dive hard because of that, more so than the locking would
have held you back.
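As a rough illustration of Greg's point, assuming the commonly cited figure of ~15 MB of database per pgbench scale unit (an approximation, not a number from this thread):

```shell
# Scale 100 at ~15 MB per scale unit is ~1.5 GB -- uncomfortably close to
# the 2 GB of RAM on the machine under discussion.
awk 'BEGIN { printf "%.1f GB\n", 100 * 15 / 1024 }'    # prints "1.5 GB"
```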

All kind of irrelevant for Ibrahim's case, because if you're not getting
more than 50MB/s out of your disks the pgbench results are kind of moot
anyway--there's a larger problem to sort out first.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: postgresql 8.3 tps rate

From
Ibrahim Harrani
Date:
Hi Craig,

Here is the result. It seems that the disk write speed is terrible!

[root@myserver /usr]#  time (dd if=/dev/zero of=bigfile bs=8192
count=1000000; sync)


1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 945.343806 secs (8665630 bytes/sec)

real    15m46.206s
user    0m0.368s
sys     0m15.560s
[root@myserver /usr]#

[root@myserver /usr]#  time dd if=bigfile of=/dev/null bs=8192
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 174.646798 secs (46906099 bytes/sec)

real    2m54.663s
user    0m0.246s
sys     0m9.307s
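Dividing the byte counts by the elapsed times confirms dd's own figures:

```shell
# Throughput = bytes transferred / elapsed seconds, from the dd output above.
awk 'BEGIN {
    printf "write: %.1f MB/s\n", 8192000000 / 945.343806 / 1e6
    printf "read:  %.1f MB/s\n", 8192000000 / 174.646798 / 1e6
}'
# prints "write: 8.7 MB/s" and "read:  46.9 MB/s"
```

About 8.7 MB/s sequential write is far below what even a single 7200rpm SATA drive should do.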


On Fri, Jan 23, 2009 at 12:44 AM, Craig James
<craig_james@emolecules.com> wrote:
> David Rees wrote:
>>
>> On Thu, Jan 22, 2009 at 1:27 PM, Ibrahim Harrani
>> <ibrahim.harrani@gmail.com> wrote:
>>>
>>> Version 1.93d       ------Sequential Output------ --Sequential Input-
>>> --Random-
>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>>> --Seeks--
>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>>>  /sec %CP
>>> myserver 300M   391  97  9619   1  8537   2   673  99 +++++ +++  1196  16
>>> Latency               211ms     388ms     325ms   27652us     722us
>>>  6720ms
>>> Version 1.93d       ------Sequential Create------ --------Random
>>> Create--------
>>> myserver    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
>>> %CP
>>>                16  9004  25 +++++ +++ +++++ +++  8246  20 +++++ +++ +++++
>>> +++
>>> Latency               592ms     208us     102us     673ms     179us
>>> 100us
>>
>> You should be testing bonnie with a file size that is at least double
>> the amount of memory in your machine - in this case, 4GB files, not
>> 300MB files.
>>
>>> When I compare my  bonnie++ result with the one at
>>> http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm,
>>> It seems that there is something wrong with the disks!?
>>
>> Yes, your machine appears to be very slow.  You should be able to
>> write in the order of 30-50MB/s+ and read in the order of 40-80MB/s+.
>> Random IO should be in the 200 tps range for a 7200rpm SATA RAID1.
>
> Have you tried the really basic speed test?
>
>  time (dd if=/dev/zero of=bigfile bs=8192 count=1000000; sync)
>
>  time dd if=bigfile of=/dev/null bs=8192
>
> Divide 8.2GB by the times reported.  On a single 10K SATA drive, I get about
> 55MB/sec write and 61 MB/sec read.
>
> If you can't get similar numbers, then something is wrong.
>
> Craig
>

Re: postgresql 8.3 tps rate

From
Craig James
Date:
Ibrahim Harrani wrote:
> Hi Craig,
>
> Here is the result. It seems that the disk write speed is terrible!
>
> root@myserver /usr]#  time (dd if=/dev/zero of=bigfile bs=8192
> count=1000000; sync)
>
>
> 1000000+0 records in
> 1000000+0 records out
> 8192000000 bytes transferred in 945.343806 secs (8665630 bytes/sec)
>
> real    15m46.206s
> user    0m0.368s
> sys     0m15.560s

So it's nothing to do with Postgres.  I'm no expert solving this sort of problem, but I'd start by looking for:

  - a rogue process that's using disk bandwidth (use vmstat when the system is idle)
  - system logs, maybe there are a zillion error messages
  - if you have a second disk, try its performance
  - if you don't have a second disk, buy one, install it, and try it
  - get another SATA controller and try that

Or do the reverse: Put the disk in a different computer (one that you've
tested beforehand and verified is fast) and see if the problem follows the
disk.  Same for the SATA card.

It could be your SATA controller, the disk, some strange hdparm setting
... who knows?

I ran into this once a LONG time ago with a kernel that didn't recognize
the disk or driver or something, and disabled the DMA (direct-memory
access) feature, which meant the CPU had to handle every single byte
coming from the disk, which of course meant SLOW, plus you couldn't even
type while the disk was busy.  A simple manual call to hdparm(1) to force
DMA on fixed it.  Weird stuff like that can be very hard to find.

I also saw very low write speed once on a RAID device with a
battery-backed cache, when the battery went dead.  The RAID controller
went into its conservative mode, which for some reason was much slower
than the disk's raw performance.

Craig

Re: postgresql 8.3 tps rate

From
Scott Marlowe
Date:
On Thu, Jan 22, 2009 at 10:52 PM, Ibrahim Harrani
<ibrahim.harrani@gmail.com> wrote:
> Hi Craig,
>
> Here is the result. It seems that the disk write speed is terrible!
>
> root@myserver /usr]#  time (dd if=/dev/zero of=bigfile bs=8192
> count=1000000; sync)
>
>
> 1000000+0 records in
> 1000000+0 records out
> 8192000000 bytes transferred in 945.343806 secs (8665630 bytes/sec)

That's pretty horrific.  Can you try your disks without the
motherboard RAID controller in the way either singly or using software
RAID?

Re: postgresql 8.3 tps rate

From
Merlin Moncure
Date:
On 1/23/09, Ibrahim Harrani <ibrahim.harrani@gmail.com> wrote:
> Hi Craig,
>
>  Here is the result. It seems that the disk write speed is terrible!
>
>  root@myserver /usr]#  time (dd if=/dev/zero of=bigfile bs=8192
>  count=1000000; sync)

Note: while sequential write speeds are a good indication of general
raid crappyness, they are not the main driver of your low pgbench
results (but they may be involved with poor insert performance).

That is coming from your seek performance, which is also lousy at ~130
seeks/s according to bonnie.

merlin

Re: postgresql 8.3 tps rate

From
Greg Smith
Date:
On Fri, 23 Jan 2009, Merlin Moncure wrote:

> Note: while sequential write speeds are a good indication of general
> raid crappyness, they are not the main driver of your low pgbench
> results (but they may be involved with poor insert performance). That is
> coming from your seek performance, which is also lousy at ~130 according
> to bonnie.

The bonnie seek test includes a write component to it:  "In 10% of cases,
it is dirtied and written back with write(2)."  Since ~130 isn't that much
worse than normal for a cheap 7200 RPM drive (near 200), it could easily
be the case that the awful performance on those writes is pulling down the
seek score.  Sequential writes are low on the scale of things that matter
in a database context, but if they're screwed up bad enough it can ripple
into more important areas.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
Greg Smith wrote:
> On Thu, 22 Jan 2009, Alvaro Herrera wrote:
>
>> Also, I think you should set the "scale" in the prepare step (-i) at
>> least as high as the number of clients you're going to use.  (I dimly
>> recall some recent development in this area that might mean I'm wrong.)
>
> The idea behind that maxim (clients>=scale) is that locking on the
> smaller tables will bottleneck results if you don't follow that advice.
> It's a bit messier than that though.  Increasing the scale will also
> make the database larger, and once it gets bigger than available RAM
> your results are going to dive hard because of that, more so than the
> locking would have held you back.
>
> All kind of irrelevant for Ibrahim's case, because if you're not getting
> more than 50MB/s out of your disks the pgbench results are kind of moot
> anyway--there's a larger problem to sort out first.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
IIRC this is a FreeBSD system, not Linux. Could there be some filesystem
performance issue here? I know zero about FreeBSD filesystems.

Also, is there a separate driver machine you can use to run pgbench? The
pgbench client uses resources, which could lower your throughput.

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
[snip]

I'm actually doing some very similar testing and getting very similar
results. My disk is a single Seagate Barracuda 7200 RPM SATA (160 GB).
The OS is openSUSE 11.1 (2.6.27 kernel) with the "stock" PostgreSQL
8.3.5 RPM. I started out running pgbench on the same machine but just
moved the driver to another one trying to get better results. The other
driver is quite small -- 512 MB 1.6 GHz -- so I might need to shut down
the desktop and X on it.

But the real mystery is this: I have two XFS partitions. Let's call them
sda5 and sda6. The PostgreSQL install put the database in
/var/lib/pgsql, which is on sda5. But I created a tablespace on sda6
specifically for the database that pgbench is using, and put that
database there. At least that's what pgadmin3 is telling me I did. But
when I run pgbench, I see intense I/O on *both* partitions sometimes,
and other times I see it *only* on sda5.

I can understand the "both" times -- I didn't move any "system-level"
things like the write-ahead logs. But what I can't understand is the
periods when it isn't using sda6, where the tablespace is.

Anyhow, here's a short segment of the results I'm getting.

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 2
number of transactions per client: 100
number of transactions actually processed: 200/200
tps = 37.360964 (including connections establishing)
tps = 37.430501 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 4
number of transactions per client: 100
number of transactions actually processed: 400/400
tps = 51.768918 (including connections establishing)
tps = 51.985556 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 6
number of transactions per client: 100
number of transactions actually processed: 600/600
tps = 51.462103 (including connections establishing)
tps = 51.734119 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 100
number of transactions actually processed: 800/800
tps = 44.316328 (including connections establishing)
tps = 44.473483 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 44.750672 (including connections establishing)
tps = 44.910703 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 12
number of transactions per client: 100
number of transactions actually processed: 1200/1200
tps = 45.048743 (including connections establishing)
tps = 45.205084 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 14
number of transactions per client: 100
number of transactions actually processed: 1400/1400
tps = 26.849217 (including connections establishing)
tps = 26.916643 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 16
number of transactions per client: 100
number of transactions actually processed: 1600/1600
tps = 11.187072 (including connections establishing)
tps = 11.198109 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 18
number of transactions per client: 100
number of transactions actually processed: 1800/1800
tps = 38.183978 (including connections establishing)
tps = 38.301026 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 35.012091 (including connections establishing)
tps = 35.109165 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 22
number of transactions per client: 100
number of transactions actually processed: 2200/2200
tps = 28.286106 (including connections establishing)
tps = 28.350341 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 100
number of transactions actually processed: 2400/2400
tps = 29.285593 (including connections establishing)
tps = 29.358284 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 26
number of transactions per client: 100
number of transactions actually processed: 2600/2600
tps = 29.237558 (including connections establishing)
tps = 29.308422 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 28
number of transactions per client: 100
number of transactions actually processed: 2800/2800
tps = 35.251509 (including connections establishing)
tps = 35.351999 (excluding connections establishing)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 29.790523 (including connections establishing)
tps = 29.863336 (excluding connections establishing)


I'm going to move the pgbench database back to the main sda5 partition
-- given the "both" periods, the seeks must be killing me. This is
actually not a benchmark, but a way of generating sample "blktrace"
data, so when it's finally working, I'll have some block-layer traces to
tell me exactly what's going on. :)

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Re: postgresql 8.3 tps rate

From
Greg Smith
Date:
On Sun, 25 Jan 2009, M. Edward (Ed) Borasky wrote:

> I started out running pgbench on the same machine but just
> moved the driver to another one trying to get better results.

That normally isn't necessary until you get to the point where you're
running thousands of transactions per second.  The CPU load of the pgbench
client isn't that big, and moving it to a network client does add its own
overhead.

> I can understand the "both" times -- I didn't move any "system-level"
> things like the write-ahead logs. But what I can't understand is the
> periods when it isn't using sda6, where the tablespace is.

Writes to the database are buffered by the shared_buffers mechanism.  If
you dirty a block, it has to be unused for a bit after that before it will
be written out.  The OS also buffers writes to the database disk.  The
combination of the two means that you can write things sometimes that
won't turn into physical disk I/O to the database for a while.  That is
not true of the WAL, which will always be generating activity when running
pgbench.

> number of transactions actually processed: 3000/3000

Generally, pgbench results start to be useful when you've pushed through
around 100,000 transactions or run for a few minutes.  It looks like your
largest client test might be approaching that threshold only because the
TPS rate is so slow.

I'm not sure what is going on with your system, but the advice showing up
earlier in this thread is well worth heeding here:  if you haven't
thoroughly proven that your disk setup works as expected on simple I/O
tests such as dd and bonnie++, you shouldn't be running pgbench yet.
It's not a transparent benchmark unless you really understand what it's
doing, and you can waste endless time chasing phantom database setup
problems that way when you should be staring at hardware, driver, or OS
level ones instead.  Do you know the disks are working as they should
here?  Does the select-only pgbench give you reasonable results?
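Following that advice, a sanity-check sequence might look like the sketch
below. This is only an illustration of the suggested order of testing (raw
I/O first, then select-only pgbench, then the full read/write run); the
device path, file sizes, and client counts are placeholder assumptions, and
the actual commands are echoed rather than executed here.

```shell
# Hypothetical sanity-check sequence per the advice above; device names,
# sizes, and client counts are placeholders.
clients=8
tx=12500
total=$((clients * tx))    # aim for ~100,000 transactions, as suggested above
echo "dd if=/dev/zero of=/tmp/ddtest bs=8k count=262144 conv=fdatasync  # raw write speed"
echo "pgbench -S -c $clients -t $tx pgbench   # select-only, no WAL traffic"
echo "pgbench -c $clients -t $tx pgbench      # full TPC-B read/write run"
echo "total transactions per run: $total"
```

If the dd and select-only numbers look sane but the read/write run is still
slow, the problem is more likely in the write/WAL path than in the disks.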

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
Greg Smith wrote:

> I'm not sure what is going on with your system, but the advice showing
> up earlier in this thread is well worth heeding here:  if you haven't
> thoroughly proven that your disk setup works as expected on simple I/O
> tests such as dd and bonnie++, you shouldn't be running pgbench yet.
> It's not a transparent benchmark unless you really understand what it's
> doing, and you can waste endless time chasing phantom database setup
> problems that way when you should be staring at hardware, driver, or OS
> level ones instead.  Do you know the disks are working as they should
> here?  Does the select-only pgbench give you reasonable results?

Actually, this isn't so much a 'pgbench' exercise as it is a source of
'real-world application' data for my Linux I/O performance visualization
tools. I've done 'iozone' tests, though not recently. But what I'm
building is an I/O analysis toolset, not a database application. So I am
"staring at hardware, driver or OS level" issues. :) To be more precise,
I'm using block I/O layer tools, which are "beneath" the filesystem
layer but "above" the driver and hardware levels.

What you might find interesting is that, when I presented the earlier
(iozone) test results at the Computer Measurement Group meeting in Las
Vegas in December, there were two disk drive engineers in the audience,
from, IIRC, Fujitsu. When they saw my results showing all four Linux
schedulers yielding essentially the same performance metrics using some
fairly tight statistical significance tests, they told me that it was
because the drive was re-ordering operations according to its own
internal scheduler!

I haven't had a chance to investigate that in any detail yet, but I
assume that they knew what they were talking about. The drive in
question is an off-the-shelf unit that I got at CompUSA as part of a
system that I had them build. In any event, it's *not* a "server-grade
I/O subsystem", it's a single disk drive designed for "home desktop PC"
use cases. In short, I don't expect server-grade TPS values.

I did capture some 'iostat' data after I moved the 'pgbench' database
back into the main partition where the rest of the PostgreSQL database lives.
As I expected, the device and partition utilizations were in the high 90
percent range.

I don't have the bandwidth figures from 'iostat' handy, but if the
utilization is 98.4 percent, they may be the best I can get out of the
drive with the xfs filesystem and the cfq scheduler. And the choice of
scheduler might not matter. And the choice of filesystem might not
matter. I may be getting all the drive can do.

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Re: postgresql 8.3 tps rate

From
Matthew Wakeling
Date:
On Sun, 25 Jan 2009, M. Edward (Ed) Borasky wrote:
> Actually, this isn't so much a 'pgbench' exercise as it is a source of
> 'real-world application' data for my Linux I/O performance visualization
> tools. I've done 'iozone' tests, though not recently. But what I'm
> building is an I/O analysis toolset, not a database application.

Are these performance results when the analysis tools are active or
inactive? Have you investigated whether the analysis tools might be
slowing the I/O down at all?

> ...they told me that it was because the drive was re-ordering operations
> according to its own internal scheduler!

A modern SATA drive with native queuing will do that. SCSI drives have
been doing that for twenty years or so.

Matthew

--
 I pause for breath to allow you to get over your shock that I really did cover
 all that in only five minutes...        -- Computer Science Lecturer

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
Matthew Wakeling wrote:
> On Sun, 25 Jan 2009, M. Edward (Ed) Borasky wrote:
>> Actually, this isn't so much a 'pgbench' exercise as it is a source of
>> 'real-world application' data for my Linux I/O performance visualization
>> tools. I've done 'iozone' tests, though not recently. But what I'm
>> building is an I/O analysis toolset, not a database application.
>
> Are these performance results when the analysis tools are active or
> inactive? Have you investigated whether the analysis tools might be
> slowing the I/O down at all?

There is some overhead with the finest-granularity data capture tool,
blktrace. I haven't measured it yet but it is supposedly "about two
percent" according to its creators. And the trace files it creates are
shipped to a collection server over the network -- they are not stored
on the server under test.

The other tool set, sysstat, simply reads out counters from /proc. The
last time I looked at them they were using minuscule amounts of
processor and RAM and doing no disk I/O. But there are (slightly) more
efficient ways of getting the required counters from /proc, and blktrace
can actually reconstruct the I/O counters. People run *production*
servers with much more intrusive performance logging tool sets than what
I am using. :)
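The remote-capture arrangement described above can be sketched roughly as
follows. The hostname is an assumption; the blktrace commands are shown as
echoed strings because they need root and a real block device. blktrace's
network mode is what keeps the trace files off the disk under test.

```shell
# Sketch of shipping blktrace output to a separate collection server
# (hostname is a placeholder assumption):
collector="collector.example.com"
echo "blktrace -l                         # on $collector: listen for incoming traces"
echo "blktrace -d /dev/sda -h $collector  # on the server under test: send traces out"
```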

>
>> ...they told me that it was because the drive was re-ordering
>> operations according to its own internal scheduler!
>
> A modern SATA drive with native queuing will do that. SCSI drives have
> been doing that for twenty years or so.

At the CMG meeting I asked the disk drive engineers, "well, if the
drives are doing the scheduling, why does Linux go to all the trouble?"
Their answer was something like, "smart disk drives are a relatively
recent invention." But

a. SANs have been smart for years, and people with I/O intensive
workloads use SANs designed for them if they can afford them. Linux even
has a "no-op" scheduler you can use if your SAN is doing a better job.
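The no-op scheduler mentioned above is selected per device through sysfs. A
minimal sketch, parsing a sample of the sysfs line rather than reading a real
/sys entry (the device name and the bracketed-active-scheduler format are the
assumptions here):

```shell
# The sysfs file lists every available scheduler with the active one in
# brackets; here we parse a sample line instead of a real /sys entry.
line="noop anticipatory deadline [cfq]"   # e.g. cat /sys/block/sda/queue/scheduler
active=$(echo "$line" | sed 's/.*\[\([^]]*\)\].*/\1/')
echo "active scheduler: $active"
# To hand scheduling to a smart device or SAN (as root, on a real system):
#   echo noop > /sys/block/sda/queue/scheduler
```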

b. The four I/O schedulers in the 2.6 kernel are relatively recent. I
can go dig up the exact release dates on the web, but their first
appearance in Red Hat was RHEL 4 / 2.6.9, and I think they've been
re-written once since then.

c. If SCSI drives have been doing their own scheduling for 20 years,
that's even less incentive for Linux to do more than just provide
efficient SCSI drivers. I've never gotten down into the nitty-gritty of
SCSI, and I'm not sure there's any reason to do so now, given that other
protocols seem to be taking the lead.
>
> Matthew
>


--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Re: postgresql 8.3 tps rate

From
Craig Ringer
Date:
M. Edward (Ed) Borasky wrote:

> At the CMG meeting I asked the disk drive engineers, "well, if the
> drives are doing the scheduling, why does Linux go to all the trouble?"

One big reason is that Linux knows more about the relative importance of
I/O operations than the individual drives do. Linux's scheduler supports
priorities in its queuing, and is able to do things like try to perform
small blocking reads before long non-blocking writes in order to improve
overall system throughput.

The individual drives just don't have the knowledge to make these
decisions. All they can really do is try to fetch the set of blocks
currently requested in the smallest possible time. This can even be
harmful if, in order to improve overall throughput, the drive reorders
what to the OS is a very high priority operation and performs it later
than it would've if following the request queue directly.

I've had to modify the 3ware driver to set very shallow request queues
for similar reasons. It was using very deep queuing and the controller
seems to do very stupid request queuing where it maintains a single deep
per-controller queue rather than per-array queues, so I was getting
AWFUL request latencies on high priority small reads when bulk I/O was
in progress elsewhere - even on another array on the same controller.

I've also had to turn NCQ off on SATA disks before because it was
foiling Linux's I/O priority management, so tasks with low ionice levels
still ended up rather severely impacting system responsiveness.

Hopefully these devices will end up with the smarts to manage I/O in a
priority queue where read latency is considered as well as throughput.

> Their answer was something like, "smart disk drives are a relatively
> recent invention." But

--
Craig Ringer

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
Craig Ringer wrote:
> M. Edward (Ed) Borasky wrote:
>
>> At the CMG meeting I asked the disk drive engineers, "well, if the
>> drives are doing the scheduling, why does Linux go to all the trouble?"
>
> One big reason is that Linux knows more about the relative importance of
> I/O operations than the individual drives do. Linux's scheduler supports
> priorities in its queuing, and is able to do things like try to perform
> small blocking reads before long non-blocking writes in order to improve
> overall system throughput.
>
> The individual drives just don't have the knowledge to make these
> decisions. All they can really do is try to fetch the set of blocks
> currently requested in the smallest possible time. This can even be
> harmful if, in order to improve overall throughput, the drive reorders
> what to the OS is a very high priority operation and performs it later
> than it would've if following the request queue directly.
>
> I've had to modify the 3ware driver to set very shallow request queues
> for similar reasons. It was using very deep queuing and the controller
> seems to do very stupid request queuing where it maintains a single deep
> per-controller queue rather than per-array queues, so I was getting
> AWFUL request latencies on high priority small reads when bulk I/O was
> in progress elsewhere - even on another array on the same controller.
>
> I've also had to turn NCQ off on SATA disks before because it was
> foiling Linux's I/O priority management, so tasks with low ionice levels
> still ended up rather severely impacting system responsiveness.
>
> Hopefully these devices will end up with the smarts to manage I/O in a
> priority queue where read latency is considered as well as throughput.
>
>> Their answer was something like, "smart disk drives are a relatively
>> recent invention." But
>
> --
> Craig Ringer
>

Thanks!! Is there a howto somewhere on disabling this on a Seagate
Barracuda? I'm in the middle of rebuilding my tools and analysis
algorithms as a "real open source project", and I'd love to be able to
have some working examples that people without access to industrial
strength I/O subsystems can use. :)


--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Re: postgresql 8.3 tps rate

From
Greg Smith
Date:
On Mon, 26 Jan 2009, M. Edward (Ed) Borasky wrote:

> Is there a howto somewhere on disabling this on a Seagate Barracuda?

http://inferno.slug.org/cgi-bin/wiki?Western_Digital_NCQ is a good
discussion of disabling NCQ support under Linux (both in user-space and
directly in the kernel itself).  A good summary of the problems with the
WD NCQ support alluded to there is at http://lkml.org/lkml/2007/4/3/159
if you're curious.
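For the kernel-side route those links discuss, a minimal sketch is below: NCQ
is controlled per device via the queue_depth attribute, and a depth of 1
effectively disables it. The device name is an assumption, and the privileged
commands are left as comments.

```shell
# Hypothetical per-device NCQ knob; "sda" is a placeholder device name.
dev=sda
qd="/sys/block/$dev/device/queue_depth"
echo "$qd"
# As root, on a real system:
#   cat "$qd"        # current queue depth (31 is typical with NCQ enabled)
#   echo 1 > "$qd"   # a depth of 1 effectively disables NCQ
```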

It's a tough time to be picking up inexpensive consumer SATA disks right
now.  Seagate's drive reliability has been falling hard the last couple of
years, but all the WD drives I've started trying out instead have just
awful firmware.  At least they're all cheap, I guess.

P.S. I have several of the same basic Seagate drive you have (160GB, even
bought at CompUSA!) and would expect at least 2-3X better pgbench results
than you're seeing.  I realized that I've never actually run that test
without first tweaking the postgresql.conf
(shared_buffers, checkpoint_segments) so that may be part of it.  One of my
systems here has just one of those disks in it, next time I boot that up
I'll see what results I get with an untuned config.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
Greg Smith wrote:
> It's a tough time to be picking up inexpensive consumer SATA disks right
> now.  Seagate's drive reliability has been falling hard the last couple
> of years, but all the WD drives I've started trying out instead have
> just awful firmware.  At least they're all cheap, I guess.

I also have a 250 GB drive in another machine that I need to move to the
main desktop. I don't remember the model of it -- it's been shut down
for a year. :( Given that drive, I really don't have a need to buy a
disk drive right now, and I'm saving my pennies for an
industrial-strength laptop that I can dual-boot Windows 64-bit and
Linux. I've gotten amazing use out of my ancient Compaq Presario 2110US,
considering it's a "home unit", but it's heavy, slow and has only 512 MB
of RAM.

> P.S. I have several of the same basic Seagate drive you have (160GB,
> even bought at CompUSA!) and would expect at least 2-3X better pgbench
> results than you're seeing.  I realized that I've never actually run
> that test without first tweaking the postgresql.conf
> (shared_buffers, checkpoint_segments) so that may be part of it.  One of
> my systems here has just one of those disks in it, next time I boot that
> up I'll see what results I get with an untuned config.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
Thanks!! I'm just getting into the PostgreSQL tuning part of things.

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Re: postgresql 8.3 tps rate

From
Ron Mayer
Date:
M. Edward (Ed) Borasky wrote:
> At the CMG meeting I asked the disk drive engineers, "well, if the
> drives are doing the scheduling, why does Linux go to all the trouble?"
> Their answer was something like, "smart disk drives are a relatively
> recent invention." But

One more reason?
I imagine the disk drives wouldn't be aware of I/O scheduling classes
and priorities ( http://linux.die.net/man/1/ionice ), while the Linux
I/O schedulers should be.
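The interface that man page describes can be sketched briefly: class 1 is
realtime, 2 is best-effort (the default), 3 is idle, and -n takes a 0-7
priority within classes 1 and 2. The commands below are only echoed, and the
pg_dump/psql invocations are illustrative assumptions; note these classes are
honored by the CFQ scheduler, not by the drive itself.

```shell
# ionice usage sketch (commands echoed, not run; targets are placeholders):
cls=3
echo "ionice -c $cls pg_dump pgbench > /tmp/pgbench.sql   # bulk job at idle priority"
echo "ionice -c 2 -n 0 psql -c 'SELECT 1'                 # high-priority best-effort"
```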

Re: postgresql 8.3 tps rate

From
"M. Edward (Ed) Borasky"
Date:
Ron Mayer wrote:
> M. Edward (Ed) Borasky wrote:
>> At the CMG meeting I asked the disk drive engineers, "well, if the
>> drives are doing the scheduling, why does Linux go to all the trouble?"
>> Their answer was something like, "smart disk drives are a relatively
>> recent invention." But
>
> One more reason?
> I imagine the disk drives wouldn't be aware of I/O scheduling classes
> and priorities ( http://linux.die.net/man/1/ionice ), while the Linux
> I/O schedulers should be.
>
Thanks!! I see I have much to learn. Jens Axboe is one of my heroes, BTW
-- he's half of the blktrace team. :)

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.