Thread: Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??

Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??

From
Joost Kraaijeveld
Date:
Hi,

I am experiencing very long update queries and I want to know if it is
reasonable to expect them to perform better.

The query below is running for more than 1.5 hours (5500 seconds) now,
while the rest of the system does nothing (I don't even type or move a
mouse...).

- Is that to be expected?
- Is 180-200 tps with ~9000 kB/s written (see iostat output below) not low,
given the fact that fsync is off? (Note: with bonnie++ I get write
performance > 50 MB/sec and read performance > 70 MB/sec, with > 2000
read/write ops/sec.)
- Does anyone else have any experience with the 3Ware RAID controller
(which is my suspect)?
- Any good idea how to determine the real bottleneck, if this is not the
performance I can expect?

My hard- and software:

- PostgreSQL 8.0.3
- Debian 3.1 (Sarge) AMD64
- Dual Opteron
- 4GB RAM
- 3ware Raid5 with 5 disks

Pieces of my postgresql.conf (All other is default):
shared_buffers = 7500
work_mem = 260096
fsync=false
effective_cache_size = 32768



The query with explain (amount and orderbedrag_valuta are float8,
ordernummer and ordernumber int4):

explain update prototype.orders set amount =
odbc.orders.orderbedrag_valuta from odbc.orders where ordernumber =
odbc.orders.ordernummer;
                                 QUERY PLAN
-----------------------------------------------------------------------------
Hash Join  (cost=50994.74..230038.17 rows=1104379 width=466)
   Hash Cond: ("outer".ordernumber = "inner".ordernummer)
   ->  Seq Scan on orders  (cost=0.00..105360.68 rows=3991868 width=455)
   ->  Hash  (cost=48233.79..48233.79 rows=1104379 width=15)
         ->  Seq Scan on orders  (cost=0.00..48233.79 rows=1104379
width=15)


Sample output from iostat during query (about average):
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hdc               0.00         0.00         0.00          0          0
sda               0.00         0.00         0.00          0          0
sdb             187.13        23.76      8764.36         24       8852


--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl



Re: Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??

From
Tom Lane
Date:
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
> I am experiencing very long update queries and I want to know if it
> reasonable to expect them to perform better.

Does that table have any triggers that would fire on the update?

            regards, tom lane
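Tom's question can be answered directly from the system catalogs. A minimal
sketch (the database name "muntdev" is taken from a later message in this
thread):

```shell
# List any triggers defined on the table in question:
psql muntdev -c "SELECT tgname FROM pg_trigger
                 WHERE tgrelid = 'prototype.orders'::regclass;"

# \d also shows indexes, foreign keys and triggers in one shot:
psql muntdev -c '\d prototype.orders'
```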

Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
On Sun, 2005-11-06 at 12:17 -0500, Tom Lane wrote:
> Does that table have any triggers that would fire on the update?
Alas, no triggers, constraints, foreign keys, or indexes (have I forgotten
something?)

All queries are slow. E.g (after vacuum):

select objectid from prototype.orders

Explain analyse (with PgAdmin):

Seq Scan on orders  (cost=0.00..58211.79 rows=1104379 width=40) (actual
time=441.971..3252.698 rows=1104379 loops=1)
Total runtime: 5049.467 ms

Actual execution time: 82163 ms (without getting the data)





Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Tom Lane
Date:
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
> Explain analyse (with PgAdmin):
> ...
> Total runtime: 5049.467 ms
> Actual execution time: 82163 ms (without getting the data)

I'm confused --- where's the 82sec figure coming from, exactly?

We've heard reports of performance issues in PgAdmin with large
result sets ... if you do the same query in psql, what happens?

            regards, tom lane

Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
Hi Tom,

On Sun, 2005-11-06 at 15:26 -0500, Tom Lane wrote:
> I'm confused --- where's the 82sec figure coming from, exactly?
From actually executing the query.

From PgAdmin:

-- Executing query:
select objectid from prototype.orders

Total query runtime: 78918 ms.
Data retrieval runtime: 188822 ms.
1104379 rows retrieved.


> We've heard reports of performance issues in PgAdmin with large
> result sets ... if you do the same query in psql, what happens?
jkr@Panoramix:~/postgresql$ time psql muntdev -c "select objectid from
prototype.orders" > output.txt

real    0m5.554s
user    0m1.121s
sys     0m0.470s


Now *I* am confused. What does PgAdmin do more than giving the query to
the database?

(BTW: I have repeated both measurements and the numbers above were all
from the last measurement I did and are about average)




Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Christopher Kings-Lynne
Date:
> Now *I* am confused. What does PgAdmin do more than giving the query to
> the database?

It builds it into the data grid GUI object.

Chris


Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
On Mon, 2005-11-07 at 12:37 +0800, Christopher Kings-Lynne wrote:
> > Now *I* am confused. What does PgAdmin do more than giving the query to
> > the database?
>
> It builds it into the data grid GUI object.

Is that not the difference between the total query runtime and the data
retrieval runtime (see below)?

-- Executing query:
select objectid from prototype.orders

Total query runtime: 78918 ms.
Data retrieval runtime: 188822 ms.
1104379 rows retrieved.



Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
Hi Christopher,

On Mon, 2005-11-07 at 12:37 +0800, Christopher Kings-Lynne wrote:
> > Now *I* am confused. What does PgAdmin do more than giving the query to
> > the database?
>
> It builds it into the data grid GUI object.
But my initial question was about a query that does not produce data at
all (well, a response from the server saying it is finished). I broke
that query off after several hours.

I am now running the query from my initial question with psql (now for >1 hour, in a transaction, fsync off).

Some statistics :

uptime:
06:35:55 up  9:47,  6 users,  load average: 7.08, 7.21, 6.08

iostat -x -k 1 (this output appears to be representative):

avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.00    0.00    0.50   98.51    0.00

Device:        sda        sdb

rrqm/s        0.00        0.00
wrqm/s        14.00        611.00
r/s        0.00        1.00
w/s        3.00        201.00
rsec/s        0.00        32.00
wsec/s        136.00        6680.00
rkB/s        0.00        16.00
wkB/s        68.00        3340.00
avgrq-sz    45.33        33.23
avgqu-sz    0.00        145.67
await        0.67        767.19
svctm        0.67        4.97
%util        0.20        100.30





Re: Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??

From
Alex Turner
Date:
Where are the pg_xlog and data directories with respect to each other?
From this iostat output it looks like they might be on the same partition,
which is not ideal; it's actually surprising that throughput is this
good.  You need to separate the pg_xlog and data directories to get any
kind of reasonable performance.  Also don't use RAID 5 - RAID 5 bites,
no really - it bites. Use multiple RAID 1s, or RAID 10s, you will get
better performance.  50MB/70MB is about the same as you get from a
single disk or a RAID 1.
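A sketch of the pg_xlog separation suggested here, assuming a stock
layout; the paths and mount point are illustrative, not from the thread:

```shell
# Move the WAL to its own spindle and leave a symlink behind.
PGDATA=/var/lib/postgresql/data           # assumption: your data directory
pg_ctl -D "$PGDATA" stop                  # server must be down for this
mv "$PGDATA/pg_xlog" /mnt/waldisk/pg_xlog # /mnt/waldisk: the separate disk
ln -s /mnt/waldisk/pg_xlog "$PGDATA/pg_xlog"
pg_ctl -D "$PGDATA" start
```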

We use 2x9506S8MI controllers, and have maintained excellent
performance with 2xRAID 10 and 2xRAID 1.  Make sure you get the
firmware update if you have these controllers though.

Alex Turner
NetEconomist


Re: Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??

From
Dave Cramer
Date:
Joost,

I've got experience with these controllers; which version do you
have? I'd expect to see higher than 50MB/s, although I've never tried
RAID 5.

I routinely see closer to 100MB/s with RAID 1+0 on their 9000 series

I would also suggest that shared buffers should be higher than 7500,
closer to 30000, and effective cache should be up around 200k

work_mem is awfully high; remember that this will be given to each
and every connection, and can be more than 1x this number per
connection depending on the number of sorts done in the query.
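One way to follow this advice without a huge global setting is to raise
work_mem only in the session doing the bulk update. A sketch, reusing the
query from the first message (in 8.0 the value is in kilobytes):

```shell
# Per-session work_mem: only this connection gets the big sort memory.
psql muntdev <<'SQL'
SET work_mem = 262144;   -- 256MB, this session only
UPDATE prototype.orders SET amount = o.orderbedrag_valuta
  FROM odbc.orders o WHERE ordernumber = o.ordernummer;
SQL
```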

fsync=false ? I'm not even sure why we have this option, but I'd
never set it to false.

Dave



Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
Hi Dave,

On Mon, 2005-11-14 at 18:51 -0500, Dave Cramer wrote:
> Joost,
>
> I've got experience with these controllers and which version do you
> have. I'd expect to see higher than 50MB/s although I've never tried
> RAID 5
>
> I routinely see closer to 100MB/s with RAID 1+0 on their 9000 series
OK, then there must be hope.

> I would also suggest that shared buffers should be higher than 7500,
> closer to 30000, and effective cache should be up around 200k
In my current 8.1 situation I use shared_buffers = 40000,
effective_cache_size = 131072 .

> work_mem is awfully high, remember that this will be given to each
> and every connection and can be more than 1x this number per
> connection depending on the number of sorts
> done in the query.
I use such a high number because I am the only user querying and my
queries do sorted joins etc.


> fsync=false ? I'm not even sure why we have this option, but I'd
> never set it to false.
I want as much speed as possible for a database conversion that MUST be
handled in one weekend (at the current speed it would last almost 7
centuries; I may be off by a millennium). If it fails because of a
hardware problem (the only reason we want and need fsync?) we will try
again next weekend until it finally goes right.

What I can see is that only the *write* performance of *long updates*
(and not inserts) is slow, and it gets slower over time: the first few
thousand go relatively fast, after that PostgreSQL crawls to a halt
(other "benchmarks" like bonnie++ or just dd'ing a big file don't have
this behavior).

I did notice that changing the I/O scheduler's nr_requests from the
default 128 to 1024 or even 4096 made a remarkable performance
improvement. I suspect that experimenting with other I/O schedulers
could improve performance. But it is hard to find any useful
documentation about I/O schedulers.
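On a 2.6 kernel these knobs live in sysfs. A sketch, run as root against
the data disk (sdb) from the iostat output earlier in the thread:

```shell
# Show the available schedulers; the active one is in brackets:
cat /sys/block/sdb/queue/scheduler     # e.g. noop [anticipatory] deadline cfq

# Raise the request queue depth from the default 128:
echo 1024 > /sys/block/sdb/queue/nr_requests

# Switch to the deadline scheduler without rebooting:
echo deadline > /sys/block/sdb/queue/scheduler
```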






Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
"Luke Lonergan"
Date:
Joost,

On 11/15/05 8:35 AM, "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> wrote:

> ... the first few thousand go relatively fast, after that PostgreSQL
> crawls to a halt (other "benchmarks" like bonnie++ or just dd'ing a big
> file don't have this behavior).

With RAID5, it could matter a lot what block size you run your “dd bigfile” test with.  You should run “dd if=/dev/zero of=bigfile bs=8k count=500000” for a 2GB main memory machine, multiply the count by (<your mem>/2GB).

It is very important with the 3Ware cards to match the driver to the firmware revision.
   
> I did notice that changing the I/O scheduler's nr_requests from the
> default 128 to 1024 or even 4096 made a remarkable performance
> improvement. I suspect that experimenting with other I/O schedulers
> could improve performance. But it is hard to find any useful
> documentation about I/O schedulers.

You could try deadline, there’s no harm, but I’ve found that when you reach the point of experimenting with schedulers, you are probably not addressing the real problem.

On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should get 100MB/s write rate, which is double what Postgres can use.  We find that Postgres, even with fsync=false, will only run at a net COPY speed of about 8-12 MB/s, where 12 is the Bizgres number.  8.1 might do 10.  But to get the 10 or 12, the WAL writing and other writing is about 4-5X more than the net write speed, or the speed at which the input file is parsed and read into the database.

So, if you can get your “dd bigfile” test to write data at 50MB/s+ with a blocksize of 8KB, you should be doing well enough.

Incidentally, we also find that using the XFS filesystem and setting the readahead to 8MB or more is extremely beneficial for performance with the 3Ware cards (and with others, but especially for the older 3Ware cards).
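A sketch of the readahead setting mentioned here; blockdev counts in
512-byte sectors, so 8MB is 16384 sectors:

```shell
# Set an 8MB readahead on the array device, then verify it:
blockdev --setra 16384 /dev/sdb
blockdev --getra /dev/sdb
```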
 
Regards,

- Luke

Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
Hi Luke,

On Tue, 2005-11-15 at 10:42 -0800, Luke Lonergan wrote:
> With RAID5, it could matter a lot what block size you run your “dd
> bigfile” test with.  You should run “dd if=/dev/zero of=bigfile bs=8k
> count=500000” for a 2GB main memory machine, multiply the count by
> (<your mem>/2GB).
If I understand correctly (I have 4GB ram):

jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)

Which looks suspicious: 26308 MB/sec???

> It is very important with the 3Ware cards to match the driver to the
> firmware revision.
OK, I am running 1 driver behind the firmware.

> I did notice that changing the I/O scheduler's nr_requests from the
> default 128 to 1024 or even 4096 made a remarkable performance
> improvement. I suspect that experimenting with other I/O schedulers
> could improve performance. But it is hard to find any useful
> documentation about I/O schedulers.
>
> You could try deadline, there’s no harm, but I’ve found that when you
> reach the point of experimenting with schedulers, you are probably not
> addressing the real problem.
It depends. I/O schedulers (I assume) have a purpose: some schedulers
should be more appropriate for some circumstances. And maybe my specific
circumstance (converting a database with *many updates*) is such a
case. I really don't know....

> On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should
> get 100MB/s write rate, which is double what Postgres can use.  We
> find that Postgres, even with fsync=false, will only run at a net COPY
> speed of about 8-12 MB/s, where 12 is the Bizgres number.  8.1 might
> do 10.  But to get the 10 or 12, the WAL writing and other writing is
> about 4-5X more than the net write speed, or the speed at which the
> input file is parsed and read into the database.
As I have an (almost) separate WAL disk, iostat does not show any
significant writing on the WAL disk....

> So, if you can get your “dd bigfile” test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.
See above.

> Incidentally, we also find that using the XFS filesystem and setting
> the readahead to 8MB or more is extremely beneficial for performance
> with the 3Ware cards (and with others, but especially for the older
> 3Ware cards).
I don't have problems with my read performance but *only* with my
*update* performance (and not even insert performance). But then again I
am not the only one with these problems:

http://www.issociate.de/board/goto/894541/3ware_+_RAID5_+_xfs_performance.html#msg_894541
http://lkml.org/lkml/2005/4/20/110
http://seclists.org/lists/linux-kernel/2005/Oct/1171.html

I am happy to share the tables against which I am running my checks....




Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Steve Wampler
Date:
Joost Kraaijeveld wrote:
> If I understand correctly (I have 4GB ram):
>
> jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> 1000000+0 records in
> 1000000+0 records out
> 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
>
> Which looks suspicious: 26308 MB/sec???

Eh?  That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.
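The corrected arithmetic can be checked in one line:

```shell
# 8192000000 bytes / 304.085269 s = ~26939812 bytes/s, and dividing by
# 1048576 bytes per MB gives the real throughput -- not 26308 MB/sec:
awk 'BEGIN { printf "%.1f MB/sec\n", 8192000000 / 304.085269 / 1048576 }'
```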

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.

Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
On Tue, 2005-11-15 at 12:41 -0700, Steve Wampler wrote:
> Joost Kraaijeveld wrote:
> > If I understand correctly (I have 4GB ram):
> >
> > jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> > 1000000+0 records in
> > 1000000+0 records out
> > 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
> >
> > Which looks suspicious: 26308 MB/sec???
>
> Eh?  That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.
Oooops. This calculation error is not typical for my testing (I think ;-)).




Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
"Luke Lonergan"
Date:
Joost,

On 11/15/05 11:51 AM, "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> wrote:

On Tue, 2005-11-15 at 12:41 -0700, Steve Wampler wrote:
> Joost Kraaijeveld wrote:
> > If I understand correctly (I have 4GB ram):
> >
> > jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> > 1000000+0 records in
> > 1000000+0 records out
> > 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
> >
> > Which looks suspicious: 26308 MB/sec???
>
> Eh?  That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.
Oooops. This calculation error is not typical for my testing (I think ;-)).

Summarizing the two facts of note: the write result is 1/4 of what you should be getting, and you are running 1 driver behind the firmware.

You might update your driver, rerun the test, and if you still have the slow result, verify that your filesystem isn’t fragmented (multiple undisciplined apps on the same filesystem will do that).

WAL on a separate disk, on a separate controller?  What is the write performance there?

Regards,

- Luke

Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
Hi Luke,

On Tue, 2005-11-15 at 22:07 -0800, Luke Lonergan wrote:

> You might update your driver,
I will do that (but I admit that I am not looking forward to it. When I
was young and did not make money with my computer, I liked challenges
like compiling kernels and not being able to boot the computer. Not any
more :-)).


>
> WAL on a separate disk, on a separate controller?  What is the write
> performance there?
WAL is on a separate disk and a separate controller, write performance:

jkr@Panoramix:/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 166.499230 seconds (49201429 bytes/sec)

The quest continues...





Re: Performance PG 8.0 on dual opteron / 4GB / 3ware

From
Joost Kraaijeveld
Date:
Hi Luke,


> It is very important with the 3Ware cards to match the driver to the
> firmware revision.
> So, if you can get your “dd bigfile” test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.

I recompiled my kernel, added the driver and:

jkr@Panoramix:~$ dmesg | grep 3w
3ware 9000 Storage Controller device driver for Linux v2.26.03.019fw.
scsi4 : 3ware 9000 Storage Controller
3w-9xxx: scsi4: Found a 3ware 9000 Storage Controller at 0xfd8ffc00,
IRQ: 28.
3w-9xxx: scsi4: Firmware FE9X 2.08.00.005, BIOS BE9X 2.03.01.052, Ports:
8.


jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 200.982055 seconds (40759858 bytes/sec)

Which is a remarkable increase in speed (38.9 MB/sec vs 25.7 MB/sec).

Thanks for your suggestions.





Perl DBD and an alarming problem

From
"Craig A. James"
Date:
I am mystified by the behavior of "alarm" in conjunction with Postgres/perl/DBD.  Here is roughly what I'm doing:

   use Time::HiRes qw(gettimeofday);  # for the sub-second timing below

   eval {
      local $SIG{ALRM} = sub {die("Timeout");};
      $time = gettimeofday;
      alarm 20;
      $sth = $dbh->prepare("a query that may take a long time...");
      $sth->execute();
      alarm 0;
   };
   if ($@ && $@ =~ /Timeout/) {
      my $elapsed = gettimeofday - $time;
      print "Timed out after $elapsed seconds";
   }

Now the mystery: It works, but it hardly matters what time I use for the
alarm call, the actual alarm event always happens at 26 seconds.  I can
set "alarm 1" or "alarm 20", and it almost always hits right at 26 seconds.

Now if I increase alarm to anything in the range of about 25-60 seconds,
the actual alarm arrives somewhere around the 90-second mark.  It seems
as though there are "windows of opportunity" for the alarm, and it is
ignored until those "windows" arrive.

Anyone have a clue what's going on and/or how I can fix it?

A secondary question: It appears that $sth->cancel() is not implemented in the Pg DBD module.  Is that true?

Thanks,
Craig

Re: Perl DBD and an alarming problem

From
"Greg Sabino Mullane"
Date:


> I am mystified by the behavior of "alarm" in conjunction with
> Postgres/perl/DBD.  Here is roughly what I'm doing:

       
> Anyone have a clue what's going on and/or how I can fix it?

Not really, but alarm has never worked well (or claimed to) with
DBI. The DBI docs recommend trying out Sys::Sigaction:

http://search.cpan.org/~lbaxter/Sys-SigAction/

> A secondary question: It appears that $sth->cancel() is not
> implemented in the Pg DBD module.  Is that true?

Correct. DBD::Pg does not support asynchronous connections. It's
possible it may in the future, but there has not been much of a
perceived need. Feel free to enter a request on CPAN:

http://rt.cpan.org/NoAuth/Bugs.html?Dist=DBD-Pg

       
There may be another way around it, if you can tell us some more
about what exactly it is you are trying to do.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200511161830
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8



Re: Perl DBD and an alarming problem

From
Michael Fuhr
Date:
On Wed, Nov 16, 2005 at 12:59:21PM -0800, Craig A. James wrote:
>   eval {
>      local $SIG{ALRM} = sub {die("Timeout");};
>      $time = gettimeofday;
>      alarm 20;
>      $sth = $dbh->prepare("a query that may take a long time...");
>      $sth->execute();
>      alarm 0;
>   };
>   if ($@ && $@ =~ /Timeout/) {
>      my $elapsed = gettimeofday - $time;
>      print "Timed out after $elapsed seconds";
>   }
>
> Now the mystery: It works, but it hardly matters what time I use for the
> alarm call, the actual alarm event always happens at 26 seconds.  I can set
> "alarm 1" or "alarm 20", and it almost always hits right at 26 seconds.

High-level languages' signal handlers don't always work well with
low-level libraries.  I haven't dug into the Perl source code but
I'd guess that since only certain things are safe to do in a signal
handler, Perl's handler simply sets some kind of state that the
interpreter will examine later during normal execution.  If you're
using only Perl facilities then that probably happens fairly timely,
but if you're stuck in a low-level library (e.g., libpq) then you
might have to wait until that library returns control to Perl before
Perl recognizes that a signal occurred.

As an example, if I run code such as yours with alarm(2) and a query
that takes 5 seconds, I see the following in a process trace (from
ktrace/kdump on FreeBSD):

55395 perl     0.000978 CALL  poll(0xbfbfe1b8,0x1,0xffffffff)
55395 perl     1.996629 RET   poll -1 errno 4 Interrupted system call
55395 perl     0.000013 PSIG  SIGALRM caught handler=0x281be22c mask=0x0 code=0x0
55395 perl     0.000050 CALL  sigprocmask(0x1,0,0x805411c)
55395 perl     0.000005 RET   sigprocmask 0
55395 perl     0.000020 CALL  sigreturn(0xbfbfde60)
55395 perl     0.000007 RET   sigreturn JUSTRETURN
55395 perl     0.000019 CALL  poll(0xbfbfe1b8,0x1,0xffffffff)
55395 perl     3.004065 RET   poll 1
55395 perl     0.000024 CALL  recvfrom(0x3,0x81c6000,0x4000,0,0,0)
55395 perl     0.000016 GIO   fd 3 read 60 bytes

The poll() call is interrupted by SIGALRM after 2 seconds but then
it starts again and doesn't return until the query completes after
the remaining 3 seconds.  Only sometime later does Perl invoke the
ALRM handler I installed, presumably because it can't do so until
the low-level code returns control to Perl.

Is there a reason you're using alarm() in the client instead of
setting statement_timeout on the server?
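For comparison, a minimal sketch of the server-side approach suggested
here; the database name and the slow query are illustrative:

```shell
# statement_timeout (milliseconds) makes the server itself abort the
# query, so no client-side signal juggling is needed:
psql mydb <<'SQL'
SET statement_timeout = 20000;   -- 20 seconds, this session only
SELECT * FROM some_big_table;    -- hypothetical long-running query
SQL
```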

--
Michael Fuhr

Re: Perl DBD and an alarming problem

From
"Craig A. James"
Date:
Thanks for the info on alarm and timeouts, this was a big help.  One further comment:

Michael Fuhr wrote:
>>  eval {
>>     local $SIG{ALRM} = sub {die("Timeout");};
>>     $time = gettimeofday;
>>     alarm 20;
>>     $sth = $dbh->prepare("a query that may take a long time...");
>>     $sth->execute();
>>     alarm 0;
>>  };
>>  if ($@ && $@ =~ /Timeout/) {
>>     my $elapsed = gettimeofday - $time;
>>     print "Timed out after $elapsed seconds";
>>  }
>>
>>Now the mystery: It works, but it hardly matters what time I use for the
>>alarm call, the actual alarm event always happens at 26 seconds...
>
>
> High-level languages' signal handlers don't always work well with
> low-level libraries...
>
> Is there a reason you're using alarm() in the client instead of
> setting statement_timeout on the server?

statement_timeout solved the problem, thanks VERY much for the pointer.  To answer your question, I use alarm() because
all the books and web references said that was how to do it. :-).  I've used alarm() with Perl (with a 3rd-party C lib
that had a potential infinite loop) very successfully.

So thanks for the pointer to statement_timeout.  But...

When I set statement_timeout in the config file, it just didn't do anything - it never timed out (PG 8.0.3).  I finally
found in the documentation that I can do "set statement_timeout = xxx" from Perl DBI on a per-client basis, and that
works.
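
For reference, the per-session form looks like this (the value is in milliseconds; from Perl DBI it would be sent once per connection, e.g. via $dbh->do()):

```sql
-- Abort any statement on this connection that runs longer than 20 seconds.
-- statement_timeout is measured in milliseconds; 0 disables the timeout.
SET statement_timeout = 20000;
```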

Craig

Re: Perl DBD and an alarming problem

From
Michael Fuhr
Date:
On Thu, Nov 17, 2005 at 01:04:21PM -0800, Craig A. James wrote:
> When I set statement_timeout in the config file, it just didn't do anything
> - it never timed out (PG 8.0.3).  I finally found in the documentation that
> I can do "set statement_timeout = xxx" from PerlDBI on a per-client basis,
> and that works.

You probably shouldn't set statement_timeout on a global basis
anyway, but did you reload the server after you made the change?
Setting statement_timeout in postgresql.conf and then reloading the
server works here in 8.0.4.

--
Michael Fuhr

Re: Perl DBD and an alarming problem

From
Michael Fuhr
Date:
[Please copy the mailing list on replies.]

On Thu, Nov 17, 2005 at 05:38:13PM -0800, Craig A. James wrote:
> >You probably shouldn't set statement_timeout on a global basis
> >anyway
>
> The server is a "one trick pony" so setting a global timeout value is
> actually appropriate.

Beware that statement_timeout also applies to maintenance commands
like VACUUM; it might be more appropriate to set per-user timeouts
with ALTER USER.  If you do set a global timeout then you might
want to set a per-user timeout of 0 for database superusers so
maintenance activities don't get timed out.
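
A minimal sketch of that setup (the role names here are hypothetical):

```sql
-- Give ordinary application sessions a 20-second cap (value in milliseconds).
ALTER USER webapp SET statement_timeout = 20000;

-- Exempt the superuser that runs VACUUM and other maintenance.
ALTER USER postgres SET statement_timeout = 0;   -- 0 = no timeout
```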

> >... but did you reload the server after you made the change?
> >Setting statement_timeout in postgresql.conf and then reloading the
> >server works here in 8.0.4.
>
> Yes.  By "reload" I assume you mean restarting it from scratch.

Either a restart or a "pg_ctl reload", which sends a SIGHUP to the
server.  You can effect some changes by sending a signal to a running
server without having to restart it entirely.

> In this case, I use
>
>    /etc/init.d/postgresql restart
>
> It definitely had no effect at all.  I tried values clear down to 1
> millisecond, but the server never timed out for any query.

Did you use "SHOW statement_timeout" to see if the value was set
to what you wanted?  Are you sure you edited the right file?  As a
database superuser execute "SHOW config_file" to see what file the
server is using.  What exactly did the line look like after you
changed it?

--
Michael Fuhr

Storage/Performance and splitting a table

From
"Craig A. James"
Date:
In a recent thread, several people pointed out that UPDATE = DELETE+INSERT.  This got me to wondering.

I have a table that, roughly, looks like this:

  create table doc (
     id         integer primary key,
     document   text,
     keywords   tsvector
   );

where "keywords" has a GIST index.  There are about 10 million rows in the table, and an average of 20 keywords per
document. I have two questions. 

First, I occasionally rebuild the keywords, after which the VACUUM FULL ANALYZE takes a LONG time - like 24 hours.
Given the UPDATE = DELETE+INSERT, it sounds like I'd be better off with something like this:

  create table doc (
     id         integer primary key,
     document   text
   );
  create table keywords (
     id         integer primary key,
     keywords   tsvector
   );

Then I could just drop the GIST index, truncate the keywords table, rebuild the keywords, and reindex.  My suspicion is
that VACUUM FULL ANALYZE would be quick -- there would be no garbage to collect, so all it would have to do is the
ANALYZE part.
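
The rebuild cycle described above might look like this (the index name is illustrative, and the repopulation step depends on how the keywords are generated):

```sql
DROP INDEX keywords_idx;     -- drop the GIST index before the bulk load
TRUNCATE keywords;           -- instant, and leaves no dead tuples behind

-- ... repopulate the keywords table here (application-specific) ...

CREATE INDEX keywords_idx ON keywords USING gist (keywords);
ANALYZE keywords;            -- after TRUNCATE there is nothing to vacuum
```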

My second question: With the doc and keywords split into two tables, would the tsearch2/GIST performance be faster?
The second schema's "keywords" table has just pure keywords (no documents); does that translate to fewer blocks being
read during a tsearch2/GIST query?  Or are the "document" and "keywords" columns of the first schema already stored
separately on disk, so that the size of the "document" data doesn't affect the "keywords" search performance?

Thanks,
Craig

Re: Storage/Performance and splitting a table

From
Michael Stone
Date:
On Sat, Nov 19, 2005 at 09:54:23AM -0800, Craig A. James wrote:
>First, I occasionally rebuild the keywords, after which the VACUUM FULL
>ANALYZE takes a LONG time - like 24 hours.

You know you just need vacuum, not vacuum full, right?

Mike Stone

Re: Perl DBD and an alarming problem

From
"Craig A. James"
Date:
>> When I set statement_timeout in the config file, it just didn't
>> do anything - it never timed out (PG 8.0.3).
>
>... but did you reload the server after you [changed statement_timeout]?

Mystery solved.  I have two servers; I was reconfiguring one and restarting the other.  Duh.

Thanks,
Craig

Hyperthreading slows processes?

From
"Craig A. James"
Date:
This article on ZDNet claims that hyperthreading can *hurt* performance, due to contention in the L1/L2 cache by a
second process:

   http://news.zdnet.co.uk/0,39020330,39237341,00.htm

Has anyone tested this on Postgres yet?  (And based on a recent somewhat caustic thread about performance on this
forum, I'm going to avoid speculation, and let those who actually *know* answer! ;-)

Craig

Re: Hyperthreading slows processes?

From
Dave Cramer
Date:
Yeah, it's pretty much a known issue for postgres

Dave
On 20-Nov-05, at 4:46 PM, Craig A. James wrote:

> This article on ZDNet claims that hyperthreading can *hurt*
> performance, due to contention in the L1/L2 cache by a second process:
>
>   http://news.zdnet.co.uk/0,39020330,39237341,00.htm
>
> Has anyone tested this on Postgres yet?  (And based on a recent
> somewhat caustic thread about performance on this forum, I'm going
> to avoid speculation, and let those who actually *know* answer! ;-)
>
> Craig
>


Re: 15,000 tables

From
"Craig A. James"
Date:
So say I need 10,000 tables, but I can create tablespaces.  Wouldn't that solve the performance problem caused by
Linux's (or ext2/3's) problems with large directories?

For example, if each user creates (say) 10 tables, and I have 1000 users, I could create 100 tablespaces, and assign
groups of 10 users to each tablespace.  This would limit each tablespace to 100 tables, and keep the ext2/3 file-system
directories manageable.
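
A sketch of the layout being proposed (the path, tablespace, and table names are made up):

```sql
-- One directory per tablespace keeps each filesystem directory small.
CREATE TABLESPACE ts01 LOCATION '/data/pg/ts01';   -- hypothetical path

-- Tables for users assigned to this group land in that directory.
CREATE TABLE user0001_orders (
    id integer primary key
) TABLESPACE ts01;
```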

Would this work?  Would there be other problems?

Thanks,
Craig

Re: 15,000 tables

From
David Lang
Date:
On Thu, 1 Dec 2005, Craig A. James wrote:

> So say I need 10,000 tables, but I can create tablespaces.  Wouldn't that
> solve the performance problem caused by Linux's (or ext2/3's) problems with
> large directories?
>
> For example, if each user creates (say) 10 tables, and I have 1000 users, I
> could create 100 tablespaces, and assign groups of 10 users to each
> tablespace.  This would limit each tablespace to 100 tables, and keep the
> ext2/3 file-system directories manageable.
>
> Would this work?  Would there be other problems?

This would definitely help, however there's still the question of how
large the tables get, and how many total files are needed to hold the 100
tables.

You still have the problem of having to seek around to deal with all these
different files (and tablespaces just spread them further apart).  You
can't eliminate that, but a large write-back journal (as opposed to
metadata-only journaling) would mask the problem.

It would be a trade-off: you would end up writing all your data twice, so
the throughput would be lower, but since the data is safe as soon as it
hits the journal, the latency for any one request would be lower, which
would allow the system to use the CPU more and overlap it with your
seeking.

David Lang