Thread: Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??
Hi,

I am experiencing very long update queries and I want to know if it is reasonable to expect them to perform better. The query below has been running for more than 1.5 hours (5500 seconds) now, while the rest of the system does nothing (I don't even type or move the mouse...).

- Is that to be expected?
- Is 180-200 tps with ~9000 kB written (see iostat output below) not low, given the fact that fsync is off? (Note: with bonnie++ I get write performance > 50 MB/sec and read performance > 70 MB/sec with > 2000 read/write ops/sec.)
- Does anyone else have any experience with the 3Ware RAID controller (which is my suspect)?
- Any good idea how to determine the real bottleneck, if this is not the performance I can expect?

My hard- and software:
- PostgreSQL 8.0.3
- Debian 3.1 (Sarge) AMD64
- Dual Opteron
- 4GB RAM
- 3ware RAID 5 with 5 disks

Pieces of my postgresql.conf (all others are default):

shared_buffers = 7500
work_mem = 260096
fsync = false
effective_cache_size = 32768

The query with explain (amount and orderbedrag_valuta are float8, ordernummer and ordernumber int4):

explain update prototype.orders set amount =
odbc.orders.orderbedrag_valuta from odbc.orders where ordernumber =
odbc.orders.ordernummer;

                                  QUERY PLAN
-----------------------------------------------------------------------------
 Hash Join  (cost=50994.74..230038.17 rows=1104379 width=466)
   Hash Cond: ("outer".ordernumber = "inner".ordernummer)
   ->  Seq Scan on orders  (cost=0.00..105360.68 rows=3991868 width=455)
   ->  Hash  (cost=48233.79..48233.79 rows=1104379 width=15)
         ->  Seq Scan on orders  (cost=0.00..48233.79 rows=1104379 width=15)

Sample output from iostat during the query (about average):

Device:    tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
hdc       0.00        0.00        0.00         0         0
sda       0.00        0.00        0.00         0         0
sdb     187.13       23.76     8764.36        24      8852

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
> I am experiencing very long update queries and I want to know if it
> is reasonable to expect them to perform better.

Does that table have any triggers that would fire on the update?

			regards, tom lane
On Sun, 2005-11-06 at 12:17 -0500, Tom Lane wrote:
> Does that table have any triggers that would fire on the update?

Alas, no: no triggers, constraints, foreign keys or indexes (have I forgotten something?). All queries are slow. E.g. (after a vacuum):

select objectid from prototype.orders

Explain analyse (with PgAdmin):

Seq Scan on orders  (cost=0.00..58211.79 rows=1104379 width=40) (actual time=441.971..3252.698 rows=1104379 loops=1)
Total runtime: 5049.467 ms

Actual execution time: 82163 ms (without getting the data)

-- 
Groeten,

Joost Kraaijeveld
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
> Explain analyse (with PgAdmin):
> ...
> Total runtime: 5049.467 ms
> Actual execution time: 82163 ms (without getting the data)

I'm confused --- where's the 82sec figure coming from, exactly? We've heard reports of performance issues in PgAdmin with large result sets ... if you do the same query in psql, what happens?

			regards, tom lane
Hi Tom,

On Sun, 2005-11-06 at 15:26 -0500, Tom Lane wrote:
> I'm confused --- where's the 82sec figure coming from, exactly?

From actually executing the query. From PgAdmin:

-- Executing query:
select objectid from prototype.orders

Total query runtime: 78918 ms.
Data retrieval runtime: 188822 ms.
1104379 rows retrieved.

> We've heard reports of performance issues in PgAdmin with large
> result sets ... if you do the same query in psql, what happens?

jkr@Panoramix:~/postgresql$ time psql muntdev -c "select objectid from prototype.orders" > output.txt

real    0m5.554s
user    0m1.121s
sys     0m0.470s

Now *I* am confused. What does PgAdmin do more than giving the query to the database? (BTW: I have repeated both measurements; the numbers above are all from the last measurement I did and are about average.)

-- 
Groeten,

Joost Kraaijeveld
> Now *I* am confused. What does PgAdmin do more than giving the query to
> the database?

It builds it into the data grid GUI object.

Chris
On Mon, 2005-11-07 at 12:37 +0800, Christopher Kings-Lynne wrote:
> > Now *I* am confused. What does PgAdmin do more than giving the query to
> > the database?
>
> It builds it into the data grid GUI object.

Is that not the difference between the total query runtime and the data retrieval runtime (see below)?

-- Executing query:
select objectid from prototype.orders

Total query runtime: 78918 ms.
Data retrieval runtime: 188822 ms.
1104379 rows retrieved.

-- 
Groeten,

Joost Kraaijeveld
Hi Christopher,

On Mon, 2005-11-07 at 12:37 +0800, Christopher Kings-Lynne wrote:
> > Now *I* am confused. What does PgAdmin do more than giving the query to
> > the database?
>
> It builds it into the data grid GUI object.

But my initial question was about a query that does not produce data at all (well, apart from a response from the server saying it is finished). I broke that query off after several hours.

I am now running the query from my initial question with psql (now for >1 hour, in a transaction, fsync off). Some statistics:

uptime:
 06:35:55 up 9:47, 6 users, load average: 7.08, 7.21, 6.08

iostat -x -k 1 (this output appears to be representative):

avg-cpu:  %user  %nice  %sys  %iowait  %idle
           1.00   0.00  0.50    98.51   0.00

Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm   %util
sda        0.00   14.00  0.00    3.00    0.00   136.00   0.00    68.00     45.33      0.00    0.67   0.67    0.20
sdb        0.00  611.00  1.00  201.00   32.00  6680.00  16.00  3340.00     33.23    145.67  767.19   4.97  100.30

-- 
Groeten,

Joost Kraaijeveld
Where are the pg_xlog and data directories with respect to each other? From this iostat it looks like they might be on the same partition, which is not ideal, and it is actually surprising that throughput is this good. You need to separate the pg_xlog and data directories to get any kind of reasonable performance.

Also, don't use RAID 5 - RAID 5 bites, no really - it bites. Use multiple RAID 1s, or RAID 10s; you will get better performance. 50MB/70MB is about the same as you get from a single disk or a RAID 1.

We use 2x 9506S8MI controllers, and have maintained excellent performance with 2x RAID 10 and 2x RAID 1. Make sure you get the firmware update if you have these controllers, though.

Alex Turner
NetEconomist

On 11/6/05, Joost Kraaijeveld <J.Kraaijeveld@askesis.nl> wrote:
> [original question snipped]
Joost,

I've got experience with these controllers; which version do you have? I'd expect to see higher than 50MB/s, although I've never tried RAID 5. I routinely see closer to 100MB/s with RAID 1+0 on their 9000 series.

I would also suggest that shared_buffers should be higher than 7500, closer to 30000, and effective_cache_size should be up around 200k.

work_mem is awfully high; remember that this will be given to each and every connection, and it can be more than 1x this number per connection depending on the number of sorts done in the query.

fsync=false? I'm not even sure why we have this option, but I'd never set it to false.

Dave

On 6-Nov-05, at 8:30 AM, Joost Kraaijeveld wrote:
> [original question snipped]
Hi Dave,

On Mon, 2005-11-14 at 18:51 -0500, Dave Cramer wrote:
> I've got experience with these controllers; which version do you have?
> I'd expect to see higher than 50MB/s, although I've never tried RAID 5.
> I routinely see closer to 100MB/s with RAID 1+0 on their 9000 series.

OK, then there must be hope.

> I would also suggest that shared_buffers should be higher than 7500,
> closer to 30000, and effective_cache_size should be up around 200k.

In my current 8.1 situation I use shared_buffers = 40000, effective_cache_size = 131072.

> work_mem is awfully high; remember that this will be given to each
> and every connection, and it can be more than 1x this number per
> connection depending on the number of sorts done in the query.

I use such a high number because I am the only user querying, and my queries do sorted joins etc.

> fsync=false? I'm not even sure why we have this option, but I'd
> never set it to false.

I want as much speed as possible for a database conversion that MUST be completed in one weekend (at the current speed it would last almost 7 centuries; I may be off by a millennium). If it fails because of a hardware problem (the only reason we want and need fsync?) we will try again the next weekend, until it finally goes right.

What I can see is that only the *write* performance of *long updates* (and not inserts) is slow, and they get slower over time: the first few thousand go relatively fast, after that PostgreSQL crawls to a halt (other "benchmarks" like bonnie++ or just dd'ing a big file don't show this behavior).

I did notice that changing the I/O scheduler's nr_requests from the default 128 to 1024 or even 4096 made a remarkable performance improvement. I suspect that experimenting with other I/O schedulers could improve performance. But it is hard to find any useful documentation about I/O schedulers.

-- 
Groeten,

Joost Kraaijeveld
Joost,

On 11/15/05 8:35 AM, "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> wrote:
> thousand go relatively fast, after that PostgreSQL crawls to a halt
> (other "benchmarks" like bonnie++ or just dd'ing a big file don't have
> this behavior).

With RAID5, it could matter a lot what block size you run your "dd bigfile" test with. You should run "dd if=/dev/zero of=bigfile bs=8k count=500000" for a 2GB main memory machine; multiply the count by (<your mem>/2GB).

It is very important with the 3Ware cards to match the driver to the firmware revision.

> I did notice that changing the I/O scheduler's nr_requests from the
> default 128 to 1024 or even 4096 made a remarkable performance
> improvement. I suspect that experimenting with other I/O schedulers
> could improve performance. But it is hard to find any useful
> documentation about I/O schedulers.

You could try deadline; there's no harm, but I've found that when you reach the point of experimenting with schedulers, you are probably not addressing the real problem.

On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should get a 100MB/s write rate, which is double what Postgres can use. We find that Postgres, even with fsync=false, will only run at a net COPY speed of about 8-12 MB/s, where 12 is the Bizgres number; 8.1 might do 10. But to get the 10 or 12, the WAL writing and other writing is about 4-5X more than the net write speed, i.e. the speed at which the input file is parsed and read into the database.

So, if you can get your "dd bigfile" test to write data at 50MB/s+ with a blocksize of 8KB, you should be doing well enough.

Incidentally, we also find that using the XFS filesystem and setting the readahead to 8MB or more is extremely beneficial for performance with the 3Ware cards (and with others, but especially for the older 3Ware cards).

Regards,
- Luke
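[Editor's note: the sizing rule above (scale the dd count by <your mem>/2GB so the test file is about twice RAM and the page cache can't flatter the result) is easy to work out. A quick illustrative sketch, not from the thread, using Joost's 4GB figure:]

```python
# Scale Luke's baseline dd test (500,000 blocks of 8 kB for a 2 GB
# machine) to a machine with a different amount of RAM, so the test
# file stays roughly 2x the size of main memory.
def dd_count(mem_gb, base_count=500_000, base_mem_gb=2):
    return int(base_count * (mem_gb / base_mem_gb))

count = dd_count(4)                    # Joost's 4 GB machine
file_bytes = count * 8 * 1024          # bs=8k
print(count, file_bytes)               # 1000000 8192000000
```

This matches the count=1000000 run (8192000000 bytes) that Joost reports below.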
Hi Luke,

On Tue, 2005-11-15 at 10:42 -0800, Luke Lonergan wrote:
> With RAID5, it could matter a lot what block size you run your "dd
> bigfile" test with. You should run "dd if=/dev/zero of=bigfile bs=8k
> count=500000" for a 2GB main memory machine; multiply the count by
> (<your mem>/2GB).

If I understand correctly (I have 4GB ram):

jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)

Which looks suspicious: 26308 MB/sec???

> It is very important with the 3Ware cards to match the driver to the
> firmware revision.

OK, I am running 1 driver behind the firmware.

> You could try deadline; there's no harm, but I've found that when you
> reach the point of experimenting with schedulers, you are probably not
> addressing the real problem.

It depends. I/O schedulers (I assume) have a purpose: some schedulers should be more appropriate for some circumstances. And maybe my specific circumstance (converting a database with *many updates*) is such a circumstance. I really don't know....

> On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should
> get a 100MB/s write rate, which is double what Postgres can use. [...]
> the WAL writing and other writing is about 4-5X more than the net
> write speed.

As I have an (almost) separate WAL disk: iostat does not show any significant writing on the WAL disk....

> So, if you can get your "dd bigfile" test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.

See above.

> Incidentally, we also find that using the XFS filesystem and setting
> the readahead to 8MB or more is extremely beneficial for performance
> with the 3Ware cards.

I don't have problems with my read performance, but *only* with my *update* performance (and not even insert performance). Then again, I am not the only one with these problems:

http://www.issociate.de/board/goto/894541/3ware_+_RAID5_+_xfs_performance.html#msg_894541
http://lkml.org/lkml/2005/4/20/110
http://seclists.org/lists/linux-kernel/2005/Oct/1171.html

I am happy to share the tables against which I am running my checks....

-- 
Groeten,

Joost Kraaijeveld
Joost Kraaijeveld wrote:
> If I understand correctly (I have 4GB ram):
>
> jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> 1000000+0 records in
> 1000000+0 records out
> 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
>
> Which looks suspicious: 26308 MB/sec???

Eh? That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.

-- 
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
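[Editor's note: Steve's correction checks out; a two-line verification using the numbers from the dd run above:]

```python
# dd reported 8192000000 bytes in 304.085269 s, i.e. 26939812 bytes/sec.
bytes_transferred = 8_192_000_000
seconds = 304.085269
mib_per_sec = bytes_transferred / seconds / (1024 * 1024)
print(round(mib_per_sec, 1))   # 25.7
```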
On Tue, 2005-11-15 at 12:41 -0700, Steve Wampler wrote:
> Eh? That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.

Oooops. This calculation error is not typical for my testing (I think ;-)).

-- 
Groeten,

Joost Kraaijeveld
Joost,

On 11/15/05 11:51 AM, "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> wrote:
> Oooops. This calculation error is not typical for my testing (I think ;-)).

Summarizing the two facts of note: the write result is 1/4 of what you should be getting, and you are running 1 driver behind the firmware.

You might update your driver, rerun the test, and if you still have the slow result, verify that your filesystem isn't fragmented (multiple undisciplined apps on the same filesystem will do that).

WAL on a separate disk, on a separate controller? What is the write performance there?

Regards,
- Luke
Hi Luke,

On Tue, 2005-11-15 at 22:07 -0800, Luke Lonergan wrote:
> You might update your driver,

I will do that (but I admit that I am not looking forward to it. When I was young and did not make money with my computer, I liked challenges like compiling kernels and not being able to boot the computer. Not any more :-)).

> WAL on a separate disk, on a separate controller? What is the write
> performance there?

WAL is on a separate disk and a separate controller. Write performance:

jkr@Panoramix:/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 166.499230 seconds (49201429 bytes/sec)

The quest continues...

-- 
Groeten,

Joost Kraaijeveld
Hi Luke,

> It is very important with the 3Ware cards to match the driver to the
> firmware revision.
> So, if you can get your "dd bigfile" test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.

I recompiled my kernel, added the driver and:

jkr@Panoramix:~$ dmesg | grep 3w
 3ware 9000 Storage Controller device driver for Linux v2.26.03.019fw.
 scsi4 : 3ware 9000 Storage Controller
 3w-9xxx: scsi4: Found a 3ware 9000 Storage Controller at 0xfd8ffc00, IRQ: 28.
 3w-9xxx: scsi4: Firmware FE9X 2.08.00.005, BIOS BE9X 2.03.01.052, Ports: 8.

jkr@Panoramix:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 200.982055 seconds (40759858 bytes/sec)

Which is a remarkable increase in speed (38.9 MB/sec vs 25.7 MB/sec). Thanks for your suggestions.

-- 
Groeten,

Joost Kraaijeveld
I am mystified by the behavior of "alarm" in conjunction with Postgres/perl/DBD. Here is roughly what I'm doing:

  eval {
    local $SIG{ALRM} = sub {die("Timeout");};
    $time = gettimeofday;
    alarm 20;
    $sth = $dbh->prepare("a query that may take a long time...");
    $sth->execute();
    alarm 0;
  };
  if ($@ && $@ =~ /Timeout/) {
    my $elapsed = gettimeofday - $time;
    print "Timed out after $elapsed seconds";
  }

Now the mystery: It works, but it hardly matters what time I use for the alarm call; the actual alarm event always happens at 26 seconds. I can set "alarm 1" or "alarm 20", and it almost always hits right at 26 seconds. If I increase the alarm to anything in the range of about 25-60 seconds, the actual alarm arrives somewhere around the 90-second mark. It seems as though there are "windows of opportunity" for the alarm, and it is ignored until those "windows" arrive.

Anyone have a clue what's going on and/or how I can fix it?

A secondary question: It appears that $sth->cancel() is not implemented in the Pg DBD module. Is that true?

Thanks,
Craig
> I am mystified by the behavior of "alarm" in conjunction with
> Postgres/perl/DBD. Here is roughly what I'm doing:
> Anyone have a clue what's going on and/or how I can fix it?

Not really, but alarm has never worked well (or claimed to) with DBI. The DBI docs recommend trying out Sys::SigAction:

http://search.cpan.org/~lbaxter/Sys-SigAction/

> A secondary question: It appears that $sth->cancel() is not
> implemented in the Pg DBD module. Is that true?

Correct. DBD::Pg does not support asynchronous connections. It's possible it may in the future, but there has not been much of a perceived need. Feel free to enter a request on CPAN:

http://rt.cpan.org/NoAuth/Bugs.html?Dist=DBD-Pg

There may be another way around it, if you can tell us some more about what exactly it is you are trying to do.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Wed, Nov 16, 2005 at 12:59:21PM -0800, Craig A. James wrote:
> eval {
>   local $SIG{ALRM} = sub {die("Timeout");};
>   $time = gettimeofday;
>   alarm 20;
>   $sth = $dbh->prepare("a query that may take a long time...");
>   $sth->execute();
>   alarm 0;
> };
> if ($@ && $@ =~ /Timeout/) {
>   my $elapsed = gettimeofday - $time;
>   print "Timed out after $elapsed seconds";
> }
>
> Now the mystery: It works, but it hardly matters what time I use for the
> alarm call; the actual alarm event always happens at 26 seconds. I can set
> "alarm 1" or "alarm 20", and it almost always hits right at 26 seconds.

High-level languages' signal handlers don't always work well with low-level libraries. I haven't dug into the Perl source code, but I'd guess that since only certain things are safe to do in a signal handler, Perl's handler simply sets some kind of state that the interpreter will examine later during normal execution. If you're using only Perl facilities then that probably happens fairly timely, but if you're stuck in a low-level library (e.g., libpq) then you might have to wait until that library returns control to Perl before Perl recognizes that a signal occurred.

As an example, if I run code such as yours with alarm(2) and a query that takes 5 seconds, I see the following in a process trace (from ktrace/kdump on FreeBSD):

55395 perl     0.000978 CALL  poll(0xbfbfe1b8,0x1,0xffffffff)
55395 perl     1.996629 RET   poll -1 errno 4 Interrupted system call
55395 perl     0.000013 PSIG  SIGALRM caught handler=0x281be22c mask=0x0 code=0x0
55395 perl     0.000050 CALL  sigprocmask(0x1,0,0x805411c)
55395 perl     0.000005 RET   sigprocmask 0
55395 perl     0.000020 CALL  sigreturn(0xbfbfde60)
55395 perl     0.000007 RET   sigreturn JUSTRETURN
55395 perl     0.000019 CALL  poll(0xbfbfe1b8,0x1,0xffffffff)
55395 perl     3.004065 RET   poll 1
55395 perl     0.000024 CALL  recvfrom(0x3,0x81c6000,0x4000,0,0,0)
55395 perl     0.000016 GIO   fd 3 read 60 bytes

The poll() call is interrupted by SIGALRM after 2 seconds, but then it starts again and doesn't return until the query completes after the remaining 3 seconds. Only sometime later does Perl invoke the ALRM handler I installed, presumably because it can't do so until the low-level code returns control to Perl.

Is there a reason you're using alarm() in the client instead of setting statement_timeout on the server?

--
Michael Fuhr
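[Editor's note: the deferred-delivery behavior Michael describes is not specific to Perl. A minimal Python sketch, not from the thread, shows the same pattern: alarm() interrupts a plain interruptible call promptly, but a handler can only run once control is back in the interpreter, so a C library that restarts its own system calls would delay it just as in the ktrace above.]

```python
import signal
import time

class Timeout(Exception):
    pass

def on_alarm(signum, frame):
    # As in Perl, the interpreter runs this only when it regains
    # control; a C library that swallows EINTR can postpone it.
    raise Timeout

signal.signal(signal.SIGALRM, on_alarm)

start = time.monotonic()
signal.alarm(1)          # request SIGALRM in 1 second
try:
    time.sleep(5)        # interruptible blocking call
    timed_out = False
except Timeout:
    timed_out = True
finally:
    signal.alarm(0)      # always cancel any pending alarm

elapsed = time.monotonic() - start
print(timed_out, round(elapsed))   # True 1
```

Here the sleep is interrupted after ~1 second; replace it with a call into a library that retries on EINTR and the exception would arrive only when that call finally returns.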
Thanks for the info on alarm and timeouts, this was a big help. One further comment:

Michael Fuhr wrote:
> High-level languages' signal handlers don't always work well with
> low-level libraries...
>
> Is there a reason you're using alarm() in the client instead of
> setting statement_timeout on the server?

statement_timeout solved the problem, thanks VERY much for the pointer. To answer your question, I use alarm() because all the books and web references said that was how to do it. :-). I've used alarm() with Perl (with a 3rd-party C lib that had a potential infinite loop) very successfully. So thanks for the pointer to statement_timeout.

But... when I set statement_timeout in the config file, it just didn't do anything - it never timed out (PG 8.0.3). I finally found in the documentation that I can do "set statement_timeout = xxx" from Perl DBI on a per-client basis, and that works.

Craig
On Thu, Nov 17, 2005 at 01:04:21PM -0800, Craig A. James wrote:
> When I set statement_timeout in the config file, it just didn't do anything
> - it never timed out (PG 8.0.3). I finally found in the documentation that
> I can do "set statement_timeout = xxx" from Perl DBI on a per-client basis,
> and that works.

You probably shouldn't set statement_timeout on a global basis anyway, but did you reload the server after you made the change? Setting statement_timeout in postgresql.conf and then reloading the server works here in 8.0.4.

--
Michael Fuhr
[Please copy the mailing list on replies.]

On Thu, Nov 17, 2005 at 05:38:13PM -0800, Craig A. James wrote:
> > You probably shouldn't set statement_timeout on a global basis
> > anyway
>
> The server is a "one trick pony" so setting a global timeout value is
> actually appropriate.

Beware that statement_timeout also applies to maintenance commands like VACUUM; it might be more appropriate to set per-user timeouts with ALTER USER. If you do set a global timeout then you might want to set a per-user timeout of 0 for database superusers so maintenance activities don't get timed out.

> > ... but did you reload the server after you made the change?
> > Setting statement_timeout in postgresql.conf and then reloading the
> > server works here in 8.0.4.
>
> Yes. By "reload" I assume you mean restarting it from scratch.

Either a restart or a "pg_ctl reload", which sends a SIGHUP to the server. You can effect some changes by sending a signal to a running server without having to restart it entirely.

> In this case, I use
>
>   /etc/init.d/postgresql restart
>
> It definitely had no effect at all. I tried values clear down to 1
> millisecond, but the server never timed out for any query.

Did you use "SHOW statement_timeout" to see if the value was set to what you wanted? Are you sure you edited the right file? As a database superuser, execute "SHOW config_file" to see what file the server is using. What exactly did the line look like after you changed it?

--
Michael Fuhr
In a recent thread, several people pointed out that UPDATE = DELETE+INSERT. This got me to wondering.

I have a table that, roughly, looks like this:

create table doc (
   id integer primary key,
   document text,
   keywords tsvector
);

where "keywords" has a GIST index. There are about 10 million rows in the table, and an average of 20 keywords per document.

I have two questions. First, I occasionally rebuild the keywords, after which the VACUUM FULL ANALYZE takes a LONG time - like 24 hours. Given that UPDATE = DELETE+INSERT, it sounds like I'd be better off with something like this:

create table doc (
   id integer primary key,
   document text
);

create table keywords (
   id integer primary key,
   keywords tsvector
);

Then I could just drop the GIST index, truncate the keywords table, rebuild the keywords, and reindex. My suspicion is that VACUUM FULL ANALYZE would be quick -- there would be no garbage to collect, so all it would have to do is the ANALYZE part.

My second question: With the doc and keywords split into two tables, would the tsearch2/GIST performance be faster? The second schema's "keywords" table has just pure keywords (no documents); does that translate to fewer blocks being read during a tsearch2/GIST query? Or are the "document" and "keywords" columns of the first schema already stored separately on disk, so that the size of the "document" data doesn't affect the "keywords" search performance?

Thanks,
Craig
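[The rebuild cycle described above might look like this sketch against the split schema; the index name and the to_tsvector() step are placeholders for whatever the application actually uses to generate keywords:]

```sql
-- Rebuild keywords without accumulating dead rows in the main table.
DROP INDEX IF EXISTS keywords_gist_idx;
TRUNCATE keywords;                             -- instant; leaves no garbage
INSERT INTO keywords (id, keywords)
    SELECT id, to_tsvector(document)           -- placeholder for the real
    FROM doc;                                  -- keyword-generation step
CREATE INDEX keywords_gist_idx ON keywords USING gist (keywords);
ANALYZE keywords;                              -- table is fresh, so no
                                               -- VACUUM FULL is needed
```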
On Sat, Nov 19, 2005 at 09:54:23AM -0800, Craig A. James wrote: >First, I occasionally rebuild the keywords, after which the VACUUM FULL >ANALYZE takes a LONG time - like 24 hours. You know you just need vacuum, not vacuum full, right? Mike Stone
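[A sketch of the distinction Mike is drawing, using the "doc" table from the original message:]

```sql
VACUUM ANALYZE doc;
-- Plain VACUUM marks dead row versions as reusable space without an
-- exclusive lock; usually all that's needed after a bulk rebuild.

-- VACUUM FULL ANALYZE doc;
-- Compacts the table under an exclusive lock to return space to the
-- filesystem; much slower, and rarely necessary in routine maintenance.
```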
>> When I set statement_timeout in the config file, it just didn't >> do anything - it never timed out (PG 8.0.3). > >... but did you reload the server after you [changed statement_timeout]? Mystery solved. I have two servers; I was reconfiguring one and restarting the other. Duh. Thanks, Craig
This article on ZDNet claims that hyperthreading can *hurt* performance, due to contention in the L1/L2 cache by a second process:

http://news.zdnet.co.uk/0,39020330,39237341,00.htm

Has anyone tested this on Postgres yet? (And based on a recent somewhat caustic thread about performance on this forum, I'm going to avoid speculation, and let those who actually *know* answer! ;-)

Craig
Yeah, it's pretty much a known issue for postgres

Dave

On 20-Nov-05, at 4:46 PM, Craig A. James wrote:
> This article on ZDNet claims that hyperthreading can *hurt*
> performance, due to contention in the L1/L2 cache by a second process:
>
> http://news.zdnet.co.uk/0,39020330,39237341,00.htm
>
> Has anyone tested this on Postgres yet? (And based on a recent
> somewhat caustic thread about performance on this forum, I'm going
> to avoid speculation, and let those who actually *know* answer! ;-)
>
> Craig
So say I need 10,000 tables, but I can create tablespaces. Wouldn't that solve the performance problem caused by Linux's (or ext2/3's) problems with large directories?

For example, if each user creates (say) 10 tables, and I have 1000 users, I could create 100 tablespaces, and assign groups of 10 users to each tablespace. This would limit each tablespace to 100 tables, and keep the ext2/3 file-system directories manageable.

Would this work? Would there be other problems?

Thanks,
Craig
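[The scheme above could be sketched like this; the tablespace name, directory path, and table definition are hypothetical:]

```sql
-- One tablespace per group of ~10 users, each backed by its own directory,
-- so no single filesystem directory accumulates more than ~100 tables.
CREATE TABLESPACE ts001 LOCATION '/var/lib/pgsql/ts001';

-- A user assigned to group 1 creates tables in that tablespace:
CREATE TABLE user42_data (
    id integer primary key,
    payload text
) TABLESPACE ts001;
```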
On Thu, 1 Dec 2005, Craig A. James wrote:
> So say I need 10,000 tables, but I can create tablespaces. Wouldn't that
> solve the performance problem caused by Linux's (or ext2/3's) problems with
> large directories?
>
> For example, if each user creates (say) 10 tables, and I have 1000 users, I
> could create 100 tablespaces, and assign groups of 10 users to each
> tablespace. This would limit each tablespace to 100 tables, and keep the
> ext2/3 file-system directories manageable.
>
> Would this work? Would there be other problems?

This would definitely help; however, there's still the question of how large the tables get, and how many total files are needed to hold the 100 tables. You still have the problem of having to seek around to deal with all these different files (and tablespaces just spread them further apart). You can't solve this, but a large write-back journal (as opposed to metadata-only) would mask the problem.

It would be a trade-off: you would end up writing all your data twice, so the throughput would be lower, but since the data is safe as soon as it hits the journal, the latency for any one request would be lower, which would allow the system to use the CPU more and overlap it with your seeking.

David Lang