Thread: Performance Tuning Document?
Hi, I'm playing with OSDB (http://osdb.sf.net/) and trying to get the best numbers possible out of it. I haven't been able to find anything resembling a performance tuning document. Does such a thing exist? Bruce's "Hardware performance tuning" guide mentions a bunch of options but doesn't suggest what they should be set to. It also implies that, simplistically stated, "bigger is better, until it makes you swap", but that seems not to be always true: Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a large number of shared_buffers causes severe slowdown on one of the tests -- it goes from a little over 200 seconds to nearly 2000. I suspect internal lock contention, or maybe it's just that the read() path in Linux is quicker than PG's own cache? Any tips and tricks available? Thus far, I have tried: tcpip_socket = true shared_buffers = {1024,10240,102400} max_fsm_relations = 100 max_fsm_pages = 10000 max_locks_per_transaction = 256 wal_buffers = 1024 and haven't really been able to explain the numbers (which I will post shortly, once I have been able to try a few more parameters). Matthew.
> I'm playing with OSDB (http://osdb.sf.net/) and trying to get > the best numbers possible out of it. > > I haven't been able to find anything resembling a performance > tuning document. Does such a thing exist? Unfortunately, not in any complete sense. There are a few guides from Bruce that make a good effort, but there seems to be a *lot* of other information that can only be gleaned by either being a developper or following the list very closely for a few years. Bruce's hardware tuning guide also doesn't really give any sorts of guidelines or numbers to start from, it merely explains concepts and leaves the investigation and twiddling to you. > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > large number of shared_buffers causes severe slowdown on one of > the tests -- it goes from a little over 200 seconds to nearly > 2000. I suspect internal lock contention, or maybe it's just > that the read() path in Linux is quicker than PG's own cache? > > Any tips and tricks available? Yes. Huge, raging amounts of shared buffers do have the consequence of diminishing your disk cache size. You want to make sure that you can always keep the *entire* database in disk cache, or you end up taking a performance hit by having to read from disk, in the same spirit of keeping your machine from swapping. Steve
On Thu, 28 Mar 2002, Steve Wolfe wrote: > > I haven't been able to find anything resembling a performance > > tuning document. Does such a thing exist? > > Unfortunately, not in any complete sense. > > There are a few guides from Bruce that make a good effort, but there > seems to be a *lot* of other information that can only be gleaned by > either being a developper or following the list very closely for a few > years. Bruce's hardware tuning guide also doesn't really give any > sorts of guidelines or numbers to start from, it merely explains > concepts and leaves the investigation and twiddling to you. Yep. Guess I have a few busy evenings and weekends ahead... > > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > > large number of shared_buffers causes severe slowdown on one of > > the tests -- it goes from a little over 200 seconds to nearly > > 2000. I suspect internal lock contention, or maybe it's just > > that the read() path in Linux is quicker than PG's own cache? > Huge, raging amounts of shared buffers do have the consequence of > diminishing your disk cache size. You want to make sure that you can > always keep the *entire* database in disk cache, or you end up taking > a performance hit by having to read from disk, in the same spirit of > keeping your machine from swapping. I watched the run with "vmstat 1" and there was very close to zero disk read activity (and no swapping). My test machine has ~400Mb or RAM and the shared_buffers shouldn't have been over 80Mb, so that makes sense to me. I guess there's a linear list, a poor hash function, or some heavy lock contention somewhere causing this. Matthew.
> Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > large number of shared_buffers causes severe slowdown on one of > the tests -- it goes from a little over 200 seconds to nearly > 2000. I suspect internal lock contention, or maybe it's just > that the read() path in Linux is quicker than PG's own cache? Matthew, are you using the --postgresql=no_hash_index option for OSDB? It's conceivable that you are hitting an artifactof the hash index problem here. /andy --- Andy Riebs, andy.riebs@compaq.com High Performance Technical (w) 603-884-1521, (fax) 603-884-0630 Computing/Linux Group <http://cub.sourceforge.net/> Compaq Computer Corporation (h) ariebs@earthlink.net <http://www.compaq.com/linux> <http://osdb.sourceforge.net/> <http://opensource.compaq.com>
Matthew Kirkwood wrote: > Hi, > > I'm playing with OSDB (http://osdb.sf.net/) and trying to get > the best numbers possible out of it. > > I haven't been able to find anything resembling a performance > tuning document. Does such a thing exist? > > Bruce's "Hardware performance tuning" guide mentions a bunch > of options but doesn't suggest what they should be set to. It > also implies that, simplistically stated, "bigger is better, > until it makes you swap", but that seems not to be always true: > > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > large number of shared_buffers causes severe slowdown on one of > the tests -- it goes from a little over 200 seconds to nearly > 2000. I suspect internal lock contention, or maybe it's just > that the read() path in Linux is quicker than PG's own cache? > > Any tips and tricks available? Thus far, I have tried: Gererally, I think 1/4 RAM for shared buffers is a good start, and perhaps 2-4% for sort memory. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
El Mar 28, Bruce Momjian escribio: > Matthew Kirkwood wrote: > > Hi, > > > > I'm playing with OSDB (http://osdb.sf.net/) and trying to get > > the best numbers possible out of it. > > Any tips and tricks available? Thus far, I have tried: > > Gererally, I think 1/4 RAM for shared buffers is a good start, and > perhaps 2-4% for sort memory. That 2-4% means 2-4% per backend, or totalled? I don't have your experience, but I think that depends heavily on schemas and queries, more than shared buffers. -- Alvaro Herrera (<alvherre[a]atentus.com>) "El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte" (Ijon Tichy en Viajes, Stanislaw Lem)
Alvaro Herrera wrote: > El Mar 28, Bruce Momjian escribio: > > > Matthew Kirkwood wrote: > > > Hi, > > > > > > I'm playing with OSDB (http://osdb.sf.net/) and trying to get > > > the best numbers possible out of it. > > > > Any tips and tricks available? Thus far, I have tried: > > > > Gererally, I think 1/4 RAM for shared buffers is a good start, and > > perhaps 2-4% for sort memory. > > That 2-4% means 2-4% per backend, or totalled? I don't have your > experience, but I think that depends heavily on schemas and queries, > more than shared buffers. It is per-backend, and sort memory only helps for certain ORDER BY queries or mergejoins that can't use an index. It is a very hard value to set and usually is done to improve certain queries you can't fix another way. However, the default is unusually small so it will work on all machines so upping it isn't a major problem. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Thu, 28 Mar 2002, Riebs, Andy wrote: > > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > > large number of shared_buffers causes severe slowdown on one of > > the tests -- it goes from a little over 200 seconds to nearly > > 2000. I suspect internal lock contention, or maybe it's just > > that the read() path in Linux is quicker than PG's own cache? > > Matthew, are you using the --postgresql=no_hash_index option for OSDB? > It's conceivable that you are hitting an artifact of the hash index > problem here. Ah, that would make a lot of sense. I'll do a run again with that option and see what turns up. Thanks, Matthew.
----- Original Message ----- From: "Steve Wolfe" > > Yes. Huge, raging amounts of shared buffers do have the consequence of > diminishing your disk cache size. You want to make sure that you can > always keep the *entire* database in disk cache, or you end up taking a > performance hit by having to read from disk, in the same spirit of keeping > your machine from swapping. > if u imply that you'd better leave huge amounts of memory to the OS buffer cache rather than give them to the DB buffer cache then I strongly disagree. To name two important issues: - the OS block size may be different from the DB block size (which is especially true for Postgres where if I'm not mistaken the DB block is fixed to 8KB). You want to keep DB blocks in memory regardless of the fact if the relevant OS blocks are in the OS buffer cache (just consider the cases where the DB block is larger/smaller than the OS block) - the OS buffer cache is designed to satisfy all different kinds of processes while the DB buffer cache is designed to suit only the database, so it should suit it better. For example (some day) you may have multiple buffer caches with different queue strategies (just like the KEEP and RECYCLE buffer pools in Oracle) while the OS will give you only LRU queuing A good on-topic reading is "Avoid Buffered I/O" by Steve Adams available at http://www.ixora.com.au/tips/avoid_buffered_io.htm also, I don't want to "always keep the *entire* database in disk cache" - I want to keep cached only the frequently accessed parts of the data Marin ---- "...what you brought from your past, is of no use in your present. When you must choose a new path, do not bring old experiences with you. Those who strike out afresh, but who attempt to retain a little of the old life, end up torn apart by their own memories. "
Hello, I'm new to Postgresql and plpgslq. I wrote a plpgsql to return epoch time from a table, but had problem running it. The error returned: NOTICE: Error occurred while executing PL/pgSQL function sleeptime NOTICE: line 10 at assignment ERROR: Bad timestamp external representation 'rec_runtime.runtime' My plpgsql function: create function sleeptime () returns float as ' declare rec_runtime record; ret_sleepsecs float; begin select into rec_runtime runtime from mon_nextrun order by runtime limit 1; if rec_runtime.runtime is null then return 60; end if; ret_sleepsecs := extract (epoch from timestamp ''rec_runtime.runtime'') as float; return ret_sleepsecs; end; ' language 'plpgsql'; What is the correct syntax for extract function in the assignment statement? Thanks in advance. -CT
> if u imply that you'd better leave huge amounts of memory to the OS buffer > cache rather than give them to the DB buffer cache then I strongly disagree. Good, because that wasn't exactly what I was implying. I was implying that increasing either disk cache or shared buffers to ridiculous limits *at the expense of the other* can potentiall be wasteful and even counter-productive. > A good on-topic reading is "Avoid Buffered I/O" by Steve Adams available at > http://www.ixora.com.au/tips/avoid_buffered_io.htm Thanks, if I ever switch from PG to Oracle, I'll keep that in mind. Now, for a little bit of REAL WORLD experience. Once you've got your sort memory and shared buffers to certain levels, increasing them isn't going to help you. In my case, I increased them until I stopped seeing performance increases, then quadrupled them anyway. Increasing them further is *not* going to help me. However, keeping the database in memory cache *does* help me. Even under very significant load (4 processors going full-tilt!), the disk lights only blink *occasionally*, and that's a good thing. Disk bottlenecks really suck. > also, I don't want to "always keep the *entire* database in disk cache" - I > want to keep cached only the frequently accessed parts of the data Why? Do I/O bottlenecks excite you? If you have the RAM, not using it is wasteful. Disks are a place to store data for when the power goes out, not where you want to do your database work from. steve
Oxeye wrote: > I'm new to Postgresql and plpgslq. I wrote a plpgsql to return epoch time > from a table, but had problem running it. The error returned: > > NOTICE: Error occurred while executing PL/pgSQL function sleeptime > NOTICE: line 10 at assignment > ERROR: Bad timestamp external representation 'rec_runtime.runtime' > > My plpgsql function: > > create function sleeptime () returns float as ' > declare > rec_runtime record; > ret_sleepsecs float; > begin > select into rec_runtime runtime from mon_nextrun order by runtime > limit 1; > if rec_runtime.runtime is null > then > return 60; > end if; > ret_sleepsecs := extract (epoch from timestamp > ''rec_runtime.runtime'') as float; > return ret_sleepsecs; > end; > ' language 'plpgsql'; > Just a bit of overkill :). You can easily do that without resorting to a procedural language. If you prefer your queries functional looking: SELECT coalesce( date_part('epoch', runtime ), 60)::float FROM mon_nextrun Or you can use the more 'SQLish' (and verbose) bare word look: SELECT CAST ( CASE WHEN runtime IS NULL THEN 60 ELSE EXTRACT (epoch FROM runtime) END AS FLOAT ) AS sleeptime FROM mon_nextrun If you want it so you get a scalar from sleeptime() just wrap either of those in an sql function. CREATE OR REPLACE FUNCTION sleeptime() RETURNS FLOAT AS ' <query goes here> ' LANGUAGE SQL; If you insert the first in, remeber to escape the single quotes. You could also make it a more general wrapper and take runtime as an argument. References: http://www.postgresql.org/idocs/index.php?functions-conditional.html http://www.postgresql.org/idocs/index.php?functions-datetime.html http://www.postgresql.org/idocs/index.php?xfunc.html
On Thu, 28 Mar 2002, Matthew Kirkwood wrote: [ oops, forgot to send this ages ago ] > > > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > > > large number of shared_buffers causes severe slowdown on one of > > > the tests -- it goes from a little over 200 seconds to nearly > > > 2000. > > --postgresql=no_hash_index > Ah, that would make a lot of sense. I'll do a run again with > that option and see what turns up. That was right on the nose. The numbers are much better now. My initial interest was in benchmarking different filesystems on Linux. In case anyone is interested, here are today's numbers: tuning? single ir cs-ir oltp cs-oltp (sec) (tps) (sec) (tps) (sec) ext3 kn 841.28 61.52 203.33 407.58 159.72 ext3-wb kn 841.19 63.73 217.19 406.30 160.88 ext3-jd kn 839.96 58.96 203.02 307.85 159.89 jfs kn 840.53 62.74 205.90 348.33 177.70 minix kn 841.51 62.12 201.44 343.87 176.68 ext2 kn 840.72 65.02 205.40 338.20 182.22 ext3-wb is ext3 with the "data=writeback" mount option. ext3-jd is ext3 with "data=journal" and a 200Mb journal instead of the usual 32Mb one. All filesystems were mounted noatime. postgresql.conf for all these runs looks like: tcpip_socket = true shared_buffers = 10240 max_fsm_relations = 100 max_fsm_pages = 10000 max_locks_per_transaction = 256 wal_buffers = 10240 sort_mem = 5120000 vacuum_mem = 81920 Without hash indexes, it looks like only OLTP loads can differentiate the filesystems. Sometime (once I have got a more recent kernel going) I'll try a dataset larger than memory. Matthew. ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) --ELM1033070448-26881-0_--
On Thu, 28 Mar 2002, Matthew Kirkwood wrote: [ oops, forgot to send this ages ago ] > > > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > > > large number of shared_buffers causes severe slowdown on one of > > > the tests -- it goes from a little over 200 seconds to nearly > > > 2000. > > --postgresql=no_hash_index > Ah, that would make a lot of sense. I'll do a run again with > that option and see what turns up. That was right on the nose. The numbers are much better now. My initial interest was in benchmarking different filesystems on Linux. In case anyone is interested, here are today's numbers: tuning? single ir cs-ir oltp cs-oltp (sec) (tps) (sec) (tps) (sec) ext3 kn 841.28 61.52 203.33 407.58 159.72 ext3-wb kn 841.19 63.73 217.19 406.30 160.88 ext3-jd kn 839.96 58.96 203.02 307.85 159.89 jfs kn 840.53 62.74 205.90 348.33 177.70 minix kn 841.51 62.12 201.44 343.87 176.68 ext2 kn 840.72 65.02 205.40 338.20 182.22 ext3-wb is ext3 with the "data=writeback" mount option. ext3-jd is ext3 with "data=journal" and a 200Mb journal instead of the usual 32Mb one. All filesystems were mounted noatime. postgresql.conf for all these runs looks like: tcpip_socket = true shared_buffers = 10240 max_fsm_relations = 100 max_fsm_pages = 10000 max_locks_per_transaction = 256 wal_buffers = 10240 sort_mem = 5120000 vacuum_mem = 81920 Without hash indexes, it looks like only OLTP loads can differentiate the filesystems. Sometime (once I have got a more recent kernel going) I'll try a dataset larger than memory. Matthew. ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) --ELM1033070448-26881-0_--
I'd like to write a function that parses XML (which is stored in a table column), returning element values as part of the recordset. For example: SELECT id, title, XML2Columns(article_xml) FROM mytable I don't have a hope of writing something like this in C or Perl, but I could do it in Java, using Xerces. Has anyone successfully used Java as a Procedural Language for PostGreSQL? Or is it possible that a function like this exists already? Many thanks Tom -----------------+ tom dyson t: +44 (0)1608 811870 m: +44 (0)7958 752657 http://torchbox.com
If you are using version 7.2.x, plperlu ("untrusted" Perl) will let you use any of the Perl modules which do the sort of thing you want. To install plperlu, check the instructions under "Procedural Languages" in the docs. --- tom dyson <tom@torchbox.com> wrote: > I'd like to write a function that parses XML (which > is stored in a table > column), returning element values as part of the > recordset. For example: > > SELECT id, title, XML2Columns(article_xml) > FROM mytable > > I don't have a hope of writing something like this > in C or Perl, but I could > do it in Java, using Xerces. Has anyone successfully > used Java as a > Procedural Language for PostGreSQL? Or is it > possible that a function like > this exists already? > > Many thanks > > Tom > -----------------+ > tom dyson > t: +44 (0)1608 811870 > m: +44 (0)7958 752657 > http://torchbox.com > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: you can get off all lists at once with the > unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/
Look in the contrib directory for 7.2. There is a XML parser in there, but I have not used it yet. On Tue, 16 Apr 2002 14:51:45 +0100 tom dyson <tom@torchbox.com> wrote: > I'd like to write a function that parses XML (which is stored in a table > column), returning element values as part of the recordset. For example: > > SELECT id, title, XML2Columns(article_xml) > FROM mytable > > I don't have a hope of writing something like this in C or Perl, but I > could do it in Java, using Xerces. Has anyone successfully used Java as > a Procedural Language for PostGreSQL? Or is it possible that a function > like this exists already? > > Many thanks > > Tom > -----------------+ > tom dyson > t: +44 (0)1608 811870 > m: +44 (0)7958 752657 > http://torchbox.com > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
> Has anyone successfully used Java as a > Procedural Language for PostGreSQL? Java procedural language (beta) home: http://pljava.sourceforge.net Cheers, Jean-Michel
Bruce Momjian wrote: > Matthew Kirkwood wrote: > > Hi, > > > > I'm playing with OSDB (http://osdb.sf.net/) and trying to get > > the best numbers possible out of it. > > > > I haven't been able to find anything resembling a performance > > tuning document. Does such a thing exist? > > > > Bruce's "Hardware performance tuning" guide mentions a bunch > > of options but doesn't suggest what they should be set to. It > > also implies that, simplistically stated, "bigger is better, > > until it makes you swap", but that seems not to be always true: > > > > Under the "crossSectionTests(Mixed IR)" part of an OSDB run, a > > large number of shared_buffers causes severe slowdown on one of > > the tests -- it goes from a little over 200 seconds to nearly > > 2000. I suspect internal lock contention, or maybe it's just > > that the read() path in Linux is quicker than PG's own cache? > > > > Any tips and tricks available? Thus far, I have tried: > > Gererally, I think 1/4 RAM for shared buffers is a good start, and > perhaps 2-4% for sort memory. > I have added this to my performance paper: As a start for tuning, use 25% of ram for cache size, and 2-4% for sort size. Increase if no swapping, and decrease to prevent swapping. Of course, if the frequently accessed tables already fit in the cache, continuing to increase the cache size no longer dramatically improves performance. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > <snip> > > I have added this to my performance paper: > > As a start for tuning, use 25% of ram for cache size, and 2-4% for sort > size. Increase if no swapping, and decrease to prevent swapping. Of > course, if the frequently accessed tables already fit in the cache, > continuing to increase the cache size no longer dramatically improves > performance. That's pretty cool. Reckon you should add in a good and clear example for people though, just to make it 100% clear in their heads. :-) Regards and best wishes, Justin Clift > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
I have added this URL to the TODO list. --------------------------------------------------------------------------- Jean-Michel POURE wrote: > > Has anyone successfully used Java as a > > Procedural Language for PostGreSQL? > Java procedural language (beta) home: > http://pljava.sourceforge.net > > Cheers, > Jean-Michel > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026