Thread: Major Performance decrease after some hours
Hi all,

I have a Tomcat application running against PostgreSQL 8.1.4 which performs about 10000 inserts/deletes and updates on a database every 2-4 minutes. After some hours of load testing, top reports 0.0% idle, 6-7% system load, and a load average of 32, 31, 28, and statement execution throws many exceptions like:

    An I/O error occurred while sending to the backend.

There are 10-15 postmaster processes running which use all the CPU power. A restart of Tomcat and then PostgreSQL results in the same situation. Some postgres processes are in "DELETE waiting" or "SELECT waiting" state. VACUUM completes in just about 1-2 seconds, is run via cron every minute, and reports that the FSM settings are high enough.

The situation only gets better if Tomcat is stopped and PostgreSQL is restarted -> 90% idle.

The Tomcat application talks to another application on the server which uses a lot of CPU cycles at Tomcat startup, and my observation is that after this startup at 0.0% idle, PostgreSQL can't recover from this situation.

The setup:
- Dual Xeon server machines (Dell PowerEdge 2850) using Heartbeat1. Each server has only one hard disk; no RAID configuration is used.
- Each cluster has 4 drbd devices, one of them for the PostgreSQL data.
- Two of these clusters are using the same PostgreSQL installation to share the data; the database can be moved from one cluster to the other in case of failure.
- OS: Debian Sarge with PostgreSQL 8.1.4.
- Two cronjobs are configured: a "vacuumdb --all" every 1 minute and a "vacuumdb --all --analyze" every 23 minutes.
- There are 3 applications using the PostgreSQL installation, each with its own database.
- The main application is based on Tomcat 4.1-30, partly a web application, partly a terminal login protocol based on HTTP and XML, so the database access is of course done using JDBC.
- A cronjob is configured to perform a pg_dump of the main database every 4 hours.

Any ideas are welcome.

thx,
Peter
2006/10/1, Peter Bauer <peter.m.bauer@gmail.com>:
> Hi all,
>
> i have a Tomcat application with PostgreSQL 8.1.4 running which
> performs about 10000 inserts/deletes every 2-4 minutes and updates on
> a database and after some hours of loadtesting the top output says
> 0.0% idle, 6-7% system load, load average 32, 31, 28
> [...]
>
> Any ideas are welcome.

Hi again,

I decreased the load of the test and the system has been stable for some hours now: no exceptions so far, and top says 80-90% idle. The question is, what are the reasons for this behaviour? Is a hardware upgrade required? And how can I check how bad the condition of the database is?

thx,
Peter
Hi,

a few random questions...

> i have a Tomcat application with PostgreSQL 8.1.4 running which
> performs about 10000 inserts/deletes every 2-4 minutes and updates on
> a database and after some hours of loadtesting the top output says
> 0.0% idle, 6-7% system load, load average 32, 31, 28

What is responsible for these high load values? Do you see many postmaster processes in top?

> and there are many exceptions at statement execution like:
> An I/O error occurred while sending to the backend.

Is this from the application? How many connections does the application open in parallel? Using JDBC, I guess?

> - Each Cluster has 4 drbd Devices, one for the PostgreSQL data
> - Two of these clusters are using the same PostgreSQL installation to
>   share the data, the database can be moved from one cluster to the
>   other in case of failure

Just to be 100% sure: only one server at a time runs PostgreSQL on that shared data disk, right?

Bye,
Chris.
Peter Bauer wrote:
[...]
> There are 10-15 postmaster processes running which use all the CPU power.
> A restart of tomcat and then postgresql results in the same situation.
> Some postgres processes are in DELETE waiting or SELECT waiting.
> VACUUM runs through in just about 1-2 seconds and is run via cron
> every minute and reports that the fsm settings are high enough.

Pure speculation: are you sure you aren't vacuuming too aggressively? The DELETE waiting and SELECT waiting sound to me like they are waiting for a lock that another vacuum is holding. You've said that you spawn a vacuum process from cron every minute; that may be fine when the server isn't too loaded, but when vacuums begin to take longer, cron will spawn more and more of them, killing your machine.

I'd add a little check to the cron script to make sure there isn't already a vacuum process running, or use an independent script instead:

    while true
    do
        vacuumdb --all
        sleep 60
    done

[...]

HTH,
--
MaXX
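If the job stays in cron, the overlap check can also be done with flock(1), which simply skips a run while the previous one still holds the lock. A sketch; the lock-file path is illustrative, not from the thread:

```
# crontab fragment: -n makes flock give up immediately instead of queueing,
# so a new run is simply skipped while the previous vacuumdb is still going
* * * * *  flock -n /var/lock/vacuumdb.lock vacuumdb --all --quiet
```

The looping-script variant serializes the runs implicitly, since the next vacuumdb only starts after the previous one has finished.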
2006/10/1, Chris Mair <chrisnospam@1006.org>:
> What is responsible for these high load values?
> Do you see many postmaster processes in top?

Yes, there are about 10 postmaster processes in top which eat up all of the CPU cycles in equal parts.

> Is this from the application?

Yes, this happens when the SQL statements are executed.

> How many connections does the application open in parallel?

30-40. One Tomcat runs on the same machine as PostgreSQL, another one runs on the other cluster and uses the database over the network.

> Using JDBC, I guess?

Yes, postgresql-8.1-407.jdbc3 is used.

> Just to be 100% sure: only one server at a time runs PostgreSQL on
> that shared data disk, right?

One server of a cluster runs the database and Tomcat; the other one just runs Tomcat, so they share this common database.

thx,
Peter
MaXX wrote:
>> There are 10-15 postmaster processes running which use all the CPU
>> power.
>> A restart of tomcat and then postgresql results in the same situation.
>> Some postgres processes are in DELETE waiting or SELECT waiting.
>> VACUUM runs through in just about 1-2 seconds and is run via cron
>> every minute and reports that the fsm settings are high enough.
>
> Pure speculation: are you sure you aren't vacuuming too aggressively?

Have you tried autovacuum? You can manually tweak the settings so that your hot-spot tables are vacuumed aggressively. One of the main upsides to autovacuum is that it saves cycles, that is, it only vacuums when you need it.
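For reference, the integrated autovacuum in 8.1 is off by default and needs the row-level statistics collector. A sketch of the postgresql.conf settings involved; the threshold values are illustrative starting points, not recommendations from this thread:

```
# postgresql.conf (8.1)
stats_start_collector = on    # both stats settings are prerequisites
stats_row_level       = on    # for autovacuum in 8.1
autovacuum            = on
autovacuum_naptime    = 60    # seconds between checks of each database
# vacuum a table once the base threshold plus ~20% of its rows have changed
autovacuum_vacuum_threshold    = 1000
autovacuum_vacuum_scale_factor = 0.2
```

Hot-spot tables can be tuned individually via the pg_autovacuum catalog table in 8.1.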
"Peter Bauer" <peter.m.bauer@gmail.com> writes:
> yes, there are about 10 postmaster processes in top which eat up all
> of the CPU cycles in equal parts.

What are these processes doing exactly --- can you show us the queries they're executing? It might be worth attaching to a few of them with gdb to get stack traces, to see if there's any commonality about the traces:

    $ gdb /path/to/postgres PID-of-backend
    gdb> bt
    gdb> quit
    are you sure? y
    $

<repeat with other backends, and/or same backend after a few sec>

regards, tom lane
2006/10/1, MaXX <bs139412@skynet.be>:
> Peter Bauer wrote:
>> The previous configuration was to vacuum every 10 minutes. This
>> resulted in 60-70% system load (hard disk, I suppose) so I thought
>> more vacuums would be a good idea.
> It won't hurt to ensure that you will never have 2 vacuum processes
> running at the same time...
>> When the postmaster processes eat up all CPU cycles, the vacuum still
>> takes only some seconds and there are no postgres VACUUM processes
>> hanging around, so I don't think they hamper each other.
>> I have the impression that the database "condition" is getting worse
>> over time, so the queries take longer and longer, and at some point
>> the I/O exceptions start because the data cannot be written to the
>> disk in time. I just don't know how to pinpoint this bad "condition"
>> or whatever gets worse over time.
> You can connect to your server with pgAdmin, go to "tools -> server
> status" to identify which process is causing troubles on a sunday ;-)
> (you may have to turn on some logging options in postgresql.conf) I find
> it more friendly than psql, but it's a matter of taste.
> You are running pg on a Linux platform, try systat and iostat. (I'm
> running FreeBSD, I suppose those 2 are available on Linux)
>
> Long-running transactions are not your friends either... I once made a
> typo in a script which prevented commits from happening... not good...

This was the first thing we checked when these problems appeared. The only locking is done in functions, which should not be a problem. Of course it is possible that there are bugs which prevent some transactions from being committed, but it's a pretty big application, so I would need some advice for finding and debugging such problems. Is it possible to check which transactions are currently running, which thread or Java process runs them, and for how long?

I will attach a logfile to the reply to Tom's mail which contains all SQL statements executed from the start of the loadtest to the point the problems began, so maybe you can find something in it.

thx,
Peter
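On the "which transactions are currently running, and for how long" question: the server side of it can be read from pg_stat_activity. A sketch against the 8.1 view (current_query is only filled in when stats_command_string is on); tying a row back to a particular Java thread has to happen in the application, e.g. by logging which client_port each connection uses:

```sql
-- One row per backend: what it is running and since when.
SELECT procpid,
       usename,
       client_addr,
       client_port,
       now() - query_start AS running_for,
       current_query
FROM pg_stat_activity
ORDER BY query_start;
```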
2006/10/1, Matthew T. O'Connor <matthew@zeut.net>:
> Have you tried autovacuum? You can manually tweak the settings so that
> your hot-spot tables are vacuumed aggressively. One of the main upsides
> to autovacuum is that it saves cycles, that is, it only vacuums when
> you need it.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Autovacuum is enabled and runs in addition to the configured cronjob vacuums. I will attach logfiles to the reply to Tom's mail, so please have a look at whether autovacuum can be reconfigured to replace the cronjobs.

thx,
Peter
"Peter Bauer" <peter.m.bauer@gmail.com> writes:
> Attached you can find the postgresql logfiles and a logfile which
> contains all SQL statements executed in the relevant time together
> with the exceptions thrown. I also attached a file with all used
> PL/pgSQL functions. Since we were not able to find a problem for so
> long, I think it makes no sense to filter the information, because we
> are probably not looking at the right spots.

I hope the pgsql-general moderators are not going to approve an 8MB message :-(. I'll hang onto my copy though, just in case it proves useful.

>> It might be worth attaching to a few of them with
>> gdb to get stack traces,

> Should this be done when the problems are appearing or during normal
> operation, or both to compare the results?

When the problems are appearing.

regards, tom lane
2006/10/2, Tom Lane <tgl@sss.pgh.pa.us>:
> I hope the pgsql-general moderators are not going to approve an 8MB
> message :-(. I'll hang onto my copy though, just in case it proves
> useful.

Sorry, I put the files on http://dagobert.apus.co.at/pglogfiles/ so please use them.

>>> It might be worth attaching to a few of them with
>>> gdb to get stack traces,

>> Should this be done when the problems are appearing or during normal
>> operation, or both to compare the results?

> When the problems are appearing.

OK, I will try to reproduce it and post the stack traces.

thx,
Peter
On Sun, Oct 01, 2006 at 12:55:51PM +0200, MaXX wrote:
> Pure speculation: are you sure you aren't vacuuming too aggressively?
> The DELETE waiting and SELECT waiting sound to me like they are waiting
> for a lock that another vacuum is holding.

How would one determine the lock situation definitively? Is there an internal mechanism that can be queried?
Ray Stell <stellr@cns.vt.edu> writes:
> How would one determine the lock situation definitively? Is there
> an internal mechanism that can be queried?

pg_locks view.

regards, tom lane
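Concretely, something along these lines; a sketch, with the column list as documented for 8.1:

```sql
-- Backends currently waiting on a lock:
SELECT pid, locktype, relation::regclass AS rel, mode
FROM pg_locks
WHERE NOT granted;

-- Granted locks on the same relations, to see who is blocking:
SELECT pid, relation::regclass AS rel, mode
FROM pg_locks
WHERE granted
  AND relation IN (SELECT relation FROM pg_locks WHERE NOT granted);
```

Cross-referencing the pid column with pg_stat_activity shows what the blocking backend is executing.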
Hi all,

inspired by the last posting, "Weird disk write load caused by PostgreSQL?", I increased work_mem from 1MB to 7MB and did some load testing with vacuum every 10 minutes. The system load (hard disk) went down and everything was very stable at 80% idle for nearly 24 hours! I am currently performing some pgbench runs to evaluate the hardware and configuration for the system, but I think the biggest problems are solved so far.

thx everybody,
Peter

2006/10/2, Tom Lane <tgl@sss.pgh.pa.us>:
> Ray Stell <stellr@cns.vt.edu> writes:
> > How would one determine the lock situation definitively? Is there
> > an internal mechanism that can be queried?
>
> pg_locks view.
>
> regards, tom lane
It appears to me that work_mem is a more significant configuration option than previously assumed by many PostgreSQL users, myself included. As with many database optimizations, it's an obscure problem to diagnose because you generally only observe it through I/O activity.

One possibility would be to log a warning whenever work_mem is exceeded (or exceeded by a certain ratio). I would also love a couple of new statistics counters tracking the amount of work memory used and the amount of work memory that has spilled over into pgsql_tmp.

Alexander.

On Oct 5, 2006, at 10:48 , Peter Bauer wrote:
> Hi all,
>
> inspired by the last posting "Weird disk write load caused by
> PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
> loadtest with vacuum every 10 minutes. The system load (harddisk) went
> down and everything was very stable at 80% idle for nearly 24 hours!
> [...]
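Until such counters exist, a crude check is to run a suspect query under EXPLAIN ANALYZE at two work_mem settings and watch the temp directory. The table and column names below are placeholders, and note that 8.1 takes work_mem as an integer number of KB:

```sql
SET work_mem = 1024;   -- 1MB; repeat with 7168 (7MB) and compare timings
EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;
-- While this runs, temp files appearing under
-- $PGDATA/base/<database-oid>/pgsql_tmp mean the sort spilled to disk.
```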
I finished the little benchmarking on our server and the results are quite curious. With the numbers from http://sitening.com/tools/postgresql-benchmark/ in mind I did

    ./pgbench -i pgbench

and then performed some pgbench tests, for example:

    ./pgbench -c 1 -t 1000 -s 1 pgbench
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    number of clients: 1
    number of transactions per client: 1000
    number of transactions actually processed: 1000/1000
    tps = 50.703609 (including connections establishing)
    tps = 50.709265 (excluding connections establishing)

So our server with two 3.00 GHz Xeon CPUs and 2GB of RAM has about 5% of the performance of the server described in the article!

I did some tests on a Xen machine running on my workstation and the results are about 400-500 tps, which seems quite reasonable.

I also tried to disable drbd and put the data directory elsewhere, but the performance was the same.

any ideas?

thx,
Peter

2006/10/5, Alexander Staubo <alex@purefiction.net>:
> It appears to me that work_mem is a more significant configuration
> option than previously assumed by many PostgreSQL users, myself
> included. [...]
I forgot to mention that top does not show a noticeable increase in CPU or system load during the pgbench runs (postmaster has 4-8% CPU). Shouldn't the machine be busy during such a test?

thx,
Peter

2006/10/5, Peter Bauer <peter.m.bauer@gmail.com>:
> I finished the little benchmarking on our server and the results are
> quite curious.
> [...]
If you are on Linux, I recommend iostat(1) and vmstat(8) over top.

Iostat will report I/O transfer statistics; it's how I discovered that work_mem buffers were spilling over to disk files. For vmstat, look in particular at the load (i.e., how many processes are competing for the scheduler) in the first field ("r") and how many processes are blocked by I/O waits ("b").

Alexander.

On Oct 5, 2006, at 14:35 , Peter Bauer wrote:
> I forgot to mention that top does not show a noticeable increase in
> CPU or system load during the pgbench runs (postmaster has 4-8% CPU).
> Shouldn't the machine be busy during such a test?
> [...]
It seems that the machine doesn't really care about the pgbench run. I did a

    pgbench -c 10 -t 10000 -s 10 pgbench

and here is the output of "vmstat 1 100", which was started some seconds before pgbench:

    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 10236 1240952 111324 553908 0 0 3 2 1 3 5 3 92 0
    0 0 10236 1240944 111324 553908 0 0 0 0 167 48 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 20 169 52 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 44 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 164 44 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 47 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 47 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 164 49 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 40 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 41 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 45 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 48 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 164 42 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 45 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 41 0 0 100 0
    0 2 10236 1237688 111332 550256 0 0 0 10976 449 1793 13 1 86 0
    0 2 10236 1237688 111332 550256 0 0 0 1928 345 3206 2 1 97 0
    0 2 10236 1237680 111340 550256 0 0 0 1969 352 3012 4 1 95 0
    0 2 10236 1237600 111340 550336 0 0 0 2096 353 2986 2 0 98 0
    0 2 10236 1237608 111340 550336 0 0 0 1932 351 2985 1 2 97 0
    0 2 10236 1237600 111340 550336 0 0 0 1836 350 3097 4 1 95 0
    0 2 10236 1237600 111340 550336 0 0 0 1852 353 2971 3 1 95 0
    4 1 10236 1237600 111340 550336 0 0 0 1975 372 2682 3 0 97 0
    0 2 10236 1237608 111340 550336 0 0 0 2056 372 2458 2 1 96 0
    0 2 10236 1237600 111340 550336 0 0 0 2028 377 2360 1 1 98 0
    0 2 10236 1237588 111340 550336 0 0 0 2000 372 2630 3 1 95 0
    0 2 10236 1237588 111340 550336 0 0 0 2044 372 2326 3 0 97 0
    0 2 10236 1237588 111340 550336 0 0 0 1976 372 2171 2 0 98 0
    0 2 10236 1237588 111340 550336 0 0 0 1972 383 2275 4 1 95 0
    1 2 10236 1237588 111340 550336 0 0 0 1924 382 2500 3 1 95 0
    0 2 10236 1237588 111340 550336 0 0 0 1804 372 2798 3 1 96 0
    0 2 10236 1237588 111340 550336 0 0 0 1900 374 2974 6 2 92 0
    0 2 10236 1237588 111340 550336 0 0 0 1820 361 2871 2 2 95 0
    0 2 10236 1237576 111340 550336 0 0 0 1876 366 2762 4 1 95 0
    0 2 10236 1237576 111340 550336 0 0 0 1904 370 2724 3 0 96 0
    0 2 10236 1237576 111340 550336 0 0 0 1972 378 2585 6 0 93 0
    6 1 10236 1237576 111340 550336 0 0 0 1800 371 2838 1 1 98 0
    0 2 10236 1237576 111340 550336 0 0 0 1792 362 2826 3 1 96 0
    0 2 10236 1237576 111340 550344 0 0 0 1804 362 3068 3 0 96 0
    1 1 10236 1237560 111340 550360 0 0 0 1936 373 2718 2 1 96 0
    6 1 10236 1237552 111340 550360 0 0 0 1788 365 2447 4 1 95 0
    0 1 10236 1237552 111340 550360 0 0 0 1804 368 2362 5 2 93 0
    0 2 10236 1237544 111340 550368 0 0 0 1908 370 2434 3 4 93 0
    0 2 10236 1237544 111340 550368 0 0 0 1848 369 2360 4 1 94 0
    0 2 10236 1237504 111340 550408 0 0 0 1796 358 2655 3 1 96 0
    0 2 10236 1237496 111340 550416 0 0 0 1988 374 2491 4 1 95 0
    0 2 10236 1237488 111340 550424 0 0 0 1960 372 2111 2 1 97 0
    0 2 10236 1237488 111340 550424 0 0 0 1760 360 2433 4 1 95 0
    0 2 10236 1237488 111340 550424 0 0 0 1944 374 2064 2 1 97 0
    0 2 10236 1237496 111340 550424 0 0 0 1868 373 2169 3 0 97 0
    4 1 10236 1237476 111340 550432 0 0 0 1868 372 2170 3 1 96 0

I did the same on the Xen machine:

    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    2 0 4 3964 5916 99288 0 0 1 21 34 53 0 0 99 0
    0 0 4 3964 5916 99288 0 0 0 0 50 20 0 0 100 0
    0 0 4 3964 5916 99288 0 0 0 0 38 20 0 0 100 0
    0 0 4 3964 5916 99288 0 0 0 0 33 18 0 0 100 0
    0 0 4 3964 5916 99288 0 0 0 0 28 20 0 1 99 0
    0 0 4 3964 5924 99288 0 0 0 28 50 25 0 0 100 0
    0 0 4 3964 5924 99288 0 0 0 0 24 19 0 0 100 0
    0 0 4 3968 5924 99288 0 0 0 0 37 20 0 0 100 0
    0 0 4 3968 5924 99288 0 0 0 0 50 24 0 0 100 0
    0 0 4 3772 5924 99292 0 0 0 0 33 29 0 0 100 0
    0 0 4 3748 5924 99292 0 0 0 0 31 19 0 0 100 0
    0 0 4 3744 5924 99292 0 0 0 0 43 25 0 0 100 0
    1 0 12 2560 5496 100048 0 0 44 17004 1896 812 26 10 56 8
    2 0 16 2388 5080 95152 0 0 180 25788 2505 11372 54 19 2 25
    1 1 16 2576 5084 94956 0 0 0 6824 1215 23437 50 19 1 31
    2 0 16 2668 5084 94872 0 0 0 6588 1188 22923 51 12 2 35
    1 1 16 2600 5088 94840 0 0 0 8664 1701 22326 51 10 2 37
    0 1 16 2432 5100 94968 0 0 0 5492 1183 9985 26 6 2 66
    0 1 16 2464 5112 95048 0 0 0 2404 495 5670 18 5 1 76
    1 0 16 2596 5112 94980 0 0 32 6036 1082 21986 42 16 2 41
    1 0 16 2244 5124 95268 0 0 0 7740 1526 20645 37 14 2 48
    2 1 16 2540 5108 95064 0 0 0 7016 1343 18769 46 12 2 41
    0 1 16 2752 5108 94668 0 0 0 5244 1165 8660 16 6 1 77
    0 2 16 2780 5116 94668 0 0 8 648 80 95 0 1 0 99
    4 0 16 2736 5140 94716 0 0 0 1160 363 2556 9 2 1 88
    1 0 16 2268 5148 95036 0 0 8 5112 853 21498 67 13 2 18
    2 1 16 2788 5048 94676 0 0 8 7876 1535 21278 65 12 2 21
    0 1 16 2764 5060 94788 0 0 0 5372 1203 7024 21 6 1 72
    1 0 16 2648 5076 94932 0 0 12 3112 596 10241 24 9 2 65
    1 0 16 2728 5056 94772 0 0 24 6152 1142 19822 47 16 4 34
    1 0 16 2504 5068 95068 0 0 0 7196 1387 20474 50 17 2 32
    1 1 16 2652 5064 94844 0 0 8 7108 1018 17572 56 8 3 33
    0 1 16 2568 5076 94916 0 0 0 4460 1003 5825 14 1 1 84
    0 1 16 2572 5076 94924 0 0 0 704 52 90 0 0 2 98
    1 0 16 2300 5096 95116 0 0 0 3688 762 14174 52 9 2 37
    1 1 16 2436 5080 95080 0 0 0 7256 1407 19964 66 21 1 12
    1 0 16 2640 5092 94904 0 0 0 6504 1223 19809 69 15 1 15
    0 1 16 2528 5104 94960 0 0 0 4964 1165 4573 18 3 1 78
    1 1 16 2332 5116 95064 0 0 0 2492 521 7197 15 5 1 79
    1 1 16 2580 5080 94900 0 0 0 5076 863 19775 60 14 2 24
    2 0 16 2728 5020 94732 0 0 0 7636 1533 19246 51 15 1 33
    0 0 16 2484 5032 94980 0 0 0 6068 1200 16340 50 16 2 32
    1 0 16 2316 5044 95036 0 0 0 3940 934 3570 10 5 3 82
    0 2 16 2788 5024 94628 0 0 0 640 63 46 0 0 0 100
    0 1 16 2876 5060 94636 0 0 0 1212 402 602 2 0 3 95
    1 1 16 2580 5072 94876 0 0 0 6440 1255 17866 68 11 2 19
    1 0 16 2636 5084 94788 0 0 0 7188 1405 18753 66 16 1 17
    0 1 16 2580 5084 94828 0 0 0 544 116 2877 12 1 0 87
    2 1 16 2536 5104 94908 0 0 0 2968 656 5413 26 5 2 67

The only difference I see is that there is always at least one process waiting for I/O.

thx,
Peter

2006/10/5, Alexander Staubo <alex@purefiction.net>:
> If you are on Linux, I recommend iostat(1) and vmstat(8) over top.
> [...]
> >>
> >> I also tried to disable drbd and put the data directory elsewhere,
> >> but the performance was the same.
> >>
> >> any ideas?
> >>
> >> thx,
> >> Peter
> >>
> >> 2006/10/5, Alexander Staubo <alex@purefiction.net>:
> >> > It appears to me that work_mem is a more significant configuration
> >> > option than previously assumed by many PostgreSQL users, myself
> >> > included. As with many database optimizations, it's an obscure
> >> > problem to diagnose because you generally only observe it through
> >> > I/O activity.
> >> >
> >> > One possibility would be to log a warning whenever work_mem is
> >> > exceeded (or exceeded by a certain ratio). I would also love a
> >> > couple of new statistics counters tracking the amount of work
> >> > memory used and the amount of work memory that has spilled over
> >> > into pgsql_tmp.
> >> >
> >> > Alexander.
> >> >
> >> > On Oct 5, 2006, at 10:48 , Peter Bauer wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > inspired by the last posting "Weird disk write load caused by
> >> > > PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
> >> > > loadtests with vacuum every 10 minutes. The system load
> >> > > (harddisk) went down and everything was very stable at 80% idle
> >> > > for nearly 24 hours!
> >> > > I am currently performing some pgbench runs to evaluate the
> >> > > hardware and configuration for the system but i think the
> >> > > biggest problems are solved so far.
> >> > >
> >> > > thx everybody,
> >> > > Peter
> >> > >
> >> > > 2006/10/2, Tom Lane <tgl@sss.pgh.pa.us>:
> >> > >> Ray Stell <stellr@cns.vt.edu> writes:
> >> > >> > How would one determine the lock situation definitively?
> >> > >> > Is there an internal mechanism that can be queried?
> >> > >>
> >> > >> pg_locks view.
> >> > >>
> >> > >> regards, tom lane
> >> > >>
> >> > >> ---------------------------(end of broadcast)---------------------------
> >> > >> TIP 2: Don't 'kill -9' the postmaster
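Alexander's advice about eyeballing the "r" and "b" vmstat columns can also be scripted. The sketch below is illustrative only (the helper name and thresholds are made up, not from the thread); it parses raw `vmstat 1 N` output, using the field positions from the header shown above, and averages the runnable, blocked, and I/O-wait figures. The two sample rows are taken from the Xen output earlier in the thread.

```python
# Hedged sketch: summarize the "r" (runnable), "b" (blocked on I/O) and
# "wa" (CPU time spent in I/O wait) columns of `vmstat 1 N` output.
# Field positions follow the header in the output above:
#   r b swpd free buff cache si so bi bo in cs us sy id wa
def summarize_vmstat(lines):
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) != 16 or not fields[0].isdigit():
            continue  # skip the "procs ..." and "r b swpd ..." header lines
        r, b, wa = int(fields[0]), int(fields[1]), int(fields[15])
        samples.append((r, b, wa))
    n = len(samples)
    return {
        "samples": n,
        "avg_r": sum(s[0] for s in samples) / n,   # runnable processes
        "avg_b": sum(s[1] for s in samples) / n,   # processes blocked on I/O
        "avg_wa": sum(s[2] for s in samples) / n,  # % CPU in I/O wait
    }

# Two of the Xen samples from above: one idle, one under pgbench load.
output = """\
0 0 4 3964 5916 99288 0 0 0 0 50 20 0 0 100 0
2 0 16 2388 5080 95152 0 0 180 25788 2505 11372 54 19 2 25
""".splitlines()
stats = summarize_vmstat(output)
# A sustained avg_b > 0 together with high avg_wa is the I/O-bound
# signature Alexander describes.
```

In practice one would pipe `vmstat 1 100` into this rather than pasting rows by hand.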
"Peter Bauer" <peter.m.bauer@gmail.com> writes:
> tps = 50.703609 (including connections establishing)
> tps = 50.709265 (excluding connections establishing)

That's about what you ought to expect for a single transaction stream
running on honest disk hardware (ie, disks that don't lie about write
complete). You can't commit a transaction more often than once per disk
revolution, because you have to wait for the current WAL file endpoint
to pass under the heads again. If there are multiple clients then
"ganging" concurrent commits is possible, but you tested only one.

The benchmark you reference might have been done on disks with battery
backed write cache. Or it might have been just plain unsafe (ie, the
equivalent of fsync off, but in hardware :-()

regards, tom lane
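Tom's once-per-revolution bound is easy to work out numerically. The sketch below is a back-of-the-envelope illustration (the RPM figures are assumptions; the thread never states the disk model): a single client on an honest disk cannot commit faster than the platter spins, since each commit must wait for the WAL position to come back under the heads.

```python
# One commit opportunity per platter revolution (single client,
# no write cache, no commit ganging).
def max_commits_per_second(rpm: int) -> float:
    return rpm / 60.0

for rpm in (7200, 10000, 15000):
    print(rpm, "RPM ->", max_commits_per_second(rpm), "commits/s ceiling")
```

A plain 7200 RPM disk therefore caps out at 120 single-stream commits per second; Peter's ~50 tps sits comfortably under that, which is consistent with each TPC-B transaction also doing data-page I/O on the same spindle.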
Hi all,

2006/10/5, Tom Lane <tgl@sss.pgh.pa.us>:
> "Peter Bauer" <peter.m.bauer@gmail.com> writes:
> > tps = 50.703609 (including connections establishing)
> > tps = 50.709265 (excluding connections establishing)
>
> That's about what you ought to expect for a single transaction stream
> running on honest disk hardware (ie, disks that don't lie about write
> complete). You can't commit a transaction more often than once per disk
> revolution, because you have to wait for the current WAL file endpoint
> to pass under the heads again. If there are multiple clients then
> "ganging" concurrent commits is possible, but you tested only one.
>
> The benchmark you reference might have been done on disks with battery
> backed write cache. Or it might have been just plain unsafe (ie, the
> equivalent of fsync off, but in hardware :-()

You are right. I performed the pgbench tests on another machine with
the same hardware but a kernel which supports the onboard Dell RAID
controller with battery-backed write cache, and the result is about
400 tps. We will see how much difference this makes in practice, but
at least I know where the "problem" was.

thx,
Peter
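The jump from ~50 to ~400 tps fits a simple latency model: with a battery-backed write cache, fsync returns as soon as the controller cache holds the WAL record, so the commit is no longer rotation-bound. The numbers below are illustrative assumptions (the cache latency in particular is made up, not measured on Peter's hardware):

```python
# Rough single-stream commit model: tps is the inverse of per-commit latency.
def single_stream_tps(commit_latency_ms: float) -> float:
    return 1000.0 / commit_latency_ms

rotation_ms = 60000.0 / 7200  # ~8.33 ms per revolution on an assumed 7200 RPM disk
cache_ms = 2.5                # assumed fsync-to-controller-cache latency

print(single_stream_tps(rotation_ms))  # rotation-bound ceiling, ~120 tps
print(single_stream_tps(cache_ms))     # cache-bound, in the hundreds of tps
```

This is only a sanity check, not a benchmark: it shows why a battery-backed cache can lift the single-client result by close to an order of magnitude without any change to PostgreSQL itself.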