Thread: analyzing postgresql performance for dbt-2
I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having some trouble figuring out what I should be looking for when I'm trying to tune the database. I have results for a decent baseline, but when I try to increase the load on the database, the performance drops. Nothing in the graphs (in the links listed later) sticks out to me so I'm wondering if there are other database statitics I should try to collect. Any suggestions would be great and let me know if I can answer any other questions. Here are a pair of results where I just raise the load on the database, where increasing the load increases the area of the database touched in addition to increasing the transaction rate. The overall metric increases somewhat, but the response time for most of the interactions also increases significantly: http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline] - load of 100 warehouses - metric 1249.65 http://developer.osdl.org/markw/dbt2-pgsql/149/ - load of 140 warehouses - metric 1323.90 Both of these runs had wal_buffers set to 8, checkpoint_segments 200, and checkpoint_timeout 1800. So far I've only tried various wal_buffers and checkpoint_segments settings in the next set of results for a load of 140 warehouses. http://developer.osdl.org/markw/dbt2-pgsql/148/ - metric 1279.26 - wal_buffers 8 - checkpoint_segments 100 - checkpoint_timeout 300 http://developer.osdl.org/markw/dbt2-pgsql/149/ - metric 1323.90 - wal_buffers 8 - checkpoint_segments 200 - checkpoint_timeout 1800 http://developer.osdl.org/markw/dbt2-pgsql/150/ - metric 1281.13 - wal_buffers 8 - checkpoint_segments 300 - checkpoint_timeout 1800 http://developer.osdl.org/markw/dbt2-pgsql/151/ - metric 1311.99 - wal_buffers 32 - checkpoint_segments 200 - checkpoint_timeout 1800 http://developer.osdl.org/markw/dbt2-pgsql/152/ - metric 1268.37 - wal_buffers 64 - checkpoint_segments 200 - checkpoint_timeout 1800 http://developer.osdl.org/markw/dbt2-pgsql/154/ - metric 1314.62 - wal_buffers 16 - checkpoint_segments 200 - checkpoint_timeout 1800 Thanks! -- Mark Wong - - markw@osdl.org Open Source Development Lab Inc - A non-profit corporation 12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005 (503) 626-2455 x 32 (office) (503) 626-2436 (fax)
markw@osdl.org wrote: > I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having > some trouble figuring out what I should be looking for when I'm trying > to tune the database. I have results for a decent baseline, but when I > try to increase the load on the database, the performance drops. > Nothing in the graphs (in the links listed later) sticks out to me so > I'm wondering if there are other database statitics I should try to > collect. Any suggestions would be great and let me know if I can answer > any other questions. > > Here are a pair of results where I just raise the load on the > database, where increasing the load increases the area of the database > touched in addition to increasing the transaction rate. The overall > metric increases somewhat, but the response time for most of the > interactions also increases significantly: > > http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline] > - load of 100 warehouses > - metric 1249.65 > > http://developer.osdl.org/markw/dbt2-pgsql/149/ > - load of 140 warehouses > - metric 1323.90 I looked at these charts and they looked normal to me. It looked like your the load increased until your computer was saturated. Is there something I am missing? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, Oct 21, 2003 at 08:35:56PM -0400, Bruce Momjian wrote: > markw@osdl.org wrote: > > I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having > > some trouble figuring out what I should be looking for when I'm trying > > to tune the database. I have results for a decent baseline, but when I > > try to increase the load on the database, the performance drops. > > Nothing in the graphs (in the links listed later) sticks out to me so > > I'm wondering if there are other database statitics I should try to > > collect. Any suggestions would be great and let me know if I can answer > > any other questions. > > > > Here are a pair of results where I just raise the load on the > > database, where increasing the load increases the area of the database > > touched in addition to increasing the transaction rate. The overall > > metric increases somewhat, but the response time for most of the > > interactions also increases significantly: > > > > http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline] > > - load of 100 warehouses > > - metric 1249.65 > > > > http://developer.osdl.org/markw/dbt2-pgsql/149/ > > - load of 140 warehouses > > - metric 1323.90 > > I looked at these charts and they looked normal to me. It looked like > your the load increased until your computer was saturated. Is there > something I am missing? I've run some i/o tests so I'm pretty sure I haven't saturated that. And it looks like I have almost 10% more processor time left. I do agree that it appears something might be saturated, I just don't know where to look... Thanks, Mark
Mark Wong wrote: > > > Here are a pair of results where I just raise the load on the > > > database, where increasing the load increases the area of the database > > > touched in addition to increasing the transaction rate. The overall > > > metric increases somewhat, but the response time for most of the > > > interactions also increases significantly: > > > > > > http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline] > > > - load of 100 warehouses > > > - metric 1249.65 > > > > > > http://developer.osdl.org/markw/dbt2-pgsql/149/ > > > - load of 140 warehouses > > > - metric 1323.90 > > > > I looked at these charts and they looked normal to me. It looked like > > your the load increased until your computer was saturated. Is there > > something I am missing? > > I've run some i/o tests so I'm pretty sure I haven't saturated that. And it > looks like I have almost 10% more processor time left. I do agree that it > appears something might be saturated, I just don't know where to look... Could the 10% be context switching time, or is the I/O saturated? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On 26 Oct, Bruce Momjian wrote: > Mark Wong wrote: >> > > Here are a pair of results where I just raise the load on the >> > > database, where increasing the load increases the area of the database >> > > touched in addition to increasing the transaction rate. The overall >> > > metric increases somewhat, but the response time for most of the >> > > interactions also increases significantly: >> > > >> > > http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline] >> > > - load of 100 warehouses >> > > - metric 1249.65 >> > > >> > > http://developer.osdl.org/markw/dbt2-pgsql/149/ >> > > - load of 140 warehouses >> > > - metric 1323.90 >> > >> > I looked at these charts and they looked normal to me. It looked like >> > your the load increased until your computer was saturated. Is there >> > something I am missing? >> >> I've run some i/o tests so I'm pretty sure I haven't saturated that. And it >> looks like I have almost 10% more processor time left. I do agree that it >> appears something might be saturated, I just don't know where to look... > > Could the 10% be context switching time, or is the I/O saturated? There are about 14,000 to 17,000 context switches/s according to the vmstat output. This is on a 1.5Ghz hyperthreaded Xeon processor. I don't know what I'm supposed to be able to expect in terms of context switching. I really doubt the i/o is saturated because I've run disktest (part of the Linux Test Project suite) and saw much higher throughput for various sequential/random read/write tests. I'm starting to collect oprofile data (and will hopefully have some results soon) to get an idea where the database is spending its time, just in case that may have something to do with it. Mark
I've done a better controlled series of tests where I restore the database before each test and have grabbed sar and oprofile data: http://developer.osdl.org/markw/dbt2-pgsql/176/ - load of 100 warehouses - metric 1234.52 http://developer.osdl.org/markw/dbt2-pgsql/177/ - load of 120 warehouses - metric 1259.43 http://developer.osdl.org/markw/dbt2-pgsql/178/ - load of 140 warehouses - metric 1244.33 For the most part our primary metric, and the vmstat and sar output look fairly close for each run. Here are a couple of things that I've found to be considerably different from run 176 to 178: - oprofile says postgresql calls to SearchCatCache increased ~ 20% - readprofile says there are 50% more calls in the linux kernel to do_signaction (in kernel/signal.c) Would these two things offer any insight to what might be throttling the throughput? Thanks, Mark