Thread: Overhead for stats_command_string et al, take 2
I redid my previous measurements after finishing up the weekend's hacking. The numbers shown below are elapsed time in seconds for time psql -f testfile.sql postgres >/dev/null using CVS HEAD and REL8_1_STABLE branch tip, compiled --enable-debug --disable-cassert, no nondefault options except for turning fsync off (which doesn't particularly affect read-only tests like these anyway). The machines are both running current Fedora Core 5. The older x86 machine is known to have the slow-gettimeofday problem from previous experimentation with EXPLAIN ANALYZE. Each number is the median of 3 or more tests, rounded off to 0.1 second (I wouldn't put a lot of faith in differences of 0.1 sec or so, because of the variance I saw in the tests). x86 x86_64 8.1 HEAD 8.1 HEAD 100000 "SELECT 1;" 25.9 27.0 9.2 9.1 with stats_command_string=1 63.5 27.6 18.7 9.2 with log_min_duration_statement=100 26.9 27.8 9.6 9.2 with statement_timeout=100 27.5 28.6 9.6 9.8 with all 3 features 66.1 29.3 19.5 9.7 BEGIN, 100000 "SELECT 1;", COMMIT 21.2 23.1 8.3 8.4 with stats_command_string=1 52.3 23.5 15.4 8.5 with log_min_duration_statement=100 22.1 23.4 8.4 8.4 with statement_timeout=100 23.7 24.3 8.6 8.8 with all 3 features 55.2 25.5 16.0 8.8 I chose the log_min_duration_statement and statement_timeout settings high enough so that no actual logging or timeout would happen --- the point is to measure the measurement overhead. The good news is that we've pretty much licked the problem of stats_command_string costing an unreasonable amount. The bad news is that except in the stats_command_string cases, HEAD is noticeably slower than 8.1 on the machine with slow gettimeofday. In the single-transaction test this might be blamed on the addition of statement_timestamp support (which requires a gettimeofday per statement that wasn't there in 8.1) ... but in the one-transaction- per-statement tests that doesn't hold water, because each branch is doing a gettimeofday per statement, just in different places. Can anyone else reproduce this slowdown? It might be only an artifact of these particular builds, but it's a bit too consistent in my x86 data to just ignore. BTW, according to "top" the CPU usage percentages in these tests are on the order of 55% backend, 45% psql. Methinks psql needs a round of performance tuning ... regards, tom lane
I wrote: > BTW, according to "top" the CPU usage percentages in these tests are > on the order of 55% backend, 45% psql. Methinks psql needs a round > of performance tuning ... oprofile tells the tale: CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 240000 GLOBAL_POWER_E...| samples| %| ------------------ 682534 52.7683 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux 274747 21.2413 /home/tgl/testversion/bin/postgres 226306 17.4962 /lib64/libc-2.4.so 54296 4.1977 /home/tgl/testversion/bin/psql 45376 3.5081 /home/tgl/testversion/lib/libpq.so.5.0 5302 0.4099 /usr/bin/oprofiled 1954 0.1511 /oprofile It's all about the kernel process-switch overhead, which is being blamed equally on both processes. I did find some low-hanging fruit in GetVariable(), but nothing else that looked readily improvable. regards, tom lane
Sorry I am traveling for EnterpriseDB Wednesday and Thursday so I can't run the tests right now. Seeing stats_command_string with almost zero overhead is great news! Should we remove that setting and just have it enabled all the time? As far as pg_stat_activity.query_start, I never suspected that column would show the time of start of IDLE. I think we can just document that that column is the time of the start of the last query, and avoid the gettimeofday() call on IDLE start. If we get demand for the old behavior, we can consider re-adding it, but I bet when they hear the downside (a second gettimeofday() call for every query), no one will want the old behavior. --------------------------------------------------------------------------- Tom Lane wrote: > I redid my previous measurements after finishing up the weekend's > hacking. The numbers shown below are elapsed time in seconds for > > time psql -f testfile.sql postgres >/dev/null > > using CVS HEAD and REL8_1_STABLE branch tip, compiled --enable-debug > --disable-cassert, no nondefault options except for turning fsync off > (which doesn't particularly affect read-only tests like these anyway). > The machines are both running current Fedora Core 5. The older x86 > machine is known to have the slow-gettimeofday problem from previous > experimentation with EXPLAIN ANALYZE. Each number is the median of 3 > or more tests, rounded off to 0.1 second (I wouldn't put a lot of faith > in differences of 0.1 sec or so, because of the variance I saw in the > tests). > > x86 x86_64 > > 8.1 HEAD 8.1 HEAD > > 100000 "SELECT 1;" 25.9 27.0 9.2 9.1 > with stats_command_string=1 63.5 27.6 18.7 9.2 > with log_min_duration_statement=100 26.9 27.8 9.6 9.2 > with statement_timeout=100 27.5 28.6 9.6 9.8 > with all 3 features 66.1 29.3 19.5 9.7 > > BEGIN, 100000 "SELECT 1;", COMMIT 21.2 23.1 8.3 8.4 > with stats_command_string=1 52.3 23.5 15.4 8.5 > with log_min_duration_statement=100 22.1 23.4 8.4 8.4 > with statement_timeout=100 23.7 24.3 8.6 8.8 > with all 3 features 55.2 25.5 16.0 8.8 > > I chose the log_min_duration_statement and statement_timeout settings > high enough so that no actual logging or timeout would happen --- the > point is to measure the measurement overhead. > > The good news is that we've pretty much licked the problem of > stats_command_string costing an unreasonable amount. > > The bad news is that except in the stats_command_string cases, HEAD > is noticeably slower than 8.1 on the machine with slow gettimeofday. > In the single-transaction test this might be blamed on the addition > of statement_timestamp support (which requires a gettimeofday per > statement that wasn't there in 8.1) ... but in the one-transaction- > per-statement tests that doesn't hold water, because each branch is > doing a gettimeofday per statement, just in different places. > > Can anyone else reproduce this slowdown? It might be only an artifact > of these particular builds, but it's a bit too consistent in my x86 data > to just ignore. > > BTW, according to "top" the CPU usage percentages in these tests are > on the order of 55% backend, 45% psql. Methinks psql needs a round > of performance tuning ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > The bad news is that except in the stats_command_string cases, HEAD > is noticeably slower than 8.1 on the machine with slow gettimeofday. > In the single-transaction test this might be blamed on the addition > of statement_timestamp support (which requires a gettimeofday per > statement that wasn't there in 8.1) ... but in the one-transaction- > per-statement tests that doesn't hold water, because each branch is > doing a gettimeofday per statement, just in different places. > > Can anyone else reproduce this slowdown? It might be only an artifact > of these particular builds, but it's a bit too consistent in my x86 data > to just ignore. This is what I get on a fast AMD Dual Opteron box(Running Debian Sarge/AMD64): 8.1.4 HEAD 1000000 SELECT 1; 74,74,73 77,76,77 stats_command_string=1; 105,99,106 78,79,78 log_min_duration_statement=100 79,80,81 75,80,76 statement_timeout=100 78,79,78 75,79,77 all 3 104,108,107 82,81,81 all values in seconds with 3 consecutive runs of one million "SELECT 1;" queries. It takes about 48 seconds to run the same test without stat-collection btw. Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > This is what I get on a fast AMD Dual Opteron box(Running Debian > Sarge/AMD64): > 8.1.4 HEAD > 1000000 SELECT 1; 74,74,73 77,76,77 > stats_command_string=1; 105,99,106 78,79,78 > log_min_duration_statement=100 79,80,81 75,80,76 > statement_timeout=100 78,79,78 75,79,77 > all 3 104,108,107 82,81,81 > all values in seconds with 3 consecutive runs of one million "SELECT 1;" > queries. It takes about 48 seconds to run the same test without > stat-collection btw. I'm confused. Isn't your first table row for the case of no stat collection? Or do you mean that you have stats_row_level and/or stats_block_level on in all four cases? regards, tom lane
Tom Lane wrote: > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: >> This is what I get on a fast AMD Dual Opteron box(Running Debian >> Sarge/AMD64): > >> 8.1.4 HEAD >> 1000000 SELECT 1; 74,74,73 77,76,77 >> stats_command_string=1; 105,99,106 78,79,78 >> log_min_duration_statement=100 79,80,81 75,80,76 >> statement_timeout=100 78,79,78 75,79,77 >> all 3 104,108,107 82,81,81 > >> all values in seconds with 3 consecutive runs of one million "SELECT 1;" >> queries. It takes about 48 seconds to run the same test without >> stat-collection btw. > > I'm confused. Isn't your first table row for the case of no stat collection? > Or do you mean that you have stats_row_level and/or stats_block_level on > in all four cases? yes - stats_row_level and stats_block_level on in all cases (sorry for the confusion) - I can easily redo the tests without those - but that's what I had in the running conf and I only remember that after I was nearly done with all the testing :-) Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > Tom Lane wrote: >> Or do you mean that you have stats_row_level and/or stats_block_level on >> in all four cases? > yes - stats_row_level and stats_block_level on in all cases (sorry for > the confusion) - I can easily redo the tests without those - but that's > what I had in the running conf and I only remember that after I was > nearly done with all the testing :-) It'd be interesting to compare 8.1 and HEAD for the no-overhead case; I don't think you need to redo all four cases, but I'd like to see that one. regards, tom lane
Tom Lane wrote: > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: >> Tom Lane wrote: >>> Or do you mean that you have stats_row_level and/or stats_block_level on >>> in all four cases? > >> yes - stats_row_level and stats_block_level on in all cases (sorry for >> the confusion) - I can easily redo the tests without those - but that's >> what I had in the running conf and I only remember that after I was >> nearly done with all the testing :-) > > It'd be interesting to compare 8.1 and HEAD for the no-overhead case; > I don't think you need to redo all four cases, but I'd like to see that one. 8.1: 50,50,49 HEAD: 49,48,49 Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > Tom Lane wrote: >> It'd be interesting to compare 8.1 and HEAD for the no-overhead case; >> I don't think you need to redo all four cases, but I'd like to see that one. > 8.1: 50,50,49 > HEAD: 49,48,49 OK, so that seems comparable to my results on a dual Xeon ... probably, both your machine and my newer one have fast-to-read clock hardware. We need to get some numbers from one of the people who have complained about EXPLAIN ANALYZE overhead. I'll have to try the stats-collection-active case on my machines, too. I was still planning to look into removing the buffer process to reduce context-swap overhead for stats collection ... regards, tom lane
Tom Lane wrote: > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > > Tom Lane wrote: > >> It'd be interesting to compare 8.1 and HEAD for the no-overhead case; > >> I don't think you need to redo all four cases, but I'd like to see that one. > > > 8.1: 50,50,49 > > HEAD: 49,48,49 > > OK, so that seems comparable to my results on a dual Xeon ... probably, > both your machine and my newer one have fast-to-read clock hardware. > We need to get some numbers from one of the people who have complained > about EXPLAIN ANALYZE overhead. I'm compiling here without the assert stuff. It takes a while ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Bruce Momjian <bruce@momjian.us> writes: > Seeing stats_command_string with almost zero overhead is great news! > Should we remove that setting and just have it enabled all > the time? If you don't need it, you shouldn't have to pay any overhead for it, I think. One could make an argument now for having stats_command_string default to ON, though. Something that might also be interesting is an option to suppress per-command ps_status reporting. On machines where updating ps status takes a kernel call, there's now a pretty good argument why you might want to turn that off and rely on pg_stat_activity instead. regards, tom lane
Tom Lane wrote: > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: >> Tom Lane wrote: >>> It'd be interesting to compare 8.1 and HEAD for the no-overhead case; >>> I don't think you need to redo all four cases, but I'd like to see that one. > >> 8.1: 50,50,49 >> HEAD: 49,48,49 > > OK, so that seems comparable to my results on a dual Xeon ... probably, > both your machine and my newer one have fast-to-read clock hardware. > We need to get some numbers from one of the people who have complained > about EXPLAIN ANALYZE overhead. > Data from two (identical) dual P-III, one running Linux and one running Freebsd - both doing the 100000 SELECT 1 test: Freebsd 6.1: - 8.1 21.5 (median times) - HEAD 22.2 Linux 2.6.16 - 8.1 16.1 - HEAD 17.2 The variation in run times seems to be up to 0.5 seconds, so I'm not sure that I'm seeing a real difference between 8.1 and HEAD (though this test seems to run noticeably slower on Freebsd - recall from my previous posting featuring these boxes that EXPLAIN ANALYZE seems to have a *much* higher overhead on Freebsd). (8.1 is 8.1.3 on the Freebsd box and 8.1.4 on the linux one. HEAD is from today). regards Mark
Tom Lane wrote: > I redid my previous measurements after finishing up the weekend's > hacking. The numbers shown below are elapsed time in seconds for > > time psql -f testfile.sql postgres >/dev/null Average of 5 runs, for the first two cases, on the x86 machine that shows high overhead in gettimeofday. I used only 30000 SELECT 1 queries instead of 100k. 30000 SELECT 1; HEAD 8.1 no overhead 21.9 23.1 stats_command_string=1 22.4 36.6 BEGIN; 30000 SELECT 1; COMMIT; HEAD 8.1 no overhead 19.1 20.3 stats_command_string=1 19.4 30.3 It can be observed that HEAD in the no overhead case is actually faster than 8.1 on this machine. And while stats_command_string adds some overhead, it still is faster than the no overhead case in 8.1. (These results took me by surprise actually.) For the curious, here are medians and averages for each run ("file3" is the plain SELECT, while "file4" is BEGIN-30k * SELECT-COMMIT. "caso1" is no overhead, "caso2" is stats_command_string=1). I couldn't find a quick'n dirty way to calculate stddev in shell but if anyone knows one let me know. median average 8.1-file3-caso1: 23.098 23.145 8.1-file3-caso2: 36.595 36.607 8.1-file4-caso1: 20.304 20.269 8.1-file4-caso2: 30.169 30.256 head-file3-caso1: 21.890 21.931 head-file3-caso2: 22.509 22.390 head-file4-caso1: 19.142 19.126 head-file4-caso2: 19.488 19.442 -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Average of 5 runs, for the first two cases, on the x86 machine that > shows high overhead in gettimeofday. > I used only 30000 SELECT 1 queries instead of 100k. > 30000 SELECT 1; > HEAD 8.1 > no overhead 21.9 23.1 > stats_command_string=1 22.4 36.6 > BEGIN; 30000 SELECT 1; COMMIT; > HEAD 8.1 > no overhead 19.1 20.3 > stats_command_string=1 19.4 30.3 > It can be observed that HEAD in the no overhead case is actually faster > than 8.1 on this machine. That's more or less what I would have hoped to find, because we're always finding ways to squeeze out bits of overhead here and there. I wonder why your results are different from what I got on my older machine? I'll have to break out oprofile again and try to see what's happening there. regards, tom lane
I ran your test with all defaults in 8.1 and CVS HEAD on a BSD/OS dual Xeon and got: 8.1.X 1.946HEAD 1.986 I ran the test ten times on three runs and took the middle value. It is a slowdown of 2%. I used these configure options: configure \ --with-tcl \ --with-perl \ --with-tclconfig=/u/lib \ --with-includes="/usr/local/include/readline/usr/contrib/include" \ --with-libraries="/usr/local/lib /usr/contrib/lib"\ --with-openssl \ --enable-thread-safety \ --enable-nls --------------------------------------------------------------------------- Tom Lane wrote: > I redid my previous measurements after finishing up the weekend's > hacking. The numbers shown below are elapsed time in seconds for > > time psql -f testfile.sql postgres >/dev/null > > using CVS HEAD and REL8_1_STABLE branch tip, compiled --enable-debug > --disable-cassert, no nondefault options except for turning fsync off > (which doesn't particularly affect read-only tests like these anyway). > The machines are both running current Fedora Core 5. The older x86 > machine is known to have the slow-gettimeofday problem from previous > experimentation with EXPLAIN ANALYZE. Each number is the median of 3 > or more tests, rounded off to 0.1 second (I wouldn't put a lot of faith > in differences of 0.1 sec or so, because of the variance I saw in the > tests). > > x86 x86_64 > > 8.1 HEAD 8.1 HEAD > > 100000 "SELECT 1;" 25.9 27.0 9.2 9.1 > with stats_command_string=1 63.5 27.6 18.7 9.2 > with log_min_duration_statement=100 26.9 27.8 9.6 9.2 > with statement_timeout=100 27.5 28.6 9.6 9.8 > with all 3 features 66.1 29.3 19.5 9.7 > > BEGIN, 100000 "SELECT 1;", COMMIT 21.2 23.1 8.3 8.4 > with stats_command_string=1 52.3 23.5 15.4 8.5 > with log_min_duration_statement=100 22.1 23.4 8.4 8.4 > with statement_timeout=100 23.7 24.3 8.6 8.8 > with all 3 features 55.2 25.5 16.0 8.8 > > I chose the log_min_duration_statement and statement_timeout settings > high enough so that no actual logging or timeout would happen --- the > point is to measure the measurement overhead. > > The good news is that we've pretty much licked the problem of > stats_command_string costing an unreasonable amount. > > The bad news is that except in the stats_command_string cases, HEAD > is noticeably slower than 8.1 on the machine with slow gettimeofday. > In the single-transaction test this might be blamed on the addition > of statement_timestamp support (which requires a gettimeofday per > statement that wasn't there in 8.1) ... but in the one-transaction- > per-statement tests that doesn't hold water, because each branch is > doing a gettimeofday per statement, just in different places. > > Can anyone else reproduce this slowdown? It might be only an artifact > of these particular builds, but it's a bit too consistent in my x86 data > to just ignore. > > BTW, according to "top" the CPU usage percentages in these tests are > on the order of 55% backend, 45% psql. Methinks psql needs a round > of performance tuning ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +