Thread: Overhead for stats_command_string et al, take 2

Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
I redid my previous measurements after finishing up the weekend's
hacking.  The numbers shown below are elapsed time in seconds for
time psql -f testfile.sql postgres >/dev/null

using CVS HEAD and REL8_1_STABLE branch tip, compiled --enable-debug
--disable-cassert, no nondefault options except for turning fsync off
(which doesn't particularly affect read-only tests like these anyway).
The machines are both running current Fedora Core 5.  The older x86
machine is known to have the slow-gettimeofday problem from previous
experimentation with EXPLAIN ANALYZE.  Each number is the median of 3
or more tests, rounded off to 0.1 second (I wouldn't put a lot of faith
in differences of 0.1 sec or so, because of the variance I saw in the
tests).
                    x86            x86_64
                8.1    HEAD    8.1    HEAD

100000 "SELECT 1;"            25.9    27.0     9.2    9.1
with stats_command_string=1        63.5    27.6    18.7    9.2
with log_min_duration_statement=100    26.9    27.8     9.6    9.2
with statement_timeout=100        27.5    28.6     9.6    9.8
with all 3 features            66.1    29.3    19.5    9.7

BEGIN, 100000 "SELECT 1;", COMMIT    21.2    23.1     8.3    8.4
with stats_command_string=1        52.3    23.5    15.4    8.5
with log_min_duration_statement=100    22.1    23.4     8.4    8.4
with statement_timeout=100        23.7    24.3     8.6    8.8
with all 3 features            55.2    25.5    16.0    8.8

I chose the log_min_duration_statement and statement_timeout settings
high enough so that no actual logging or timeout would happen --- the
point is to measure the measurement overhead.

The good news is that we've pretty much licked the problem of
stats_command_string costing an unreasonable amount.

The bad news is that except in the stats_command_string cases, HEAD
is noticeably slower than 8.1 on the machine with slow gettimeofday.
In the single-transaction test this might be blamed on the addition
of statement_timestamp support (which requires a gettimeofday per
statement that wasn't there in 8.1) ... but in the one-transaction-
per-statement tests that doesn't hold water, because each branch is
doing a gettimeofday per statement, just in different places.

Can anyone else reproduce this slowdown?  It might be only an artifact
of these particular builds, but it's a bit too consistent in my x86 data
to just ignore.

BTW, according to "top" the CPU usage percentages in these tests are
on the order of 55% backend, 45% psql.  Methinks psql needs a round
of performance tuning ...
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
I wrote:
> BTW, according to "top" the CPU usage percentages in these tests are
> on the order of 55% backend, 45% psql.  Methinks psql needs a round
> of performance tuning ...

oprofile tells the tale:

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 240000
GLOBAL_POWER_E...| samples|      %|
------------------  682534 52.7683 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux  274747 21.2413
/home/tgl/testversion/bin/postgres 226306 17.4962 /lib64/libc-2.4.so   54296  4.1977 /home/tgl/testversion/bin/psql
45376 3.5081 /home/tgl/testversion/lib/libpq.so.5.0    5302  0.4099 /usr/bin/oprofiled    1954  0.1511 /oprofile
 

It's all about the kernel process-switch overhead, which is being blamed
equally on both processes.

I did find some low-hanging fruit in GetVariable(), but nothing else
that looked readily improvable.
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Bruce Momjian
Date:
Sorry I am traveling for EnterpriseDB Wednesday and Thursday so I can't
run the tests right now.

Seeing stats_command_string with almost zero overhead is great news!
Should we remove that setting and just have it enabled all
the time?

As far as pg_stat_activity.query_start, I never suspected that column
would show the time of start of IDLE.  I think we can just document that
that column is the time of the start of the last query, and avoid the
gettimeofday() call on IDLE start.  If we get demand for the old
behavior, we can consider re-adding it, but I bet when they hear the
downside (a second gettimeofday() call for every query), no one will
want the old behavior.

---------------------------------------------------------------------------

Tom Lane wrote:
> I redid my previous measurements after finishing up the weekend's
> hacking.  The numbers shown below are elapsed time in seconds for
> 
>     time psql -f testfile.sql postgres >/dev/null
> 
> using CVS HEAD and REL8_1_STABLE branch tip, compiled --enable-debug
> --disable-cassert, no nondefault options except for turning fsync off
> (which doesn't particularly affect read-only tests like these anyway).
> The machines are both running current Fedora Core 5.  The older x86
> machine is known to have the slow-gettimeofday problem from previous
> experimentation with EXPLAIN ANALYZE.  Each number is the median of 3
> or more tests, rounded off to 0.1 second (I wouldn't put a lot of faith
> in differences of 0.1 sec or so, because of the variance I saw in the
> tests).
> 
>                         x86            x86_64
> 
>                     8.1    HEAD    8.1    HEAD
> 
> 100000 "SELECT 1;"            25.9    27.0     9.2    9.1
> with stats_command_string=1        63.5    27.6    18.7    9.2
> with log_min_duration_statement=100    26.9    27.8     9.6    9.2
> with statement_timeout=100        27.5    28.6     9.6    9.8
> with all 3 features            66.1    29.3    19.5    9.7
> 
> BEGIN, 100000 "SELECT 1;", COMMIT    21.2    23.1     8.3    8.4
> with stats_command_string=1        52.3    23.5    15.4    8.5
> with log_min_duration_statement=100    22.1    23.4     8.4    8.4
> with statement_timeout=100        23.7    24.3     8.6    8.8
> with all 3 features            55.2    25.5    16.0    8.8
> 
> I chose the log_min_duration_statement and statement_timeout settings
> high enough so that no actual logging or timeout would happen --- the
> point is to measure the measurement overhead.
> 
> The good news is that we've pretty much licked the problem of
> stats_command_string costing an unreasonable amount.
> 
> The bad news is that except in the stats_command_string cases, HEAD
> is noticeably slower than 8.1 on the machine with slow gettimeofday.
> In the single-transaction test this might be blamed on the addition
> of statement_timestamp support (which requires a gettimeofday per
> statement that wasn't there in 8.1) ... but in the one-transaction-
> per-statement tests that doesn't hold water, because each branch is
> doing a gettimeofday per statement, just in different places.
> 
> Can anyone else reproduce this slowdown?  It might be only an artifact
> of these particular builds, but it's a bit too consistent in my x86 data
> to just ignore.
> 
> BTW, according to "top" the CPU usage percentages in these tests are
> on the order of 55% backend, 45% psql.  Methinks psql needs a round
> of performance tuning ...
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Overhead for stats_command_string et al, take 2

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:

> The bad news is that except in the stats_command_string cases, HEAD
> is noticeably slower than 8.1 on the machine with slow gettimeofday.
> In the single-transaction test this might be blamed on the addition
> of statement_timestamp support (which requires a gettimeofday per
> statement that wasn't there in 8.1) ... but in the one-transaction-
> per-statement tests that doesn't hold water, because each branch is
> doing a gettimeofday per statement, just in different places.
> 
> Can anyone else reproduce this slowdown?  It might be only an artifact
> of these particular builds, but it's a bit too consistent in my x86 data
> to just ignore.

This is what I get on a fast AMD Dual Opteron box(Running Debian
Sarge/AMD64):
              8.1.4          HEAD
1000000 SELECT 1;        74,74,73    77,76,77
stats_command_string=1;        105,99,106    78,79,78
log_min_duration_statement=100    79,80,81    75,80,76
statement_timeout=100        78,79,78    75,79,77
all 3                104,108,107    82,81,81

all values in seconds with 3 consecutive runs of one million "SELECT 1;"
queries. It takes about 48 seconds to run the same test without
stat-collection btw.


Stefan


Re: Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> This is what I get on a fast AMD Dual Opteron box(Running Debian
> Sarge/AMD64):

>                   8.1.4          HEAD
> 1000000 SELECT 1;        74,74,73    77,76,77
> stats_command_string=1;        105,99,106    78,79,78
> log_min_duration_statement=100    79,80,81    75,80,76
> statement_timeout=100        78,79,78    75,79,77
> all 3                104,108,107    82,81,81

> all values in seconds with 3 consecutive runs of one million "SELECT 1;"
> queries. It takes about 48 seconds to run the same test without
> stat-collection btw.

I'm confused.  Isn't your first table row for the case of no stat collection?
Or do you mean that you have stats_row_level and/or stats_block_level on
in all four cases?
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> This is what I get on a fast AMD Dual Opteron box(Running Debian
>> Sarge/AMD64):
> 
>>                   8.1.4          HEAD
>> 1000000 SELECT 1;        74,74,73    77,76,77
>> stats_command_string=1;        105,99,106    78,79,78
>> log_min_duration_statement=100    79,80,81    75,80,76
>> statement_timeout=100        78,79,78    75,79,77
>> all 3                104,108,107    82,81,81
> 
>> all values in seconds with 3 consecutive runs of one million "SELECT 1;"
>> queries. It takes about 48 seconds to run the same test without
>> stat-collection btw.
> 
> I'm confused.  Isn't your first table row for the case of no stat collection?
> Or do you mean that you have stats_row_level and/or stats_block_level on
> in all four cases?

yes - stats_row_level and stats_block_level on in all cases (sorry for
the confusion) - I can easily redo the tests without those - but that's
what I had in the running conf and I only remember that after I was
nearly done with all the testing :-)



Stefan


Re: Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> Or do you mean that you have stats_row_level and/or stats_block_level on
>> in all four cases?

> yes - stats_row_level and stats_block_level on in all cases (sorry for
> the confusion) - I can easily redo the tests without those - but that's
> what I had in the running conf and I only remember that after I was
> nearly done with all the testing :-)

It'd be interesting to compare 8.1 and HEAD for the no-overhead case;
I don't think you need to redo all four cases, but I'd like to see that one.
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> Tom Lane wrote:
>>> Or do you mean that you have stats_row_level and/or stats_block_level on
>>> in all four cases?
> 
>> yes - stats_row_level and stats_block_level on in all cases (sorry for
>> the confusion) - I can easily redo the tests without those - but that's
>> what I had in the running conf and I only remember that after I was
>> nearly done with all the testing :-)
> 
> It'd be interesting to compare 8.1 and HEAD for the no-overhead case;
> I don't think you need to redo all four cases, but I'd like to see that one.

8.1:    50,50,49
HEAD:    49,48,49


Stefan


Re: Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> It'd be interesting to compare 8.1 and HEAD for the no-overhead case;
>> I don't think you need to redo all four cases, but I'd like to see that one.

> 8.1:    50,50,49
> HEAD:    49,48,49

OK, so that seems comparable to my results on a dual Xeon ... probably,
both your machine and my newer one have fast-to-read clock hardware.
We need to get some numbers from one of the people who have complained
about EXPLAIN ANALYZE overhead.

I'll have to try the stats-collection-active case on my machines, too.
I was still planning to look into removing the buffer process to reduce
context-swap overhead for stats collection ...
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> > Tom Lane wrote:
> >> It'd be interesting to compare 8.1 and HEAD for the no-overhead case;
> >> I don't think you need to redo all four cases, but I'd like to see that one.
> 
> > 8.1:    50,50,49
> > HEAD:    49,48,49
> 
> OK, so that seems comparable to my results on a dual Xeon ... probably,
> both your machine and my newer one have fast-to-read clock hardware.
> We need to get some numbers from one of the people who have complained
> about EXPLAIN ANALYZE overhead.

I'm compiling here without the assert stuff.  It takes a while ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Seeing stats_command_string with almost zero overhead is great news!
> Should we remove that setting and just have it enabled all
> the time?

If you don't need it, you shouldn't have to pay any overhead for it,
I think.  One could make an argument now for having stats_command_string
default to ON, though.

Something that might also be interesting is an option to suppress
per-command ps_status reporting.  On machines where updating ps status
takes a kernel call, there's now a pretty good argument why you might
want to turn that off and rely on pg_stat_activity instead.
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Mark Kirkwood
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> Tom Lane wrote:
>>> It'd be interesting to compare 8.1 and HEAD for the no-overhead case;
>>> I don't think you need to redo all four cases, but I'd like to see that one.
> 
>> 8.1:    50,50,49
>> HEAD:    49,48,49
> 
> OK, so that seems comparable to my results on a dual Xeon ... probably,
> both your machine and my newer one have fast-to-read clock hardware.
> We need to get some numbers from one of the people who have complained
> about EXPLAIN ANALYZE overhead.
> 

Data from two (identical) dual P-III, one running Linux and one running 
Freebsd - both doing the 100000 SELECT 1 test:

Freebsd 6.1:
- 8.1   21.5 (median times)
- HEAD  22.2

Linux 2.6.16
- 8.1   16.1
- HEAD  17.2

The variation in run times seems to be up to 0.5 seconds, so I'm not 
sure that I'm seeing a real difference between 8.1 and HEAD (though this 
test seems to run noticeably slower on Freebsd - recall from my previous 
posting featuring these boxes that EXPLAIN ANALYZE seems to have a 
*much* higher overhead on Freebsd).

(8.1 is 8.1.3 on the Freebsd box and 8.1.4 on the linux one. HEAD is 
from today).

regards

Mark



Re: Overhead for stats_command_string et al, take 2

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I redid my previous measurements after finishing up the weekend's
> hacking.  The numbers shown below are elapsed time in seconds for
> 
>     time psql -f testfile.sql postgres >/dev/null

Average of 5 runs, for the first two cases, on the x86 machine that
shows high overhead in gettimeofday.

I used only 30000 SELECT 1 queries instead of 100k.

30000 SELECT 1;        HEAD        8.1
no overhead        21.9        23.1    
stats_command_string=1    22.4        36.6

BEGIN; 30000 SELECT 1; COMMIT;        HEAD        8.1
no overhead        19.1        20.3
stats_command_string=1    19.4        30.3

It can be observed that HEAD in the no overhead case is actually faster
than 8.1 on this machine.   And while stats_command_string adds some
overhead, it still is faster than the no overhead case in 8.1.  (These
results took me by surprise actually.)

For the curious, here are medians and averages for each run ("file3" is
the plain SELECT, while "file4" is BEGIN-30k * SELECT-COMMIT.  "caso1"
is no overhead, "caso2" is stats_command_string=1).  I couldn't find a
quick'n dirty way to calculate stddev in shell but if anyone knows one
let me know.
    median  average
8.1-file3-caso1: 23.098 23.145
8.1-file3-caso2: 36.595 36.607
8.1-file4-caso1: 20.304 20.269
8.1-file4-caso2: 30.169 30.256
head-file3-caso1: 21.890 21.931
head-file3-caso2: 22.509 22.390
head-file4-caso1: 19.142 19.126
head-file4-caso2: 19.488 19.442

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Overhead for stats_command_string et al, take 2

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Average of 5 runs, for the first two cases, on the x86 machine that
> shows high overhead in gettimeofday.

> I used only 30000 SELECT 1 queries instead of 100k.

> 30000 SELECT 1;
>                         HEAD            8.1
> no overhead             21.9            23.1    
> stats_command_string=1  22.4            36.6

> BEGIN; 30000 SELECT 1; COMMIT;
>                         HEAD            8.1
> no overhead             19.1            20.3
> stats_command_string=1  19.4            30.3

> It can be observed that HEAD in the no overhead case is actually faster
> than 8.1 on this machine.

That's more or less what I would have hoped to find, because we're
always finding ways to squeeze out bits of overhead here and there.
I wonder why your results are different from what I got on my older
machine?  I'll have to break out oprofile again and try to see what's
happening there.
        regards, tom lane


Re: Overhead for stats_command_string et al, take 2

From
Bruce Momjian
Date:
I ran your test with all defaults in 8.1 and CVS HEAD on a BSD/OS dual
Xeon and got:
8.1.X    1.946HEAD    1.986

I ran the test ten times on three runs and took the middle value.

It is a slowdown of 2%.  I used these configure options:
configure \    --with-tcl \    --with-perl \    --with-tclconfig=/u/lib \
--with-includes="/usr/local/include/readline/usr/contrib/include" \    --with-libraries="/usr/local/lib
/usr/contrib/lib"\    --with-openssl \    --enable-thread-safety \    --enable-nls
 

---------------------------------------------------------------------------

Tom Lane wrote:
> I redid my previous measurements after finishing up the weekend's
> hacking.  The numbers shown below are elapsed time in seconds for
> 
>     time psql -f testfile.sql postgres >/dev/null
> 
> using CVS HEAD and REL8_1_STABLE branch tip, compiled --enable-debug
> --disable-cassert, no nondefault options except for turning fsync off
> (which doesn't particularly affect read-only tests like these anyway).
> The machines are both running current Fedora Core 5.  The older x86
> machine is known to have the slow-gettimeofday problem from previous
> experimentation with EXPLAIN ANALYZE.  Each number is the median of 3
> or more tests, rounded off to 0.1 second (I wouldn't put a lot of faith
> in differences of 0.1 sec or so, because of the variance I saw in the
> tests).
> 
>                         x86            x86_64
> 
>                     8.1    HEAD    8.1    HEAD
> 
> 100000 "SELECT 1;"            25.9    27.0     9.2    9.1
> with stats_command_string=1        63.5    27.6    18.7    9.2
> with log_min_duration_statement=100    26.9    27.8     9.6    9.2
> with statement_timeout=100        27.5    28.6     9.6    9.8
> with all 3 features            66.1    29.3    19.5    9.7
> 
> BEGIN, 100000 "SELECT 1;", COMMIT    21.2    23.1     8.3    8.4
> with stats_command_string=1        52.3    23.5    15.4    8.5
> with log_min_duration_statement=100    22.1    23.4     8.4    8.4
> with statement_timeout=100        23.7    24.3     8.6    8.8
> with all 3 features            55.2    25.5    16.0    8.8
> 
> I chose the log_min_duration_statement and statement_timeout settings
> high enough so that no actual logging or timeout would happen --- the
> point is to measure the measurement overhead.
> 
> The good news is that we've pretty much licked the problem of
> stats_command_string costing an unreasonable amount.
> 
> The bad news is that except in the stats_command_string cases, HEAD
> is noticeably slower than 8.1 on the machine with slow gettimeofday.
> In the single-transaction test this might be blamed on the addition
> of statement_timestamp support (which requires a gettimeofday per
> statement that wasn't there in 8.1) ... but in the one-transaction-
> per-statement tests that doesn't hold water, because each branch is
> doing a gettimeofday per statement, just in different places.
> 
> Can anyone else reproduce this slowdown?  It might be only an artifact
> of these particular builds, but it's a bit too consistent in my x86 data
> to just ignore.
> 
> BTW, according to "top" the CPU usage percentages in these tests are
> on the order of 55% backend, 45% psql.  Methinks psql needs a round
> of performance tuning ...
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +