Thread: Test request for Stats collector performance improvement

Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

15 June 2006, 04:05:49

Would some people please run the attached test procedure and report back
the results?  I basically need to know the patch is an improvement on
more platforms than just my own.  Thanks

---------------------------------------------------------------------------

Run this script and record the time reported:
ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script

Modify postgresql.conf:
stats_command_string = on

and reload the server.  Do "SELECT * FROM pg_stat_activity;" to verify
the command string is enabled.  You should see your query in the
"current query" column.

Rerun the stat.script again and record the time.

Apply this patch to CVS HEAD:
ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.nobuffer

Run the stat.script again and record the time.

Report via email your three times and your platform.

If the patch worked, the first and third times will be similar, and
the second time will be high.


---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Michael Fuhr <mike@fuhr.org> writes:
> > > Further tests show that for this application
> > > the killer is stats_command_string, not stats_block_level or
> > > stats_row_level.
> > 
> > I tried it with pgbench -c 10, and got these results:
> >     41% reduction in TPS rate for stats_command_string
> >     9% reduction in TPS rate for stats_block/row_level (any combination)
> > 
> > strace'ing a backend confirms my belief that stats_block/row_level send
> > just one stats message per transaction (at least for the relatively
> > small number of tables touched per transaction by pgbench).  However
> > stats_command_string sends 14(!) --- there are seven commands per
> > pgbench transaction and each results in sending a <command> message and
> > later an <IDLE> message.
> > 
> > Given the rather lackadaisical way in which the stats collector makes
> > the data available, it seems like the backends are being much too
> > enthusiastic about posting their stats_command_string status
> > immediately.  Might be worth thinking about how to cut back the
> > overhead by suppressing some of these messages.
> 
> I did some research on this because the numbers Tom quotes indicate there
> is something wrong in the way we process stats_command_string
> statistics.
> 
> I made a small test script:
>     
>     if [ ! -f /tmp/pgstat.sql ]
>     then    i=0
>         while [ $i -lt 10000 ]
>         do
>             i=`expr $i + 1`
>             echo "SELECT 1;"
>         done > /tmp/pgstat.sql
>     fi
>     
>     time psql test </tmp/pgstat.sql >/dev/null
> 
> This sends 10,000 "SELECT 1" queries to the backend, and reports the
> execution time.  I found that without stats_command_string defined, it
> ran in 3.5 seconds.  With stats_command_string defined, it took 5.5
> seconds, meaning the command string is causing a 57% slowdown.  That is
> way too much considering that the SELECT 1 has to be send from psql to
> the backend, parsed, optimized, and executed, and the result returned to
> the psql, while stats_command_string only has to send a string to a
> backend collector.  There is _no_ way that collector should take 57% of
> the time it takes to run the actual query.
> 
> With the test program, I tried various options.  The basic code we have
> sends a UDP packet to a statistics buffer process, which recv()'s the
> packet, puts it into a memory queue buffer, and writes it to a pipe()
> that is read by the statistics collector process which processes the
> packet.
> 
> I tried various ways of speeding up the buffer and collector processes. 
> I found if I put a pg_usleep(100) in the buffer process the backend
> speed was good, but packets were lost.  What I found worked well was to
> do multiple recv() calls in a loop.  The previous code did a select(),
> then perhaps a recv() and pipe write() based on the results of the
> select().  This caused many small packets to be written to the pipe and
> the pipe write overhead seems fairly large.  The best fix I found was to
> loop over the recv() call at most 25 times, collecting a group of
> packets that can then be sent to the collector in one pipe write.  The
> recv() socket is non-blocking, so a zero return indicates there are no
> more packets available.  Patch attached.
> 
> This change reduced the stats_command_string time from 5.5 to 3.9, which
> is closer to the 3.5 seconds with stats_command_string off.
> 
> A second improvement I discovered is that the statistics collector is
> calling gettimeofday() for every packet received, so it can determine
> the timeout for the select() call to write the flat file.  I removed
> that behavior and instead used setitimer() to issue a SIGINT every
> 500ms, which was the original behavior.  This eliminates the
> gettimeofday() call and makes the code cleaner.  Second patch attached.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

"Qingqing Zhou"

Date:

15 June 2006, 06:10:21

"Bruce Momjian" <pgman@candle.pha.pa.us> wrote
>
> Would some people please run the attached test procedure and report back
> the results?  I basically need to know the patch is an improvement on
> more platforms than just my own.  Thanks
>

Obviously it matches your expectation.

uname: Linux amd64 2.6.9-5.13smp #1 SMP Wed Aug 10 10:55:44 CST 2005 x86_64
x86_64 x86_64 GNU/Linux
compiler: gcc (GCC) 3.4.3 20041212
configure: '--prefix=/home/qqzhou/pginstall'

--Before patch --
real    0m1.149s
user    0m0.182s
sys     0m0.122s

real    0m1.121s
user    0m0.173s
sys     0m0.103s

real    0m1.128s
user    0m0.116s
sys     0m0.092s

-- After patch --

real    0m1.275s
user    0m0.097s
sys     0m0.160s

real    0m4.063s
user    0m0.663s
sys     0m0.377s

real    0m1.259s
user    0m0.073s
sys     0m0.160s

Re: Test request for Stats collector performance improvement

From

"Larry Rosenman"

Date:

15 June 2006, 08:01:13

Bruce Momjian wrote:
> Would some people please run the attached test procedure and report
> back the results?  I basically need to know the patch is an
> improvement on more platforms than just my own.  Thanks  
> 
>
---------------------------------------------------------------------------
> 
[snip]FreeBSD thebighonker.lerctr.org 6.1-STABLE FreeBSD 6.1-STABLE #60: Mon Jun
12 16:55:31 CDT 2006
root@thebighonker.lerctr.org:/usr/obj/usr/src/sys/THEBIGHONKER  amd64
$
with all stats on, except command string, cvs HEAD, no other patch:
$ sh stat.script       1.92 real         0.35 user         0.42 sys
$
# same as above, with command_string on.
$ sh stat.script       2.51 real         0.34 user         0.45 sys
$
#with patch and command_string ON.
$ sh stat.script       2.37 real         0.35 user         0.34 sys
$ 
The above uname is for a very current RELENG_6 FreeBSD.  This was done on a dual-xeon in 64-bit mode.  HTT *IS*
enabled.

LER

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3683 US

Re: Test request for Stats collector performance improvement

From

Tom Lane

Date:

15 June 2006, 14:27:40

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> Obviously it matches your expectation.

Hm?  I don't see any improvement there:

> --Before patch --
> real    0m1.149s
> real    0m1.121s
> real    0m1.128s

> -- After patch --
> real    0m1.275s
> real    0m4.063s
> real    0m1.259s
        regards, tom lane

Re: Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

15 June 2006, 15:57:51

Tom Lane wrote:
> "Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> > Obviously it matches your expectation.
> 
> Hm?  I don't see any improvement there:
> 
> > --Before patch --
> > real    0m1.149s
> > real    0m1.121s
> > real    0m1.128s
> 
> > -- After patch --
> > real    0m1.275s
> > real    0m4.063s
> > real    0m1.259s

The report is incomplete.  I need three outputs:
stats offstats onstats on, patched

He only reported two sets of results.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

Stefan Kaltenbrunner

Date:

15 June 2006, 19:58:45

Bruce Momjian wrote:
> Would some people please run the attached test procedure and report back
> the results?  I basically need to know the patch is an improvement on
> more platforms than just my own.  Thanks


OpenBSD 3.9-current/x86:

without stats:   0m6.79s real     0m1.56s user     0m1.12s system

-HEAD + stats:   0m10.44s real     0m2.26s user     0m1.22s system

-HEAD + stats + patch:   0m10.68s real     0m2.16s user     0m1.36s system


Stefan

Re: Test request for Stats collector performance improvement

From

Stefan Kaltenbrunner

Date:

15 June 2006, 20:29:51

Bruce Momjian wrote:
> Would some people please run the attached test procedure and report back
> the results?  I basically need to know the patch is an improvement on
> more platforms than just my own.  Thanks


Debian Sarge/AMD64 Kernel 2.6.16.16 (all tests done multiple times with
variation of less then 10%):

-HEAD:

real    0m0.486s
user    0m0.064s
sys     0m0.048s

-HEAD with 100000 "SELECT 1;" queries:

real    0m4.763s
user    0m0.896s
sys     0m1.232s

-HEAD + stats:


real    0m0.720s
user    0m0.128s
sys     0m0.096s


-HEAD + stats (100k):


real    0m7.204s
user    0m1.504s
sys     0m1.028s


-HEAD + stats + patch:

there is something weird going on here - I get either runtimes like:

real    0m0.729s
user    0m0.092s
sys     0m0.100s

and occasionally:


real    0m3.926s
user    0m0.144s
sys     0m0.140s


(always ~0,7 vs ~4 seconds - same variation as Qingqing Zhou seems to see)


-HEAD + stats + patch(100k):

similiar variation with:

real    0m7.955s
user    0m1.124s
sys     0m1.164s

and

real    0m11.836s
user    0m1.368s
sys     0m1.156s

(ie 7-8 seconds vs 11-12 seconds)


looks like this patch is actually a loss on that box.


Stefan

Re: Test request for Stats collector performance improvement

From

Josh Berkus

Date:

15 June 2006, 21:38:24

Bruce,

> The report is incomplete.  I need three outputs:
>
>     stats off
>     stats on
>     stats on, patched
>
> He only reported two sets of results.

You need stats off, patched too.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Test request for Stats collector performance improvement

From

Tom Lane

Date:

15 June 2006, 21:42:52

Josh Berkus <josh@agliodbs.com> writes:
> You need stats off, patched too.

Shouldn't really be necessary, as the code being patched won't be
executed if stats aren't being collected...
        regards, tom lane

Re: Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

15 June 2006, 21:46:40

Josh Berkus wrote:
> Bruce,
> 
> > The report is incomplete.  I need three outputs:
> >
> >     stats off
> >     stats on
> >     stats on, patched
> >
> > He only reported two sets of results.
> 
> You need stats off, patched too.

No need --- stats off, patched too, should be the same as stats off, no
patch.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

"Qingqing Zhou"

Date:

16 June 2006, 01:34:39

"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> Hm?  I don't see any improvement there:
>

I was referening this sentence, though I am not sure why that's the
expectation:
>
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> If the patch worked, the first and third times will be similar, and
> the second time will be high.
>

-- After patch --

real    0m1.275s
user    0m0.097s
sys     0m0.160s

real    0m4.063s
user    0m0.663s
sys     0m0.377s

real    0m1.259s
user    0m0.073s
sys     0m0.160s

Re: Test request for Stats collector performance improvement

From

Tom Lane

Date:

16 June 2006, 01:56:46

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
>> Hm?  I don't see any improvement there:

> I was referening this sentence, though I am not sure why that's the
> expectation:
>> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
>> If the patch worked, the first and third times will be similar, and
>> the second time will be high.

You need to label your results more clearly then.  I thought you were
showing us three repeats of the same test, and I gather Bruce thought
so too...
        regards, tom lane

Re: Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

16 June 2006, 03:14:28

Qingqing Zhou wrote:
> 
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
> >
> > Hm?  I don't see any improvement there:
> >
> 
> I was referening this sentence, though I am not sure why that's the
> expectation:
> >
> > "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> > If the patch worked, the first and third times will be similar, and
> > the second time will be high.

I meant that the non-stats and the patched stats should be the similar,
and the stats without the patch (the second test) should be high.

> -- After patch --
> 
> real    0m1.275s
> user    0m0.097s
> sys     0m0.160s
> 
> real    0m4.063s
> user    0m0.663s
> sys     0m0.377s
> 
> real    0m1.259s
> user    0m0.073s
> sys     0m0.160s

I assume the above is just running the same test three times, right?

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

"Qingqing Zhou"

Date:

16 June 2006, 03:28:27

"Bruce Momjian" <pgman@candle.pha.pa.us> wrote
>
> > -- After patch --
> >
> > real    0m1.275s
> > user    0m0.097s
> > sys     0m0.160s
> >
> > real    0m4.063s
> > user    0m0.663s
> > sys     0m0.377s
> >
> > real    0m1.259s
> > user    0m0.073s
> > sys     0m0.160s
>
> I assume the above is just running the same test three times, right?
>

Right -- it is the result of the patched CVS tip runing three times with
stats_command_string = on. And the tests marked "--Before patch--" is the
result of CVS tip running three times with stats_command_string = on.

Regards,
Qingqing

Re: Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

16 June 2006, 03:57:38

Qingqing Zhou wrote:
> 
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> >
> > > -- After patch --
> > >
> > > real    0m1.275s
> > > user    0m0.097s
> > > sys     0m0.160s
> > >
> > > real    0m4.063s
> > > user    0m0.663s
> > > sys     0m0.377s
> > >
> > > real    0m1.259s
> > > user    0m0.073s
> > > sys     0m0.160s
> >
> > I assume the above is just running the same test three times, right?
> >
> 
> Right -- it is the result of the patched CVS tip runing three times with
> stats_command_string = on. And the tests marked "--Before patch--" is the
> result of CVS tip running three times with stats_command_string = on.

Any idea why there is such a variance in the result?  The second run
looks quite slow.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

"Qingqing Zhou"

Date:

16 June 2006, 04:49:02

"Bruce Momjian" <pgman@candle.pha.pa.us> wrote
>
> Any idea why there is such a variance in the result?  The second run
> looks quite slow.
>

No luck so far. It is quite repeatble in my machine -- runing times which
show a long execution time: 2, 11, 14, 21 ... But when I do strace, the
weiredness disappered totally.  Have we seen any strange things like this
before?

Regards,
Qingqing

Re: Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

16 June 2006, 16:03:36

OK, based on reports I have seen, generally stats_query_string adds 50%
to the total runtime of a "SELECT 1" query, and the patch reduces the
overhead to 25%.

However, that 25% is still much too large.  Consider that "SELECT 1" has
to travel from psql to the server, go through the
parser/optimizer/executor, and then return, it is clearly wrong that the
stats_query_string performance hit should be measurable.

I am actually surprised that so few people in the community are
concerned about this.  While we have lots of people studying large
queries, these small queries should also get attention from a
performance perspective.

I have created a new test that also turns off writing of the stats file.
This will not pass regression tests, but it will show the stats write
overhead.

Updated test to be run:

---------------------------------------------------------------------------

1)  Run this script and record the time reported:
ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script
   It should take only a few seconds.  

2)  Modify postgresql.conf:
stats_command_string = on
   and reload the server.  Do "SELECT * FROM pg_stat_activity;" to verify   the command string is enabled.  You should
seeyour query in the   "current query" column.

3)  Rerun the stat.script again and record the time.

4)  Apply this patch to CVS HEAD:
ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.nobuffer

5)  Run the stat.script again and record the time.

6)  Revert the patch and apply this patch to CVS HEAD:
ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.nobuffer_nowrite

7)  Run the stat.script again and record the time.

8)  Report the four results and your platform via email to   pgman@candle.pha.pa.us.  Label times:
stats_command_string = offstats_command_string = onstat.nobuffer patchstat.nobuffer_nowrite patch

---------------------------------------------------------------------------

Qingqing Zhou wrote:
> 
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> >
> > Any idea why there is such a variance in the result?  The second run
> > looks quite slow.
> >
> 
> No luck so far. It is quite repeatble in my machine -- runing times which
> show a long execution time: 2, 11, 14, 21 ... But when I do strace, the
> weiredness disappered totally.  Have we seen any strange things like this
> before?
> 
> Regards,
> Qingqing
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

Stefan Kaltenbrunner

Date:

16 June 2006, 16:13:16

Bruce Momjian wrote:
> OK, based on reports I have seen, generally stats_query_string adds 50%
> to the total runtime of a "SELECT 1" query, and the patch reduces the
> overhead to 25%.

that is actually not true for both of the platforms(a slow OpenBSD
3.9/x86 and a very fast Linux/x86_64) I tested on. Both of them show
virtually no improvement with the patch and even worst it causes
considerable (negative) variance on at least the Linux box.

> 
> However, that 25% is still much too large.  Consider that "SELECT 1" has
> to travel from psql to the server, go through the
> parser/optimizer/executor, and then return, it is clearly wrong that the
> stats_query_string performance hit should be measurable.
> 
> I am actually surprised that so few people in the community are
> concerned about this.  While we have lots of people studying large
> queries, these small queries should also get attention from a
> performance perspective.
> 
> I have created a new test that also turns off writing of the stats file.
> This will not pass regression tests, but it will show the stats write
> overhead.

will try to run those too in a few.


Stefan

Re: Test request for Stats collector performance improvement

From

Bruce Momjian

Date:

16 June 2006, 16:27:43

Stefan Kaltenbrunner wrote:
> Bruce Momjian wrote:
> > OK, based on reports I have seen, generally stats_query_string adds 50%
> > to the total runtime of a "SELECT 1" query, and the patch reduces the
> > overhead to 25%.
> 
> that is actually not true for both of the platforms(a slow OpenBSD
> 3.9/x86 and a very fast Linux/x86_64) I tested on. Both of them show
> virtually no improvement with the patch and even worst it causes
> considerable (negative) variance on at least the Linux box.

I see the results I suggested on OpenBSD that you reported.

> OpenBSD 3.9-current/x86:
> 
> without stats:
>     0m6.79s real     0m1.56s user     0m1.12s system
> 
> -HEAD + stats:
>     0m10.44s real     0m2.26s user     0m1.22s system
> 
> -HEAD + stats + patch:
>     0m10.68s real     0m2.16s user     0m1.36s system

and I got similar results reported from a Debian:
Linux 2.6.16 on a single processor HT 2.8Ghz Pentium compiledwith gcc 4.0.4.
> > real        0m3.306s> > real        0m4.905s> > real        0m4.448s

I am unclear on the cuase for the widely varying results you saw in
Debian.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Test request for Stats collector performance improvement

From

Stefan Kaltenbrunner

Date:

16 June 2006, 18:15:00

Bruce Momjian wrote:
> Stefan Kaltenbrunner wrote:
>> Bruce Momjian wrote:
>>> OK, based on reports I have seen, generally stats_query_string adds 50%
>>> to the total runtime of a "SELECT 1" query, and the patch reduces the
>>> overhead to 25%.
>> that is actually not true for both of the platforms(a slow OpenBSD
>> 3.9/x86 and a very fast Linux/x86_64) I tested on. Both of them show
>> virtually no improvement with the patch and even worst it causes
>> considerable (negative) variance on at least the Linux box.
> 
> I see the results I suggested on OpenBSD that you reported.
> 
>> OpenBSD 3.9-current/x86:
>>
>> without stats:
>>     0m6.79s real     0m1.56s user     0m1.12s system
>>
>> -HEAD + stats:
>>     0m10.44s real     0m2.26s user     0m1.22s system
>>
>> -HEAD + stats + patch:
>>     0m10.68s real     0m2.16s user     0m1.36s system

yep those are very stable even over a large number of runs

> 
> and I got similar results reported from a Debian:
> 
>     Linux 2.6.16 on a single processor HT 2.8Ghz Pentium compiled
>     with gcc 4.0.4.
> 
>     > > real        0m3.306s
>     > > real        0m4.905s
>     > > real        0m4.448s
> 
> I am unclear on the cuase for the widely varying results you saw in
> Debian.
> 

I can reproduce the widely varying results on a number of x86 and x86_64
based Linux boxes here (Debian,Fedora and CentOS) though I cannot
reproduce it on a Fedora core 5/ppc box.
All the x86 boxes are SMP - while the ppc one is not - that might have
some influence on the results.

Stefan

Re: Test request for Stats collector performance improvement

From

Tom Lane

Date:

17 June 2006, 17:43:18

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 1)  Run this script and record the time reported:
>     ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script

One thing you neglected to specify is that the test must be done on a
NON ASSERT CHECKING build of CVS HEAD (or recent head, at least).
On these trivial "SELECT 1" commands, an assert-checking backend is
going to spend over 50% of its time doing end-of-transaction assert
checks.  I was reminded of this upon trying to do oprofile:

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 240000
samples  %        symbol name
129870   37.0714  AtEOXact_CatCache
67112    19.1571  AllocSetCheck
16611     4.7416  AtEOXact_Buffers
10054     2.8699  base_yyparse
7499      2.1406  hash_seq_search
7037      2.0087  AllocSetAlloc
4267      1.2180  hash_search
4060      1.1589  AtEOXact_RelationCache
2537      0.7242  base_yylex
1984      0.5663  grouping_planner
1873      0.5346  LWLockAcquire
1837      0.5244  AllocSetFree
1808      0.5161  exec_simple_query
1763      0.5032  ExecutorStart
1527      0.4359  PostgresMain
1464      0.4179  MemoryContextAllocZeroAligned

Let's be sure we're all measuring the same thing.
        regards, tom lane