Thread: Re: Stats collector performance improvement

Re: Stats collector performance improvement

From: Tom Lane
[ moving to -hackers ]

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I did some research on this because the numbers Tom quotes indicate there
> is something wrong in the way we process stats_command_string
> statistics.
> [ ... proposed patch that seems pretty klugy to me ... ]

I wonder whether we shouldn't consider something more drastic, like
getting rid of the intermediate stats buffer process entirely.

The original design for the stats communication code was based on the
premise that it's better to drop data than to make backends wait on
the stats collector.  However, as things have turned out I think this
notion is a flop: the people who are using stats at all want the stats
to be reliable.  We've certainly seen plenty of gripes from people who
are unhappy that backend-exit messages got dropped, and anyone who's
using autovacuum would really like the tuple update counts to be pretty
solid too.

If we abandoned the unreliable-communication approach, could we build
something with less overhead?
        regards, tom lane


Re: Stats collector performance improvement

From: Qingqing Zhou
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> I wonder whether we shouldn't consider something more drastic, like
> getting rid of the intermediate stats buffer process entirely.
>
> The original design for the stats communication code was based on the
> premise that it's better to drop data than to make backends wait on
> the stats collector.  However, as things have turned out I think this
> notion is a flop: the people who are using stats at all want the stats
> to be reliable.  We've certainly seen plenty of gripes from people who
> are unhappy that backend-exit messages got dropped, and anyone who's
> using autovacuum would really like the tuple update counts to be pretty
> solid too.
>

AFAICS, if we make the stats counts fully reliable, it may hurt 
performance dramatically. Consider: if we maintained 
pgstat_count_heap_insert()/pgstat_count_heap_delete() exactly, we would 
effectively have a replacement for count(*). To do so, I believe we would 
have to add another point of lock contention on the target table's stats.

Regards,
Qingqing 




Re: Stats collector performance improvement

From: Hannu Krosing
On Mon, 2006-01-02 at 15:20, Tom Lane wrote:
> [ moving to -hackers ]
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I did some research on this because the numbers Tom quotes indicate there
> > is something wrong in the way we process stats_command_string
> > statistics.
> > [ ... proposed patch that seems pretty klugy to me ... ]
> 
> I wonder whether we shouldn't consider something more drastic, like
> getting rid of the intermediate stats buffer process entirely.
> 
> The original design for the stats communication code was based on the
> premise that it's better to drop data than to make backends wait on
> the stats collector.  However, as things have turned out I think this
> notion is a flop: the people who are using stats at all want the stats
> to be reliable.  We've certainly seen plenty of gripes from people who
> are unhappy that backend-exit messages got dropped, and anyone who's
> using autovacuum would really like the tuple update counts to be pretty
> solid too.
> 
> If we abandoned the unreliable-communication approach, could we build
> something with less overhead?

Well, at least it should be non-WAL, and probably non-fsync, at least
optionally. Maybe also inserts plus an offline aggregator (instead of
updates) to avoid lock contention. Something that collects data in
blocks of local or per-backend shared memory in each backend and then
hands complete blocks to an aggregator process. Maybe use 2 alternating
blocks per backend - one for ongoing stats collection and another given to
the aggregator. This has a little time shift, but will deliver accurate
stats in the end. Things that need up-to-date stats (like
pg_stat_activity) should also look at (and lock) the ongoing stats
collection blocks if needed (how do we know the *if*?) and momentarily
delay each backend process by doing so.
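
To make the two-alternating-blocks idea a bit more concrete, here is a very
rough C sketch. Every struct and function name below is invented for
illustration only; this is not proposed code, just the shape of the idea:

#include "postgres.h"
#include "storage/spin.h"

typedef struct BackendStatsBlock
{
    uint64      n_tuples_inserted;
    uint64      n_tuples_updated;
    uint64      n_tuples_deleted;
} BackendStatsBlock;

typedef struct BackendStatsSlot
{
    slock_t     mutex;          /* taken only for counter bumps and swaps */
    int         active_idx;     /* block the backend is currently filling */
    BackendStatsBlock block[2];
} BackendStatsSlot;

/* backend side: bump a counter in the block it currently owns */
static void
stats_count_insert(BackendStatsSlot *slot)
{
    SpinLockAcquire(&slot->mutex);
    slot->block[slot->active_idx].n_tuples_inserted++;
    SpinLockRelease(&slot->mutex);
}

/* aggregator side: flip the blocks, then read and reset the retired one */
static void
stats_aggregate_slot(BackendStatsSlot *slot, BackendStatsBlock *sum)
{
    int         retired;

    SpinLockAcquire(&slot->mutex);
    retired = slot->active_idx;
    slot->active_idx = 1 - retired;
    SpinLockRelease(&slot->mutex);

    /* backend now writes into the other block, so no lock needed here */
    sum->n_tuples_inserted += slot->block[retired].n_tuples_inserted;
    sum->n_tuples_updated  += slot->block[retired].n_tuples_updated;
    sum->n_tuples_deleted  += slot->block[retired].n_tuples_deleted;
    memset(&slot->block[retired], 0, sizeof(BackendStatsBlock));
}

Since the spinlock is per backend, the only contention is between one
backend and the aggregator, never between backends. A reader that needs
really current numbers (pg_stat_activity) could take the same per-slot
lock and sum both blocks.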

-----------------
Hannu




Re: Stats collector performance improvement

From: Tom Lane
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> AFAICS, if we make the stats counts fully reliable, it may hurt 
> performance dramatically. Consider: if we maintained 
> pgstat_count_heap_insert()/pgstat_count_heap_delete() exactly, we would 
> effectively have a replacement for count(*).

Not at all.  For one thing, the stats don't attempt to maintain
per-transaction state, so they don't have the MVCC issues of count(*).
I'm not suggesting any fundamental changes in what is counted or when.

The two compromises that were made in the original stats design to make
it fast were (1) stats updates lag behind reality, and (2) some updates
may be missed entirely.  Now that we have a couple of years' field
experience with the code, it seems that (1) is acceptable for real usage
but (2) not so much.  And it's not even clear that we are buying any
performance gain from (2), considering that it's adding the overhead of
passing the data through an extra process.
        regards, tom lane


Re: Stats collector performance improvement

From: Jan Wieck
On 1/2/2006 3:20 PM, Tom Lane wrote:

> [ moving to -hackers ]
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I did some research on this because the numbers Tom quotes indicate there
>> is something wrong in the way we process stats_command_string
>> statistics.
>> [ ... proposed patch that seems pretty klugy to me ... ]
> 
> I wonder whether we shouldn't consider something more drastic, like
> getting rid of the intermediate stats buffer process entirely.
> 
> The original design for the stats communication code was based on the
> premise that it's better to drop data than to make backends wait on

The original design was geared towards searching for useless/missing 
indexes and tuning activity like that. This never happened; instead 
people tried to use it as a reliable debugging or access statistics aid 
... which is fine, but not what it was originally intended for.

So yes, I think that, looking at what it is usually used for, a message 
passing system like SysV message queues (puke) or similar would do a 
better job.
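
Just to illustrate what I mean by that - rough sketch only, every name
below is made up, and the queue itself would come from msgget(somekey,
IPC_CREAT | 0600) at postmaster start - a blocking SysV queue gives
reliable rather than lossy delivery:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <string.h>

#define STATS_MSG_TYPE 1

struct stats_msg
{
    long    mtype;              /* required first field for SysV messages */
    int     n_inserted;
    int     n_updated;
    int     n_deleted;
};

/* backend side: reliable send; blocks if the queue is full */
static int
stats_send(int msqid, int ins, int upd, int del)
{
    struct stats_msg m;

    memset(&m, 0, sizeof(m));
    m.mtype = STATS_MSG_TYPE;
    m.n_inserted = ins;
    m.n_updated = upd;
    m.n_deleted = del;

    /* no IPC_NOWAIT: the backend waits instead of dropping the message */
    return msgsnd(msqid, &m, sizeof(m) - sizeof(long), 0);
}

/* collector side: blocking receive */
static ssize_t
stats_receive(int msqid, struct stats_msg *m)
{
    return msgrcv(msqid, m, sizeof(*m) - sizeof(long), STATS_MSG_TYPE, 0);
}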


Jan

> the stats collector.  However, as things have turned out I think this
> notion is a flop: the people who are using stats at all want the stats
> to be reliable.  We've certainly seen plenty of gripes from people who
> are unhappy that backend-exit messages got dropped, and anyone who's
> using autovacuum would really like the tuple update counts to be pretty
> solid too.
> 
> If we abandoned the unreliable-communication approach, could we build
> something with less overhead?
> 
>             regards, tom lane


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Stats collector performance improvement

From: Simon Riggs
On Mon, 2006-01-02 at 16:48 -0500, Tom Lane wrote:

> The two compromises that were made in the original stats design to make
> it fast were (1) stats updates lag behind reality, and (2) some updates
> may be missed entirely.  Now that we have a couple of years' field
> experience with the code, it seems that (1) is acceptable for real usage
> but (2) not so much. 

We decided that the stats updates had to occur during execution, in case
the statement aborted and the row versions it created would otherwise go
unreported. That means we must notify things as they happen, yet we could
use a reliable queuing system that tolerates a delay before the stats
become available.

But how often do we lose a backend? Could we simply buffer that a little
better? i.e. don't send a message to the stats collector unless we have
altered at least 10 rows. So we would buffer based upon the importance of
the message, not the actual size of the message. That way singleton
statements won't generate the same stats traffic, but we risk losing a
buffer's worth of row changes should we crash - everything would still
work if we lost a few small row-change notifications.
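
Roughly like this (sketch only; the names are invented, and a real version
would keep per-table counters in a hash rather than a single struct):

#include "postgres.h"
#include <string.h>

#define PGSTAT_FLUSH_THRESHOLD  10      /* "at least 10 rows" */

typedef struct LocalTabCounts
{
    Oid     relid;
    int     n_inserted;
    int     n_updated;
    int     n_deleted;
} LocalTabCounts;

static LocalTabCounts pending;          /* one table only, for brevity */
static int  pending_changes = 0;

extern void send_to_collector(const LocalTabCounts *counts);  /* hypothetical transport */

static void
flush_pending_counts(void)
{
    if (pending_changes == 0)
        return;
    send_to_collector(&pending);
    memset(&pending, 0, sizeof(pending));
    pending_changes = 0;
}

static void
count_row_changes(Oid relid, int ins, int upd, int del)
{
    pending.relid = relid;
    pending.n_inserted += ins;
    pending.n_updated += upd;
    pending.n_deleted += del;
    pending_changes += ins + upd + del;

    /* pay the messaging cost only once enough work has accumulated */
    if (pending_changes >= PGSTAT_FLUSH_THRESHOLD)
        flush_pending_counts();
}

flush_pending_counts() would also be called at transaction end and backend
exit, so at most one buffer's worth of changes is at risk if we crash.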

We can also save lots of cycles on the current statement overhead, which
is currently the worst part of the stats, performance-wise. That
definitely needs redesign. AFAICS we only ever need to know the SQL
statement via the stats system if the statement has been running for
more than a few minutes - the main use case is for an admin to be able
to diagnose a rogue or hung statement. Pushing the statement to stats
every time is just a big overhead. That suggests we should either have a
pull or a deferred push (longer-than-X-secs) approach.
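
A deferred push could be as simple as the sketch below; the GUC and
send_activity_to_collector() are invented, and I'm assuming timestamp
helpers along the lines of what utils/timestamp.h provides:

#include "postgres.h"
#include "utils/timestamp.h"

int stats_report_after_ms = 60000;      /* imagined GUC: "longer than X secs" */

static TimestampTz current_stmt_start;
static const char *current_stmt_text = NULL;
static bool current_stmt_reported = false;

extern void send_activity_to_collector(const char *query);     /* hypothetical */

/* call at the start of each statement instead of pushing the text */
static void
stmt_begin(const char *query_text)
{
    current_stmt_start = GetCurrentTimestamp();
    current_stmt_text = query_text;
    current_stmt_reported = false;
}

/* call from some periodic hook; pushes the text only for long runners */
static void
maybe_report_statement(void)
{
    if (current_stmt_reported || current_stmt_text == NULL)
        return;

    if (TimestampDifferenceExceeds(current_stmt_start,
                                   GetCurrentTimestamp(),
                                   stats_report_after_ms))
    {
        send_activity_to_collector(current_stmt_text);
        current_stmt_reported = true;
    }
}

Short statements would then never touch the stats machinery at all; only
the rogue or hung ones an admin cares about would ever be pushed.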

Best Regards, Simon Riggs



Re: Stats collector performance improvement

From: Jim C. Nasby
On Tue, Jan 03, 2006 at 09:40:53AM +0000, Simon Riggs wrote:
> On Mon, 2006-01-02 at 16:48 -0500, Tom Lane wrote:
> We can also save lots of cycles on the current statement overhead, which
> is currently the worst part of the stats, performance-wise. That
> definitely needs redesign. AFAICS we only ever need to know the SQL
> statement via the stats system if the statement has been running for
> more than a few minutes - the main use case is for an admin to be able
> to diagnose a rogue or hung statement. Pushing the statement to stats
> every time is just a big overhead. That suggests we should either have a
> pull or a deferred push (longer-than-X-secs) approach.

I would argue that minutes is too long, but of course this could be
user-adjustable. I suspect that even waiting just a second could be a
huge win, since this only matters if you're executing a lot of
statements and you won't be doing that if those statements are taking
more than a second or two to execute.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Stats collector performance improvement

From: Greg Stark
"Jim C. Nasby" <jnasby@pervasive.com> writes:

> I would argue that minutes is too long, but of course this could be
> user-adjustable. I suspect that even waiting just a second could be a
> huge win, since this only matters if you're executing a lot of
> statements and you won't be doing that if those statements are taking
> more than a second or two to execute.

That's not necessarily true at all. You could just as easily have a
performance problem caused by a quick statement that is being executed many
times as by a slow statement that is being executed a few times.

That is, you could be running dozens of queries that each take seconds or
minutes, and none of those might be the problem. The problem might be the
query that takes only 300ms but that you're executing hundreds of times a
minute.

Moreover, if you're not gathering stats for queries that are fast, how
will you know whether they're performing properly when they do show up?

-- 
greg



Re: Stats collector performance improvement

From: Hannu Krosing
On Tue, 2006-01-03 at 09:40, Simon Riggs wrote:

> We can also save lots of cycles on the current statement overhead, which
> is currently the worst part of the stats, performance-wise. That
> definitely needs redesign. AFAICS we only ever need to know the SQL
> statement via the stats system if the statement has been running for
> more than a few minutes - the main use case is for an admin to be able
> to diagnose a rogue or hung statement. 

Interestingly, I use the pg_stat_activity view to watch for stuck backends,
"stuck" in the sense that they have not noticed that the client went away
and are now waiting for the TCP timeout to happen. I query for backends which
have been in "<IDLE>" state for longer than XX seconds. I guess that at
least some kind of indication for this should be available.

Of course this would be much less of a problem if there were some kind of
keepalive mechanism to detect when the client/frontend goes away.

> Pushing the statement to stats
> every time is just a big overhead. That suggests we should either have a
> pull 

I could live with "pull", where pg_stat_activity would actually ask each
live backend for its "current query". This surely happens less often
than queries are performed (up to a few thousand per second).

-------------
Hannu




Re: Stats collector performance improvement

From: Greg Stark
Hannu Krosing <hannu@skype.net> writes:

> Interestingly, I use the pg_stat_activity view to watch for stuck backends,
> "stuck" in the sense that they have not noticed that the client went away
> and are now waiting for the TCP timeout to happen. I query for backends which
> have been in "<IDLE>" state for longer than XX seconds. I guess that at
> least some kind of indication for this should be available.

You mean like the tcp_keepalives_idle option?

http://www.postgresql.org/docs/8.1/interactive/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE

-- 
greg



Re: Stats collector performance improvement

From: Hannu Krosing
On Sun, 2006-01-08 at 11:49, Greg Stark wrote:
> Hannu Krosing <hannu@skype.net> writes:
> 
> > Interestingly, I use the pg_stat_activity view to watch for stuck backends,
> > "stuck" in the sense that they have not noticed that the client went away
> > and are now waiting for the TCP timeout to happen. I query for backends which
> > have been in "<IDLE>" state for longer than XX seconds. I guess that at
> > least some kind of indication for this should be available.
> 
> You mean like the tcp_keepalives_idle option?
> 
> http://www.postgresql.org/docs/8.1/interactive/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE
> 

Kind of, only I'd like to be able to set timeouts less than 120 minutes.

from:
http://developer.apple.com/documentation/mac/NetworkingOT/NetworkingWOT-390.html#HEADING390-0

kp_timeout    Set the requested timeout value, in minutes. Specify a value of
T_UNSPEC to use the default value. You may specify any positive value for
this field of 120 minutes or greater. The timeout value is not an absolute
requirement; if you specify a value less than 120 minutes, TCP will
renegotiate a timeout of 120 minutes.
-----------
Hannu