Thread: Configurable Additional Stats

Configurable Additional Stats

From
"Simon Riggs"
Date:
I've got a requirement to produce some additional stats from the server
while it executes. Specifically, I'm looking at table interaction stats
to make it easier to determine replication sets accurately for a given
transaction mix.

On brief discussion, seems like a good approach would be to put in a
user exit/plugin/hook in AtEOXact_PgStat(). That way we don't need to
add more log_* parameters for this and every additional need.

Something like this...

if (stats_hook)
    (* stats_hook)(pgStatTabList);

Any objections to sliding this in?

BTW, do we have a section in the docs on what plugin points are now
available? 

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com
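
A minimal sketch of the hook pattern proposed above, written as a standalone
C fragment rather than as backend code. Only the names stats_hook,
pgStatTabList and AtEOXact_PgStat come from the message; the TabStatEntry
layout, at_eoxact_stats() and count_tables_hook() are hypothetical stand-ins
chosen for illustration, and the real backend structures differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of the backend's per-table stats
 * list; the real pgStatTabList has a different layout. */
typedef struct TabStatEntry
{
    unsigned int table_oid;         /* table touched by the transaction */
    long         blocks_touched;    /* illustrative counter only */
    struct TabStatEntry *next;
} TabStatEntry;

/* A plugin installs a function with this signature. */
typedef void (*stats_hook_type) (const TabStatEntry *tab_list);

/* NULL means no plugin is loaded, so the call site costs one branch. */
static stats_hook_type stats_hook = NULL;

/* Stand-in for the end-of-transaction call site (AtEOXact_PgStat in the
 * proposal). */
static void
at_eoxact_stats(const TabStatEntry *tab_list)
{
    if (stats_hook)
        (*stats_hook) (tab_list);
    /* ... the regular stats reporting would continue here ... */
}

/* Example plugin: count how many table entries were handed to the hook. */
static int tables_seen = 0;

static void
count_tables_hook(const TabStatEntry *tab_list)
{
    for (const TabStatEntry *e = tab_list; e != NULL; e = e->next)
        tables_seen++;
}
```

With stats_hook left NULL the call site does nothing extra; after
stats_hook = count_tables_hook, each call to at_eoxact_stats() passes the
list to the plugin before normal processing continues.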




Re: Configurable Additional Stats

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> if (stats_hook)
>     (* stats_hook)(pgStatTabList);

> Any objections to sliding this in?

Only that it's useless.  What are you going to do in such a hook?
Not send more info to the stats collector, because the message format
is predetermined.  AFAICS, if you want to extend the set of stats
collected, you need much more invasive changes than this, and are
really going to have to resort to changing the server source code.
        regards, tom lane


Re: Configurable Additional Stats

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> if (stats_hook)
>>     (* stats_hook)(pgStatTabList);
>
>> Any objections to sliding this in?
>
> Only that it's useless.  What are you going to do in such a hook?

Simon left for camping before you sent this. My understanding is he wants to
gather the data elsewhere. So either he'll just elog it and grovel the
logs or send it via some other method. Alternatively he could aggregate the
data himself and save it in a file kind of gprof-style.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Configurable Additional Stats

From
Dave Page
Date:
Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>> if (stats_hook)
>>>     (* stats_hook)(pgStatTabList);
>>> Any objections to sliding this in?
>> Only that it's useless.  What are you going to do in such a hook?
> 
> Simon left for camping before you sent this. My understanding is he wants to
> gather the data elsewhere. So either he'll just elog it and grovel the
> logs or send it via some other method. Alternatively he could aggregate the
> data himself and save it in a file kind of gprof-style.
> 

Yes, it's not intended to insert more stats, but to get the raw data out
for external analysis during development and testing of applications and
systems etc.

Regards, Dave.


Re: Configurable Additional Stats

From
Tom Lane
Date:
Dave Page <dpage@postgresql.org> writes:
> Yes, it's not intended to insert more stats, but to get the raw data out
> for external analysis during development and testing of applications and
> systems etc.

Mph --- the proposal was very poorly titled then.  In any case, it still
sounds like a one-off hack that would be equally well served by a local
patch.
        regards, tom lane


Re: Configurable Additional Stats

From
Dave Page
Date:
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> Yes, it's not intended to insert more stats, but to get the raw data out
>> for external analysis during development and testing of applications and
>> systems etc.
> 
> Mph --- the proposal was very poorly titled then.  In any case, it still
> sounds like a one-off hack that would be equally well served by a local
> patch.

It's certainly not intended as a one-off hack, but as a way of analysing
the behaviour of existing and new applications. It would certainly be
useful for us in the future, and most likely others as well once a
plugin or two has been released.

Regards, Dave.


Re: Configurable Additional Stats

From
Tom Lane
Date:
Dave Page <dpage@postgresql.org> writes:
> Tom Lane wrote:
>> Mph --- the proposal was very poorly titled then.  In any case, it still
>> sounds like a one-off hack that would be equally well served by a local
>> patch.

> It's certainly not intended as a one-off hack, but as a way of analysing
> the behaviour of existing and new applications. It would certainly be
> useful for us in the future, and most likely others as well once a
> plugin or two has been released.

I'd be more interested if I saw a believable spec for a plugin using
this.  So far it sounds terribly badly designed --- for starters,
apparently it's intending to ignore the stats aggregation/reporting
infrastructure and reinvent that wheel in some unspecified way, for
unspecified reasons.  If you think you can improve on that mechanism
(which I'm prepared to believe, though I'd like to see some details)
why wouldn't the proposed hook include a way to turn off the overhead
of the regular stats collector?  If that's not the point, then what
is the point?
        regards, tom lane


Re: Configurable Additional Stats

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> So far it sounds terribly badly designed --- for starters, apparently it's
> intending to ignore the stats aggregation/reporting infrastructure and
> reinvent that wheel in some unspecified way, for unspecified reasons.

One way to accomplish the original goal by using the stats
aggregation/reporting infrastructure directly would be to add a stats_domain
guc which stats messages get tagged with. So you could have each application
domain set the guc to a different value and have the stats collector keep
table stats separately for each stats_domain/table pair.

That could be interesting for other purposes such as marking a single
transaction with a new stats_domain so you can look at the stats for just that
transaction.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Configurable Additional Stats

From
Dave Page
Date:
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> Tom Lane wrote:
>>> Mph --- the proposal was very poorly titled then.  In any case, it still
>>> sounds like a one-off hack that would be equally well served by a local
>>> patch.
> 
>> It's certainly not intended as a one-off hack, but as a way of analysing
>> the behaviour of existing and new applications. It would certainly be
>> useful for us in the future, and most likely others as well once a
>> plugin or two has been released.
> 
> I'd be more interested if I saw a believable spec for a plugin using
> this.  So far it sounds terribly badly designed --- for starters,
> apparently it's intending to ignore the stats aggregation/reporting
> infrastructure and reinvent that wheel in some unspecified way, for
> unspecified reasons.  If you think you can improve on that mechanism
> (which I'm prepared to believe, though I'd like to see some details)
> why wouldn't the proposed hook include a way to turn off the overhead
> of the regular stats collector?  If that's not the point, then what
> is the point?

For any kind of design we'll need to wait for Simon, but the initial
application we were discussing was to allow an unknown/closed source
user app to be traced to reveal its usage patterns for each relation
in the database. By monitoring at that point we can look at data for
that specific backend (on which we are appropriately exercising the
app), and iirc Simon also said we could see the activity on a
per-transaction basis.

As a disclaimer, I reserve the right to have misremembered some of that
- someone (<cough>Magnus</cough>) kept distracting me on IM during the
conversation :-)

Regards, Dave.


Re: Configurable Additional Stats

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> One way to accomplish the original goal by using the stats
> aggregation/reporting infrastructure directly would be to add a stats_domain
> guc which stats messages get tagged with. So you could have each application
> domain set the guc to a different value and have the stats collector keep
> table stats separately for each stats_domain/table pair.

> That could be interesting for other purposes such as marking a single
> transaction with a new stats_domain so you can look at the stats for
> just that transaction.

Hmm.  That has some possibilities.  You'd want a way to get the stats
collector to discard a domain when you didn't want those numbers
anymore, but that seems doable enough.  Also, most likely you'd want
the collector to keep both "global" and per-domain counters, else
resetting a domain loses state --- which'd be bad from autovac's
point of view, if nothing else.
        regards, tom lane
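
The stats_domain idea, with the "keep both global and per-domain counters"
refinement, might be sketched roughly as below. This is an illustration under
stated assumptions, not the stats collector's actual data structure: the
fixed-size array, the DomainTabStats type, and the functions record_stats(),
lookup_stats() and discard_domain() are all hypothetical names. The empty
string stands for the global domain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64
#define DOMAIN_LEN  32

/* One counter per (stats_domain, table) pair; domain "" is the global one. */
typedef struct DomainTabStats
{
    char         domain[DOMAIN_LEN];
    unsigned int table_oid;
    long         tuples_touched;
    int          in_use;
} DomainTabStats;

static DomainTabStats entries[MAX_ENTRIES];

/* Find an existing entry, or NULL if there is none. */
static DomainTabStats *
find_entry(const char *domain, unsigned int oid)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (entries[i].in_use && entries[i].table_oid == oid &&
            strcmp(entries[i].domain, domain) == 0)
            return &entries[i];
    return NULL;
}

/* Find an entry or claim a free slot; NULL if the table is full. */
static DomainTabStats *
get_entry(const char *domain, unsigned int oid)
{
    DomainTabStats *e = find_entry(domain, oid);

    assert(strlen(domain) < DOMAIN_LEN);    /* names must fit the slot */
    for (int i = 0; e == NULL && i < MAX_ENTRIES; i++)
    {
        if (!entries[i].in_use)
        {
            e = &entries[i];
            memset(e, 0, sizeof(*e));
            strcpy(e->domain, domain);
            e->table_oid = oid;
            e->in_use = 1;
        }
    }
    return e;
}

/* Every message updates the global counter and, if tagged, its domain. */
static void
record_stats(const char *domain, unsigned int oid, long n)
{
    DomainTabStats *g = get_entry("", oid);

    if (g)
        g->tuples_touched += n;
    if (domain[0] != '\0')
    {
        DomainTabStats *d = get_entry(domain, oid);

        if (d)
            d->tuples_touched += n;
    }
}

static long
lookup_stats(const char *domain, unsigned int oid)
{
    DomainTabStats *e = find_entry(domain, oid);

    return e ? e->tuples_touched : 0;
}

/* Drop one domain's counters; the global counters are never discarded. */
static void
discard_domain(const char *domain)
{
    if (domain[0] == '\0')
        return;
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (entries[i].in_use && strcmp(entries[i].domain, domain) == 0)
            entries[i].in_use = 0;
}
```

Because the global counters are updated on every message and are exempt from
discard_domain(), resetting a domain does not lose the totals that a consumer
such as autovacuum would rely on.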


Re: Configurable Additional Stats

From
"Simon Riggs"
Date:
On Fri, 2007-06-29 at 14:43 -0400, Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
> > Yes, it's not intended to insert more stats, but to get the raw data out
> > for external analysis during development and testing of applications and
> > systems etc.
> 
> Mph --- the proposal was very poorly titled then.  In any case, it still
> sounds like a one-off hack that would be equally well served by a local
> patch.

Well, I want it to a) be configurable b) provide additional stats, so
the title was fine, but we can call this whatever you like; I don't have
a fancy name for it.

The purpose is to get access to the stats data while we still know the
username, transactionId and other information. Once it is sent to the
stats collector it is anonymised and summarised.

Examples of the potential uses of such plug-ins would be:

1) Which tables have been touched by this transaction? The purpose of
this is to profile table interactions to allow:
i) an accurate assessment of the replication sets for use with Slony. If
you define the replication set incorrectly then you may not be able to
recover all of your data. 
ii) determining whether it is possible to split a database that serves
two applications into two distinct databases (or not), allowing you to
scale out the Data Tier in a Service Oriented Application.

2) Charge-back accounting. Keep track by userid, user group, time of
access etc of all accesses to the system, so we can provide chargeback
facilities to users. You can put your charging rules into the plugin and
have it spit out appropriate chargeback log records, when/if required.
e.g. log a chargeback record every time a transaction touches > 100
blocks, to keep track of heavy queries but ignore OLTP workloads.

3) Tracing individual transaction types, as Greg suggests.

4) Anything else you might dream up...

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com
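
For use (1) above, one way a plugin or an offline log analyser could turn
per-transaction table lists into candidate replication sets is a union-find
over tables: any two tables touched by the same transaction end up in the
same set. This is a sketch under assumptions, not something specified in the
thread. Tables are identified by small integer slots (real OIDs would need a
mapping), and sets_init(), sets_find(), sets_add_transaction() and
sets_same() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TABLES 128

/* Union-find forest: parent[t] == t means t is the root of its set. */
static int parent[MAX_TABLES];

/* Start with every table in a set of its own. */
static void
sets_init(void)
{
    for (int i = 0; i < MAX_TABLES; i++)
        parent[i] = i;
}

/* Return the representative of t's set, shortening the path as we go. */
static int
sets_find(int t)
{
    assert(t >= 0 && t < MAX_TABLES);
    while (parent[t] != t)
    {
        parent[t] = parent[parent[t]];  /* path halving */
        t = parent[t];
    }
    return t;
}

/* Record one transaction: all tables it touched are merged into one set. */
static void
sets_add_transaction(const int *tables, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        int a = sets_find(tables[0]);
        int b = sets_find(tables[i]);

        if (a != b)
            parent[b] = a;
    }
}

/* Would these two tables have to live in the same replication set? */
static int
sets_same(int a, int b)
{
    return sets_find(a) == sets_find(b);
}
```

If the observed workload leaves the tables in two or more disjoint sets, that
also suggests the database could be split along those lines, as in use (1)(ii);
a single set means every table is transitively linked by some transaction.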




Re: Configurable Additional Stats

From
Gregory Stark
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> 2) Charge-back accounting. Keep track by userid, user group, time of
> access etc of all accesses to the system, so we can provide chargeback
> facilities to users. You can put your charging rules into the plugin and
> have it spit out appropriate chargeback log records, when/if required.
> e.g. log a chargeback record every time a transaction touches > 100
> blocks, to keep track of heavy queries but ignore OLTP workloads.

Sure, but I think Tom's question is how do you get from the plugin to wherever
you want this data to be? There's not much you can do with the data at that
point. You would end up having to reconstruct the entire stats collector
infrastructure to ship the data you want out via some communication channel
and then aggregate it somewhere else.

Perhaps your plugin entry point is most useful *alongside* my stats-domain
idea. If you wanted to you could write a plugin which set the stats domain
based on whatever criteria you want whether that's time-of-day, userid, load
on the system, etc.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Configurable Additional Stats

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Sure, but I think Tom's question is how do you get from the plugin to wherever
> you want this data to be? There's not much you can do with the data at that
> point. You would end up having to reconstruct the entire stats collector
> infrastructure to ship the data you want out via some communication channel
> and then aggregate it somewhere else.

Right, and I don't see any reasonable way for a plug-in to establish
such an infrastructure --- how's it going to cause the postmaster to
shepherd a second stats collector process, for instance?

The proposal seems to be in the very early handwaving stage, because
issues like this obviously haven't been thought about.  I would suggest
building a working prototype plugin, and then you'll really know what
hooks you need.  (Comparison point: we'd never have invented the correct
hooks for the index advisor if we'd tried to define them in advance of
having rough working code to look at.)

> Perhaps your plugin entry point is most useful *alongside* my stats-domain
> idea. If you wanted to you could write a plugin which set the stats domain
> based on whatever criteria you want whether that's time-of-day, userid, load
> on the system, etc.

+1.  I'm also thinking that hooks inside the stats collector process
itself might be needed, though I have no idea exactly what.
        regards, tom lane


Re: Configurable Additional Stats

From
"Simon Riggs"
Date:
On Mon, 2007-07-02 at 17:41 +0100, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> 
> > 2) Charge-back accounting. Keep track by userid, user group, time of
> > access etc of all accesses to the system, so we can provide chargeback
> > facilities to users. You can put your charging rules into the plugin and
> > have it spit out appropriate chargeback log records, when/if required.
> > e.g. log a chargeback record every time a transaction touches > 100
> > blocks, to keep track of heavy queries but ignore OLTP workloads.
> 
> Sure, but I think Tom's question is how do you get from the plugin to wherever
> you want this data to be? There's not much you can do with the data at that
> point. You would end up having to reconstruct the entire stats collector
> infrastructure to ship the data you want out via some communication channel
> and then aggregate it somewhere else.

I just want to LOG a few extra pieces of information in the simplest
possible way, <sigh/>

There are no more steps in that process than there are for using
log_min_duration_statement and a performance analysis tool.
Outside-the-dbms processing is already required to use PostgreSQL
effectively, so this can't be an argument against the logging of
additional stats. Logging to the dbms means we have to change table
definitions etc, which will ultimately not work as well.

> Perhaps your plugin entry point is most useful *alongside* my stats-domain
> idea. If you wanted to you could write a plugin which set the stats domain
> based on whatever criteria you want whether that's time-of-day, userid, load
> on the system, etc.

Your stats domain idea is great, but it doesn't solve my problem (1). I
don't want this solved, I *need* it solved, since there's no other way
to get this done accurately with a large and complex application.

We could just go back to having log_tables_in_transaction = on | off
which would produce output like this:

LOG:  transaction-id: 3456 table list {32456, 37456, 85345, 19436}

I don't expect everybody to like that, but it's what I want, so I'm
proposing it in a way that is more acceptable. If somebody has a better
way of doing this, please say. The plugin looks pretty darn simple to
me... and hurts nobody.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com
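
A rough sketch of how a plugin might build the message text of the log line
shown above from a transaction id and a list of table OIDs. The function
format_table_list() is a hypothetical helper, not an existing server
function; in a real plugin the resulting string would presumably be handed to
the server's logging facility, which adds the "LOG:" prefix itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "transaction-id: X table list {a, b, c}" into buf.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if buf is too small to hold the whole message.
 */
static int
format_table_list(unsigned int xid, const unsigned int *oids, size_t n,
                  char *buf, size_t buflen)
{
    size_t used;
    int    len;

    assert(buf != NULL && buflen > 0);

    len = snprintf(buf, buflen, "transaction-id: %u table list {", xid);
    if (len < 0 || (size_t) len >= buflen)
        return -1;
    used = (size_t) len;

    for (size_t i = 0; i < n; i++)
    {
        /* Separate entries with ", " but do not lead with one. */
        len = snprintf(buf + used, buflen - used, "%s%u",
                       i > 0 ? ", " : "", oids[i]);
        if (len < 0 || (size_t) len >= buflen - used)
            return -1;
        used += (size_t) len;
    }

    len = snprintf(buf + used, buflen - used, "}");
    if (len < 0 || (size_t) len >= buflen - used)
        return -1;
    used += (size_t) len;

    return (int) used;
}
```

Called with transaction id 3456 and the OIDs 32456, 37456, 85345 and 19436,
this yields the text "transaction-id: 3456 table list {32456, 37456, 85345,
19436}", matching the example line apart from the prefix.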