Thread: Backend Stats Enhancement Request

Backend Stats Enhancement Request

From
David Miller
Date:
I would like to request a change to the PGBE_ACTIVITY_SIZE #define in the pgstat.h file. This value determines the max
lengthof the SQL query contained in the PgBackendStatus structure. 
 

By increasing the value of this #define to 32768 we are able to capture queries larger than the 1024 default using just
pgAdminor Postgres Studio. I would like to have this change considered for future releases.
 

Thanks,

David Miller
River Systems, Inc.



Re: Backend Stats Enhancement Request

From
Tom Lane
Date:
David Miller <miller392@yahoo.com> writes:
> I would like to request a change to the PGBE_ACTIVITY_SIZE #define in the pgstat.h file. This value determines the
maxlength of the SQL query contained in the PgBackendStatus structure. 
 
> By increasing the value of this #define to 32768 we are able to capture queries larger than the 1024 default using
justpgAdmin or Postgres Studio. I would like to have this change considered for future releases.
 

Highly unlikely that we'd push it so high, because of the amount of
shared memory it would eat.
        regards, tom lane


Re: Backend Stats Enhancement Request

From
David Miller
Date:

David Miller <miller392@yahoo.com> writes:

> I would like to request a change to the PGBE_ACTIVITY_SIZE #define in the pgstat.h file. This value determines the
maxlength of the SQL query contained in the PgBackendStatus structure. 
 
> By increasing the value of this #define to 32768 we are able to capture queries larger than the 1024 default using
justpgAdmin or Postgres Studio. I would like to have this change considered for future releases.
 


> Highly unlikely that we'd push it so high, because of the amount of
> shared memory it would eat.

>            regards, tom lane

That is fine.. Maybe a dynamic configurable parameter that can be set/updated while the database is running. This issue
liesin the fact that we have queries larger than 1K and we would like to be able to capture the entire query from
PostgresStudio without having to process the log files..
 


Thanks,

David Miller


Re: Backend Stats Enhancement Request

From
Alvaro Herrera
Date:
David Miller wrote:

> That is fine.. Maybe a dynamic configurable parameter that can be
> set/updated while the database is running.

If it were a parameter, it could not be changed while the database is
running.

> This issue lies in the fact that we have queries larger than 1K and we
> would like to be able to capture the entire query from Postgres Studio
> without having to process the log files..

Have you considered using CSV logs instead?  Should be easier to
process.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Backend Stats Enhancement Request

From
Decibel!
Date:
On Jun 19, 2008, at 10:26 AM, Alvaro Herrera wrote:
> David Miller wrote:
>
>> That is fine.. Maybe a dynamic configurable parameter that can be
>> set/updated while the database is running.
>
> If it were a parameter, it could not be changed while the database is
> running.
>
>> This issue lies in the fact that we have queries larger than 1K  
>> and we
>> would like to be able to capture the entire query from Postgres  
>> Studio
>> without having to process the log files..
>
> Have you considered using CSV logs instead?  Should be easier to
> process.

Would it be hard to have a backend write it's complete command out to  
a file if the command lasts more than X number of seconds, and then  
allow other backends to read it from there? It is extremely annoying  
to not be able to get the full query contents.

Also, I don't necessarily buy that 32k * max_connections is too much  
shared memory; even with max_connections of 1000 that's only 32M,  
which is trivial for any box that's actually configured for 1000  
connections.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: Backend Stats Enhancement Request

From
Tom Lane
Date:
Decibel! <decibel@decibel.org> writes:
> Also, I don't necessarily buy that 32k * max_connections is too much  
> shared memory; even with max_connections of 1000 that's only 32M,  
> which is trivial for any box that's actually configured for 1000  
> connections.

That's not where the problem is.  The people who will be left holding
the short end of the stick are the ones who can't raise their SHMMAX
setting past a couple of megabytes.

It might be feasible to make pg_stat_activity's max string length
a postmaster-start-time configuration option.
        regards, tom lane


Re: Backend Stats Enhancement Request

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
>
> That's not where the problem is.  The people who will be left holding
> the short end of the stick are the ones who can't raise their SHMMAX
> setting past a couple of megabytes.
> 
> It might be feasible to make pg_stat_activity's max string length
> a postmaster-start-time configuration option.

That would seem to me to be the most prudent course. As much as it is 
important to pay attention to those "who can't raise their SHMMAX 
setting past a couple of megabutes", that is certainly not the majority 
of participants in this community.

Sincerely,

Joshua D. Drake


Re: Backend Stats Enhancement Request

From
David Miller
Date:
> That's not where the problem is.  The people who will be left holding
> the short end of the stick are the ones who can't raise their SHMMAX
> setting past a couple of megabytes.
> 
> It might be feasible to make pg_stat_activity's max string length
> a postmaster-start-time configuration option.

I am fine with a postmaster-start-time configuration option. It is not as flexible as I would like, but would serve the
immediateneed and keep me from having to 
 
patch every release of Postgres we install on boxes.

The load on our production servers really prohibits any kind of processing of the log files locally. We have tried
usingseveral log shipping methods to process the 
 
logs on a machine with fewer running processes. These large queries are generated by a third party tool that we have
verylimited control over. Some of the queries 
 
captured are as large 16K. The queries are poorly written/generated. 

David Miller
River Systems, Inc.


Re: Backend Stats Enhancement Request

From
Thomas Lee
Date:
Hi,

I'm new to the postgresql source, thought I'd try my hand at 
implementing the change suggested (i.e. the GUC-ification of the 
PGBE_ACTIVITY_SIZE constant) to get my hands dirty with the code.

How does this sound:

* A new GUC variable -- "activity_message_size" -- will be introduced
* The PGBE_ACTIVITY_SIZE #define becomes PGBE_DEFAULT_ACTIVITY_SIZE
* Minimum value of PGBE_DEFAULT_ACTIVITY_SIZE, maximum value of INT_MAX?

I'm struggling a little to come up with a decent description of the GUC 
variable -- something along the lines of "Sets the maximum length of 
backend status messages". Any suggestions?

Also: how should we allocate the memory for PgBackendStatus.st_activity? 
I'm guessing it's going to be necessary to keep this in shmem ...

Cheers,
T

David Miller wrote:
>> That's not where the problem is.  The people who will be left holding
>> the short end of the stick are the ones who can't raise their SHMMAX
>> setting past a couple of megabytes.
>>
>> It might be feasible to make pg_stat_activity's max string length
>> a postmaster-start-time configuration option.
>>     
>
> I am fine with a postmaster-start-time configuration option. It is not as flexible as I would like, but would serve
theimmediate need and keep me from having to 
 
> patch every release of Postgres we install on boxes.
>
> The load on our production servers really prohibits any kind of processing of the log files locally. We have tried
usingseveral log shipping methods to process the 
 
> logs on a machine with fewer running processes. These large queries are generated by a third party tool that we have
verylimited control over. Some of the queries 
 
> captured are as large 16K. The queries are poorly written/generated. 
>
>
>  David Miller
> River Systems, Inc.
>
>   



Re: Backend Stats Enhancement Request

From
Tom Lane
Date:
Thomas Lee <tom@vector-seven.com> writes:
> How does this sound:

> * A new GUC variable -- "activity_message_size" -- will be introduced

Well, "message" doesn't seem quite le mot juste to me for a column that
is displaying a SQL command.  Usually we'd use "statement", "command",
or "query" to refer to one of those things.  Since the relevant column
of pg_stat_activity is already named "current_query", perhaps the
best choice is "activity_query_size".  Or "activity_query_length"?

Another consideration is that it might be a good idea to name it to
be obviously related to the controlling "track_activities" boolean.
That would lead to "track_activity_query_size", or
"track_activity_max_length", or some such.

> * Minimum value of PGBE_DEFAULT_ACTIVITY_SIZE, maximum value of INT_MAX?

I was thinking about a range of 100 to 100K or thereabouts.  INT_MAX
is just silly...

> I'm struggling a little to come up with a decent description of the GUC 
> variable -- something along the lines of "Sets the maximum length of 
> backend status messages". Any suggestions?

Be specific:
"Sets the maximum length of pg_stat_activity.current_query."

> Also: how should we allocate the memory for PgBackendStatus.st_activity? 
> I'm guessing it's going to be necessary to keep this in shmem ...

Yup.  Look at existing variable-size shmem allocations.
max_prepared_transactions might be a good reference, since it's not
used in very many places.
        regards, tom lane


Re: Backend Stats Enhancement Request

From
Decibel!
Date:
On Jun 20, 2008, at 9:49 AM, Tom Lane wrote:
>> * Minimum value of PGBE_DEFAULT_ACTIVITY_SIZE, maximum value of  
>> INT_MAX?
>
> I was thinking about a range of 100 to 100K or thereabouts.  INT_MAX
> is just silly...


I realize we just got rid of stats_command_string, but if we're  
adding a GUC back in we might as well allow it to be set to 0 which  
disables logging.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: Backend Stats Enhancement Request

From
Tom Lane
Date:
Decibel! <decibel@decibel.org> writes:
> I realize we just got rid of stats_command_string, but if we're  
> adding a GUC back in we might as well allow it to be set to 0 which  
> disables logging.

How would that not duplicate track_activities?
        regards, tom lane


Re: Backend Stats Enhancement Request

From
Decibel!
Date:
On Jun 20, 2008, at 11:48 AM, Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
>> I realize we just got rid of stats_command_string, but if we're
>> adding a GUC back in we might as well allow it to be set to 0 which
>> disables logging.
>
> How would that not duplicate track_activities?


Sorry, I thought there was more rolled into that than just  
current_query.

I know this is quite a bit of churn here, but ISTM we should  
deprecate track_activities in favor of setting the new size GUC to 0.  
Unless folks are really tied to being able to control that without a  
restart...
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: Backend Stats Enhancement Request

From
Robert Treat
Date:
On Friday 20 June 2008 10:49:49 Tom Lane wrote:
> Thomas Lee <tom@vector-seven.com> writes:
> > How does this sound:
> >
> > * A new GUC variable -- "activity_message_size" -- will be introduced
>
> Well, "message" doesn't seem quite le mot juste to me for a column that
> is displaying a SQL command.  Usually we'd use "statement", "command",
> or "query" to refer to one of those things.  Since the relevant column
> of pg_stat_activity is already named "current_query", perhaps the
> best choice is "activity_query_size".  Or "activity_query_length"?
>
> Another consideration is that it might be a good idea to name it to
> be obviously related to the controlling "track_activities" boolean.
> That would lead to "track_activity_query_size", or
> "track_activity_max_length", or some such.
>
> > * Minimum value of PGBE_DEFAULT_ACTIVITY_SIZE, maximum value of INT_MAX?
>
> I was thinking about a range of 100 to 100K or thereabouts.  INT_MAX
> is just silly...
>
> > I'm struggling a little to come up with a decent description of the GUC
> > variable -- something along the lines of "Sets the maximum length of
> > backend status messages". Any suggestions?
>
> Be specific:
> "Sets the maximum length of pg_stat_activity.current_query."
>

I think there are other places this might manifest itself besides 
pg_stat_activity... I'm struggling to come up with something other than our 
custom dtrace prob... ah, well, this will also control the size of statement 
written into the logfile right? So we might want to take that into account. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Backend Stats Enhancement Request

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> On Friday 20 June 2008 10:49:49 Tom Lane wrote:
>> Be specific:
>> "Sets the maximum length of pg_stat_activity.current_query."

> I think there are other places this might manifest itself besides 
> pg_stat_activity...

No, there aren't.
I'm struggling to come up with something other than our 
> custom dtrace prob... ah, well, this will also control the size of statement 
> written into the logfile right?

And *certainly* not that.
        regards, tom lane


Re: Backend Stats Enhancement Request

From
Thomas Lee
Date:
Thanks for the feedback Tom. An initial patch for this has been posted 
to pgsql-patches.

Cheers,
T

Tom Lane wrote:
> Thomas Lee <tom@vector-seven.com> writes:
>   
>> How does this sound:
>>     
>
>   
>> * A new GUC variable -- "activity_message_size" -- will be introduced
>>     
>
> Well, "message" doesn't seem quite le mot juste to me for a column that
> is displaying a SQL command.  Usually we'd use "statement", "command",
> or "query" to refer to one of those things.  Since the relevant column
> of pg_stat_activity is already named "current_query", perhaps the
> best choice is "activity_query_size".  Or "activity_query_length"?
>
> Another consideration is that it might be a good idea to name it to
> be obviously related to the controlling "track_activities" boolean.
> That would lead to "track_activity_query_size", or
> "track_activity_max_length", or some such.
>
>   
>> * Minimum value of PGBE_DEFAULT_ACTIVITY_SIZE, maximum value of INT_MAX?
>>     
>
> I was thinking about a range of 100 to 100K or thereabouts.  INT_MAX
> is just silly...
>
>   
>> I'm struggling a little to come up with a decent description of the GUC 
>> variable -- something along the lines of "Sets the maximum length of 
>> backend status messages". Any suggestions?
>>     
>
> Be specific:
> "Sets the maximum length of pg_stat_activity.current_query."
>
>   
>> Also: how should we allocate the memory for PgBackendStatus.st_activity? 
>> I'm guessing it's going to be necessary to keep this in shmem ...
>>     
>
> Yup.  Look at existing variable-size shmem allocations.
> max_prepared_transactions might be a good reference, since it's not
> used in very many places.
>
>             regards, tom lane
>
>   



Re: Backend Stats Enhancement Request

From
Alvaro Herrera
Date:
Thomas Lee wrote:
> Thanks for the feedback Tom. An initial patch for this has been posted  
> to pgsql-patches.

Thanks for the patch.  I have added it to the current Commitfest wiki
page.  If you plan to continue sending patches, please make sure you get
an account to do that yourself.  Thanks.

http://wiki.postgresql.org/wiki/CommitFest:2008-07

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support