Thread: Re: Auto-explain patch

Re: Auto-explain patch

From
Dean Rasheed
Date:
OK, this one should work against CVS HEAD.
 
Dean.


> Subject: Re: [HACKERS] Auto-explain patch
> From: simon@2ndquadrant.com
> To: dean_rasheed@hotmail.com
> CC: pgsql-hackers@postgresql.org
> Date: Sun, 6 Jul 2008 16:42:55 +0100
>
>
> On Thu, 2008-07-03 at 16:58 +0000, Dean Rasheed wrote:
> > Here is an updated version of the patch
>
> Dean,
>
> I'm getting 4 chunks fail when trying to apply your patch onto CVS HEAD.
>
> I'm sure it's just a little bitrot. Can you update the patch please?
>
> Thanks,
>
> --
> Simon Riggs www.2ndQuadrant.com
> PostgreSQL Training, Services and Support
>



Attachment

Re: Auto-explain patch

From
Simon Riggs
Date:
On Sun, 2008-07-06 at 17:58 +0000, Dean Rasheed wrote:

> OK, this one should work against CVS HEAD.

OK, still getting some offsets, but it applies.

The patch now outputs things in a useful way to avoid a flood of
information; that's good.

The code seems fine, but it doesn't link up with the other logging code
in postgres.c. That split also seems to drive the way the parameters
work, so you have another set of parameters mirroring the
log_min_duration_statement infrastructure. Fairly yuk, but I understand
why you've done this. It does, however, raise some serious questions
about the way we currently do logging.

The patch as it stands would produce EXPLAIN output for queries that
will never be logged, which isn't very useful. So Dean, right now I feel
this version of the patch must be rejected while we discuss how things
might work differently. Not an issue with the patch, so much as issues
with the code being patched.

So thinking about the issues we can either ignore them or tackle them:

Ignoring problem option: Rewrite this patch as an executor plugin, using
Itagaki Takahiro's new infrastructure. It's an option, but I guess Dave
will have something to say about that...

Fix problems: Having logging in two separate places is ugly and leads to
the problem of logging EXPLAINs and SQL completely separately, which
isn't very useful.

If we put all the logging code in the executor we miss out on any
parsing and planning time, but we can log statements and EXPLAINs for
SPI queries as well.

We could move the logging of the EXPLAIN to postgres.c, but then we'd
need to make querydescs live a little longer than desirable, or we
trigger the EXPLAIN code using some guess at what the planning overhead
was, a horrible kluge.

Neither way is clean. So how do we do it?

My proposal would be to introduce some new logging statements that work
similarly to, yet as an alternative to log_min_duration_statement.

*  log_min_planner_duration - logs any statement whose planning time
exceeds the value in ms (-1 is off).
e.g. LOG  planner time: %s ms  statement: %s 

*  log_min_executor_duration - logs any execution whose execution time
exceeds the value in ms (-1 is off). We need to pass the query text
through to the executor, depending upon whether it's a portal's source
text or an incoming simple query string. (This is similar, but not quite
the same as
current patch's debug_explain_min_duration).
e.g. LOG  executor time: %s ms  statement: %s 

*  log_explain = on | off - logs EXPLAIN if either
log_min_executor_duration or log_min_planner_duration triggers. We
output the explain on the same line as the SQL statement, so we don't
need to play games to link up the output later. (This is similar to, but
not quite the same as current patch's debug_explain_plan).
e.g. LOG  executor time: %s ms  statement: %s  plan: %s 

If both log_min_executor_duration and log_min_planner_duration trigger
then we get only one log line like this:
e.g. LOG  planner time: %s ms  executor time: %s ms  statement: %s 

We should call these parameters log_xxx not debug_xxx because we want
them for production logging.

If either of these new parameters is > -1 then
log_min_duration_statement does not trigger. (Or we simply make
log_min_duration_statement control the setting of the other two
parameters).

The log messages shown above allow us to differentiate between the
reasons the SQL statement has been logged, as well as providing the
details we want.
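
To make this concrete, here is a hypothetical postgresql.conf fragment
using the proposed parameters, with the kind of log line it might
produce (the parameter names come from the proposal above; none of this
exists in the posted patch, and the numbers are invented):

```
# hypothetical settings implementing the proposal above
log_min_planner_duration = 100     # ms; -1 disables
log_min_executor_duration = 500    # ms; -1 disables
log_explain = on                   # append EXPLAIN output when either triggers

# a statement slow in both phases would then produce a single line such as:
# LOG  planner time: 142.7 ms  executor time: 893.2 ms  statement: SELECT ...
```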

If we do this then:

* we can log EXPLAINs for any execution, plus we will always have the
SQL for any EXPLAIN

* we can log statements that take a long time to plan, yet a short time
to execute. That can be an overhead in some cases. It also neatly
sidelines the issue of where logging takes place. (We need to be able to
track planner time if we think we might want to introduce options to
control planner effort, a topic discussed on another recent thread, not
here.)

* we can log SQL for any execution, even those via SPI. That means we'll
be able to log the SQL used in functions.

We'll probably need an option to log SQL from SPI or not. Things like RI
trigger code probably don't need to be logged, especially if the
top-level statement, such as an ALTER TABLE, is logged.

This alters current performance logging, so I expect discussion.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Auto-explain patch

From
Dean Rasheed
Date:
Simon, I like your proposal, and I think I can see how to code it
fairly easily.

There is one thing that it doesn't allow, however, which the debug_xxx
parameters do, and that is for a non-superuser to trace SQL used in
functions, from an interactive client session. For me, this is quite a
big thing, because I find it most convenient to turn these parameters
on while writing and tweaking stored procedures, and have the output
go straight to my psql window, without having to connect as a superuser
and trawl through log files.

So I suggest grouping these parameters in their own category
(eg. "sql_trace") and then having additional parameters to control
where the output would go. So the sql_trace parameters would be:

* sql_trace_min_planner_duration
* sql_trace_min_executor_duration
* sql_trace_explain_plan

and they would work exactly as you describe, except they would be
settable by normal users. Then the destination(s) for the statement
and EXPLAIN logging would be controlled by:

* client_sql_trace = on | off - settable by a normal user to allow a
client session to see the sql_trace output. If this parameter is on,
the sql_trace will be logged as NOTICE output.

* log_sql_trace = on | off - settable only by superusers. If this
parameter is on, the sql_trace will be logged as LOG output, and the
normal client sessions will see nothing (unless they set client_sql_trace).

If both client_sql_trace and log_sql_trace are on, then the output
goes to both locations, and if both are off, then the sql_trace
parameters would do nothing.
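
To illustrate, a hypothetical psql session under this scheme (these
parameters are only proposed above, not implemented anywhere; the
NOTICE text is invented):

```sql
-- hypothetical: a normal user tracing their own functions from psql
SET sql_trace_min_executor_duration = 0;  -- trace every execution
SET sql_trace_explain_plan = on;
SET client_sql_trace = on;                -- deliver the trace as NOTICE output

SELECT my_function(42);
-- NOTICE:  executor time: 3.214 ms  statement: SELECT ... plan: ...
```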

Dean


> Subject: RE: [HACKERS] Auto-explain patch
> From: simon@2ndquadrant.com
> To: dean_rasheed@hotmail.com
> CC: pgsql-hackers@postgresql.org
> Date: Mon, 7 Jul 2008 18:03:15 +0100


Re: Auto-explain patch

From
ITAGAKI Takahiro
Date:
Dean Rasheed <dean_rasheed@hotmail.com> wrote:

> * client_sql_trace = on | off - settable by a normal user to allow a
> client session to see the sql_trace output. If this parameter is on,
> the sql_trace will be logged as NOTICE output.

In terms of security, is it ok to show normal users SQLs used in functions
that are owned by other users? Users can call not-owned functions only if
they have EXECUTE privilege on them. -- presently we can see function
bodies from pg_proc.prosrc freely, though.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Auto-explain patch

From
Simon Riggs
Date:
On Wed, 2008-07-09 at 09:11 +0000, Dean Rasheed wrote:
> Simon, I like your proposal, and I think I can see how to code it
> fairly easily.
> 
> There is one thing that it doesn't allow, however, which the debug_xxx
> parameters do, and that is for a non-superuser to trace SQL used in
> functions, from an interactive client session. For me, this is quite a
> big thing, because I find it most convenient to turn these parameters
> on while writing and tweaking stored procedures, and have the output
> go straight to my psql window, without having to connect as a superuser
> and trawl through log files.

Understood.

> So I suggest grouping these parameters in their own category
> (eg. "sql_trace") and then having additional parameters to control
> where the output would go. So the sql_trace parameters would be:
> 
> * sql_trace_min_planner_duration
> * sql_trace_min_executor_duration
> * sql_trace_explain_plan
> 
> and they would work exactly as you describe, except they would be
> settable by normal users. 

This is already possible, if you're crafty.

> Then the destination(s) for the statement
> and EXPLAIN logging would be controlled by:
> 
> * client_sql_trace = on | off - settable by a normal user to allow a
> client session to see the sql_trace output. If this parameter is on,
> the sql_trace will be logged as NOTICE output.

Just set client_min_messages = 'LOG';

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Auto-explain patch

From
"Marko Kreen"
Date:
On 7/9/08, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
>  Dean Rasheed <dean_rasheed@hotmail.com> wrote:
> > * client_sql_trace = on | off - settable by a normal user to allow a
>  > client session to see the sql_trace output. If this parameter is on,
>  > the sql_trace will be logged as NOTICE output.
>
>
> In terms of security, is it ok to show normal users SQLs used in functions
>  that are owned by other users? Users can call not-owned functions only if
>  they have EXECUTE privilege on them. -- presently we can see function
>  bodies from pg_proc.prosrc freely, though.

Different owner is not a problem, but SECURITY DEFINER may be.

That can be solved by turning the setting off on entry to such a
function when the caller is a non-superuser, like we handle search_path.

-- 
marko


Re: Auto-explain patch

From
Dean Rasheed
Date:
> Just set client_min_messages = 'LOG';

True, but you would still need to be a superuser to set the
min_duration and explain parameters. The other advantage of
client_sql_trace is that you could debug your own functions without
filling up the log file. It would work better for multiple users sharing
the same DB.

Dean.


Re: Auto-explain patch

From
Dean Rasheed
Date:
Of course you can still sort of see the SQL used in functions declared
SECURITY DEFINER, using debug_print_parse, but this would be opening
up a much more transparent way to do it.

Do I need to worry about this?

Dean

----------------------------------------
> Date: Wed, 9 Jul 2008 14:22:45 +0300
> From: markokr@gmail.com
> To: itagaki.takahiro@oss.ntt.co.jp
> Subject: Re: [HACKERS] Auto-explain patch
> CC: dean_rasheed@hotmail.com; simon@2ndquadrant.com; pgsql-hackers@postgresql.org
>


Re: Auto-explain patch

From
"Marko Kreen"
Date:
On 7/9/08, Dean Rasheed <dean_rasheed@hotmail.com> wrote:
>  Of course you can still sort of see the SQL used in functions declared
>  SECURITY DEFINER, using debug_print_parse, but this would be opening
>  up a much more transparent way to do it.
>
>  Do I need to worry about this?

If this allows users to see values, then yes. Otherwise no.

-- 
marko


Re: Auto-explain patch

From
Tom Lane
Date:
"Marko Kreen" <markokr@gmail.com> writes:
> On 7/9/08, Dean Rasheed <dean_rasheed@hotmail.com> wrote:
>> Do I need to worry about this?

> If this allows users to see values, then yes. Otherwise no.

It definitely would open a hole that doesn't exist now, which is to
see values that are inserted into an EXECUTE'd query.
        regards, tom lane


Re: Auto-explain patch

From
Simon Riggs
Date:
On Wed, 2008-07-09 at 12:01 +0000, Dean Rasheed wrote:
> > Just set client_min_messages = 'LOG';
> 
> > True, but you would still need to be a superuser to set the min_durations and
> explain parameters. 

No

> The other advantage of client_sql_trace is that you could
> debug your own functions without filling up the log file. It would work better for
> multiple users sharing the same DB.

Not sure about that. We have debuggers and dbms_put_line already.

I think it would be better to introduce a new logging category than to
do something just specifically for this situation. 

Suggest a new level = TRACE. This allows messages which are not
typically logged by the server, yet are specifically requested by the
client. The admin has the option to log the messages if desired.

Server log sequence is
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR,
*TRACE*, LOG, FATAL, and PANIC.

Rationale for placement: the default is LOG, so TRACE messages are not
logged by default. It is sometimes set to ERROR. Putting TRACE at this
position allows a full system trace to be taken if required, then reset
when the trace is complete.

Client log Sequence is
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, LOG, NOTICE, *TRACE*, WARNING,
ERROR, FATAL, and PANIC.

Rationale for placement: many NOTICE messages are generated by various
commands, so people can still turn NOTICE off and still get TRACE
messages.
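
With such a level in place, a client wanting trace output would need
only something like this (assuming TRACE is accepted as a
client_min_messages value, which is purely hypothetical here):

```sql
-- hypothetical: enable trace delivery to this session only
SET client_min_messages = 'trace';
```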

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Auto-explain patch

From
Simon Riggs
Date:
On Wed, 2008-07-09 at 09:11 +0000, Dean Rasheed wrote:
> Simon, I like your proposal, and I think I can see how to code it
> fairly easily.
> 
> There is one thing that it doesn't allow, however, which the debug_xxx
> parameters do, and that is for a non-superuser to trace SQL used in
> functions, from an interactive client session. For me, this is quite a
> big thing, because I find it most convenient to turn these parameters
> on while writing and tweaking stored procedures, and have the output
> go straight to my psql window, without having to connect as a superuser
> and trawl through log files.
> 
> So I suggest grouping these parameters in their own category
> (eg. "sql_trace") and then having additional parameters to control
> where the output would go. So the sql_trace parameters would be:
> 
> * sql_trace_min_planner_duration
> * sql_trace_min_executor_duration
> * sql_trace_explain_plan
> 
> and they would work exactly as you describe, except they would be
> settable by normal users. Then the destination(s) for the statement
> and EXPLAIN logging would be controlled by:
> 
> * client_sql_trace = on | off - settable by a normal user to allow a
> client session to see the sql_trace output. If this parameter is on,
> the sql_trace will be logged as NOTICE output.

After sleeping on this, I think we should follow your idea.

If it's possible to do the sql_trace_* parameters as a single one, I
would prefer it, since it makes it more practical to use dynamically.
Not sure how... maybe with a wrapper function?

sql_trace(integer) sets just sql_trace_min_executor_duration
sql_trace(integer, boolean) sets executor and explain
sql_trace(integer, integer, boolean) sets all 3

I think you probably need to drop the sql_ off the front because of
parameter length only.

No need for the other log_... parameter though.
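
A sketch of how such wrapper functions might look, built on
set_config() (the trace() name, the argument shapes, and the underlying
GUC names are all hypothetical, and this assumes the GUCs are settable
by normal users):

```sql
-- hypothetical convenience wrappers over the proposed trace GUCs
CREATE FUNCTION trace(min_exec_ms integer) RETURNS text AS $$
    SELECT set_config('trace_min_executor_duration', $1::text, false);
$$ LANGUAGE sql;

CREATE FUNCTION trace(min_exec_ms integer, show_plan boolean) RETURNS text AS $$
    SELECT set_config('trace_min_executor_duration', $1::text, false);
    SELECT set_config('trace_explain_plan',
                      CASE WHEN $2 THEN 'on' ELSE 'off' END, false);
$$ LANGUAGE sql;

-- usage: SELECT trace(500, true);  -- trace executions over 500 ms, with plans
```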

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Auto-explain patch

From
Gregory Stark
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> On Wed, 2008-07-09 at 09:11 +0000, Dean Rasheed wrote:
>> 
>> So I suggest grouping these parameters in their own category
>> (eg. "sql_trace") and then having additional parameters to control
>> where the output would go. So the sql_trace parameters would be:
>> 
>> * sql_trace_min_planner_duration
>> * sql_trace_min_executor_duration
>> * sql_trace_explain_plan
>> 
> If its possible to do the sql_trace_* parameters as a single one, I
> would prefer it, since it makes it more practical to use dynamically.
> Not sure how...maybe with a wrapper function?
>
> sql_trace(integer) sets just sql_trace_min_executor_duration
> sql_trace(integer, boolean) sets executor and explain
> sql_trace(integer, integer, boolean) sets all 3

Fwiw it seems to me "trace_sql_*" would be nicer, much as we have
track_* guc parameters.

Though I also wonder if there's really any distinction here between
"tracing" and "logging", like log_explain_plan and so on. Perhaps we
should keep the word "trace" for a more detailed facility.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!