Thread: auto_explain sample rate

auto_explain sample rate

From
Craig Ringer
Date:
Hi all

It's sometimes desirable to collect auto_explain data with ANALYZE in order to track down hard-to-reproduce issues, but the performance impacts can be pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger only on x percent of queries, and also add a sample test hook that can be used to target statements of interest more narrowly (using a C hook function).

Sound like a reasonable approach?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> It's sometimes desirable to collect auto_explain data with ANALYZE in order
> to track down hard-to-reproduce issues, but the performance impacts can be
> pretty hefty on the DB.

> I'm inclined to add a sample rate to auto_explain so that it can trigger
> only on x percent of queries,

That sounds reasonable ...

> and also add a sample test hook that can be
> used to target statements of interest more narrowly (using a C hook
> function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that.  Can't we invent something a bit more user-friendly
for the purpose?  No idea what it should look like though.
        regards, tom lane



Re: auto_explain sample rate

From
Craig Ringer
Date:
On 29 May 2015 at 11:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> It's sometimes desirable to collect auto_explain data with ANALYZE in order
> to track down hard-to-reproduce issues, but the performance impacts can be
> pretty hefty on the DB.

> I'm inclined to add a sample rate to auto_explain so that it can trigger
> only on x percent of queries,

That sounds reasonable ...

Cool, I'll cook that up then. Thanks for the sanity check.
 
> and also add a sample test hook that can be
> used to target statements of interest more narrowly (using a C hook
> function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that.  Can't we invent something a bit more user-friendly
for the purpose?  No idea what it should look like though.

I've been that desperate.

For the majority of users I'm sure it's sufficient to just have a sample rate. 

Anything that's trying to match individual queries could be interested in all sorts of different things. Queries that touch a particular table being one of the more obvious things, or queries that mention a particular literal. Rather than try to design something complicated in advance that anticipates all needs, I'm thinking it makes sense to just throw a hook in there. If some patterns start to emerge in terms of useful real world filtering criteria then that'd better inform any more user accessible design down the track.
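A C hook along these lines could look something like the sketch below. The hook name, its signature, and the helper functions are purely hypothetical, invented here for illustration, not taken from any actual patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type: given the query text, decide whether this
 * statement should be explained. Nothing here is from a real patch. */
typedef bool (*ExplainSampleHook)(const char *query_string);

ExplainSampleHook explain_sample_hook = NULL;

/* Example of a hook an extension might install: only explain
 * statements that mention a table of interest. */
bool sample_if_mentions_orders(const char *query_string)
{
    return strstr(query_string, "orders") != NULL;
}

/* The sampling decision: an installed hook narrows the plain
 * rate-based decision further. */
bool should_sample(const char *query_string, bool rate_sampled)
{
    if (!rate_sampled)
        return false;
    if (explain_sample_hook != NULL)
        return explain_sample_hook(query_string);
    return true;
}
```

The point of the design is exactly what the paragraph above argues: the filtering criterion lives in extension code, so nothing about table names or literals needs to be anticipated in auto_explain itself.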


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Pavel Stehule
Date:


2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 29 May 2015 at 11:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> It's sometimes desirable to collect auto_explain data with ANALYZE in order
> to track down hard-to-reproduce issues, but the performance impacts can be
> pretty hefty on the DB.

> I'm inclined to add a sample rate to auto_explain so that it can trigger
> only on x percent of queries,

That sounds reasonable ...

Cool, I'll cook that up then. Thanks for the sanity check.
 
> and also add a sample test hook that can be
> used to target statements of interest more narrowly (using a C hook
> function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that.  Can't we invent something a bit more user-friendly
for the purpose?  No idea what it should look like though.

I've been that desperate.

For the majority of users I'm sure it's sufficient to just have a sample rate. 

Anything that's trying to match individual queries could be interested in all sorts of different things. Queries that touch a particular table being one of the more obvious things, or queries that mention a particular literal. Rather than try to design something complicated in advance that anticipates all needs, I'm thinking it makes sense to just throw a hook in there. If some patterns start to emerge in terms of useful real world filtering criteria then that'd better inform any more user accessible design down the track.

The same method could be interesting for interactive EXPLAIN ANALYZE too. TIMING has about 20%-30% overhead, and usually we don't need perfectly exact numbers.

Regards

Pavel
 


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Craig Ringer
Date:


On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a sample rate. 

Anything that's trying to match individual queries could be interested in all sorts of different things. Queries that touch a particular table being one of the more obvious things, or queries that mention a particular literal. Rather than try to design something complicated in advance that anticipates all needs, I'm thinking it makes sense to just throw a hook in there. If some patterns start to emerge in terms of useful real world filtering criteria then that'd better inform any more user accessible design down the track.

The same method could be interesting for interactive EXPLAIN ANALYZE too. TIMING has about 20%-30% overhead, and usually we don't need perfectly exact numbers.

I don't understand what you are suggesting here.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Pavel Stehule
Date:


2015-06-03 9:17 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:


On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a sample rate. 

Anything that's trying to match individual queries could be interested in all sorts of different things. Queries that touch a particular table being one of the more obvious things, or queries that mention a particular literal. Rather than try to design something complicated in advance that anticipates all needs, I'm thinking it makes sense to just throw a hook in there. If some patterns start to emerge in terms of useful real world filtering criteria then that'd better inform any more user accessible design down the track.

The same method could be interesting for interactive EXPLAIN ANALYZE too. TIMING has about 20%-30% overhead, and usually we don't need perfectly exact numbers.

I don't understand what you are suggesting here.

using some sampling for EXPLAIN ANALYZE statement

Pavel



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Craig Ringer
Date:

On 3 June 2015 at 15:22, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-06-03 9:17 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:


On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a sample rate. 

Anything that's trying to match individual queries could be interested in all sorts of different things. Queries that touch a particular table being one of the more obvious things, or queries that mention a particular literal. Rather than try to design something complicated in advance that anticipates all needs, I'm thinking it makes sense to just throw a hook in there. If some patterns start to emerge in terms of useful real world filtering criteria then that'd better inform any more user accessible design down the track.

The same method could be interesting for interactive EXPLAIN ANALYZE too. TIMING has about 20%-30% overhead, and usually we don't need perfectly exact numbers.

I don't understand what you are suggesting here.

using some sampling for EXPLAIN ANALYZE statement


Do you mean that you'd like to be able to set a fraction of queries on which auto_explain does ANALYZE, so most of the time it just outputs an ordinary EXPLAIN?
 
Or maybe we're talking about different things re the original proposal? I don't see how this would work. If you run EXPLAIN ANALYZE interactively like you said above, you'd surely want it to report costs and timings, or whatever it is that you ask for, all the time, not just some of the time based on some background setting.

Are you advocating a profiling-based approach for EXPLAIN ANALYZE timing where we sample which executor node we're under at regular intervals, instead of timing everything? Or suggesting a way to filter out sub-trees so you only get timing data on some sub-portion of a query?


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Pavel Stehule
Date:


2015-06-03 9:46 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 3 June 2015 at 15:22, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-06-03 9:17 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:


On 2 June 2015 at 15:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-06-02 9:07 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

For the majority of users I'm sure it's sufficient to just have a sample rate. 

Anything that's trying to match individual queries could be interested in all sorts of different things. Queries that touch a particular table being one of the more obvious things, or queries that mention a particular literal. Rather than try to design something complicated in advance that anticipates all needs, I'm thinking it makes sense to just throw a hook in there. If some patterns start to emerge in terms of useful real world filtering criteria then that'd better inform any more user accessible design down the track.

The same method could be interesting for interactive EXPLAIN ANALYZE too. TIMING has about 20%-30% overhead, and usually we don't need perfectly exact numbers.

I don't understand what you are suggesting here.

using some sampling for EXPLAIN ANALYZE statement


Do you mean that you'd like to be able to set a fraction of queries on which auto_explain does ANALYZE, so most of the time it just outputs an ordinary EXPLAIN?
 
Or maybe we're talking about different things re the original proposal? I don't see how this would work. If you run EXPLAIN ANALYZE interactively like you said above, you'd surely want it to report costs and timings, or whatever it is that you ask for, all the time, not just some of the time based on some background setting.

Are you advocating a profiling-based approach for EXPLAIN ANALYZE timing where we sample which executor node we're under at regular intervals, instead of timing everything? Or suggesting a way to filter out sub-trees so you only get timing data on some sub-portion of a query?


Lots of variants - I would like to see costs and times for EXPLAIN ANALYZE every time, but the precision of the time can be reduced to 1ms. The question is whether we can significantly reduce the cost (or number of calls) of getting the system time.
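One way to reduce the number of real clock reads, which is roughly the direction suggested above, would be to refresh a cached timestamp only every N-th request. This is a speculative sketch, not code from any PostgreSQL patch; all names are invented:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Speculative sketch: trade timing precision for fewer clock_gettime()
 * calls by returning a cached timestamp most of the time. */
static struct timespec cached_ts;
static int calls_since_refresh = 0;
static const int refresh_every = 16;   /* real clock read 1 time in 16 */
long clock_reads = 0;                  /* instrumentation for the example */

struct timespec coarse_now(void)
{
    /* Only every refresh_every-th caller pays for a real clock read;
     * everyone else gets the slightly stale cached value. */
    if (calls_since_refresh++ % refresh_every == 0)
    {
        clock_gettime(CLOCK_MONOTONIC, &cached_ts);
        clock_reads++;
    }
    return cached_ts;
}
```

Whether the resulting millisecond-ish precision is acceptable is exactly the trade-off being discussed here.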

Pavel
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Craig Ringer
Date:

Lots of variants - I would like to see costs and times for EXPLAIN ANALYZE every time, but the precision of the time can be reduced to 1ms. The question is whether we can significantly reduce the cost (or number of calls) of getting the system time.

Pavel
 

 
OK, so you're suggesting a sampling-based EXPLAIN.

That'd be interesting, but is totally unrelated to this work on auto_explain.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Craig Ringer
Date:


On 2 June 2015 at 15:07, Craig Ringer <craig@2ndquadrant.com> wrote:
On 29 May 2015 at 11:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> It's sometimes desirable to collect auto_explain data with ANALYZE in order
> to track down hard-to-reproduce issues, but the performance impacts can be
> pretty hefty on the DB.

> I'm inclined to add a sample rate to auto_explain so that it can trigger
> only on x percent of queries,

That sounds reasonable ...

Cool, I'll cook that up then. Thanks for the sanity check.

OK, here we go.

To make sure it doesn't trigger on all backends at once, and to ensure it doesn't rely on a shared point of contention in shmem, this sets up a counter with a random value on each backend start.

Because it needs to either always run both the Start and End hooks, or run neither, this doesn't count nested statements for sampling purposes. So if you run my_huge_plpgsql_function() then either all its statements will be explained or none of them will. This only applies if nested statement explain is enabled. It's possible to get around this by adding a separate nested statement counter that's reset at each top level End hook, but it doesn't seem worthwhile.

The sample rate has no effect on ANALYZE, which remains enabled or disabled for all queries. I don't see any point adding a separate sample rate control to ANALYZE only some sub-proportion of EXPLAINed statements.
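The counter scheme described above can be sketched roughly as follows; all names (sampling_init, sampling_should_explain, sample_interval) are illustrative stand-ins, not identifiers from the actual patch:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative sketch only: explain one in every sample_interval
 * top-level statements, with a random starting offset per backend. */
static int sample_interval = 10;    /* explain roughly 1 in 10 queries */
static int sample_counter;

/* Run once at backend start: the random phase keeps all backends from
 * explaining the same queries in lockstep. */
void sampling_init(unsigned int seed)
{
    srandom(seed);
    sample_counter = random() % sample_interval;
}

/* Run at the start of each statement. Top-level statements (nesting
 * level 0) decrement the counter; nested statements inherit the
 * top-level decision, so a PL/pgSQL function is explained
 * all-or-nothing, as described above. */
bool sampling_should_explain(int nesting_level, bool current_query_sampled)
{
    if (nesting_level > 0)
        return current_query_sampled;
    if (--sample_counter <= 0)
    {
        sample_counter = sample_interval;
        return true;
    }
    return false;
}
```

Over any run of 100 top-level statements this fires exactly 10 times; only the phase varies between backends, which is why a workload with a repeating query pattern can end up always sampling the same statements.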

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: auto_explain sample rate

From
Andres Freund
Date:
On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
> OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Andres



Re: auto_explain sample rate

From
Craig Ringer
Date:


On 3 June 2015 at 20:04, Andres Freund <andres@anarazel.de> wrote:
On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
> OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Could be, yeah. I was thinking about the cost of generating a random number each time, but it's going to vanish in the noise compared to the rest of the costs in query execution.

---
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 03/06/2015 15:00, Craig Ringer wrote:
> 
> 
> On 3 June 2015 at 20:04, Andres Freund <andres@anarazel.de
> <mailto:andres@anarazel.de>> wrote:
> 
>     On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
>     > OK, here we go.
> 
>     Hm. Wouldn't random sampling be better than what you do? If your queries
>     have a pattern to them (e.g. you always issue the same 10 queries in
>     succession), this will possibly only show a subset of the queries.
> 
>     I think a formulation in fraction (i.e. a float between 0 and 1) will
>     also be easier to understand.
> 
> 
> Could be, yeah. I was thinking about the cost of generating a random
> number each time, but it's going to vanish in the noise compared to the
> rest of the costs in query execution.
> 

Hello, I've just reviewed the patch.

I'm not sure if there's a consensus on the sample rate format.  FWIW, I
also think a fraction would be easier to understand.  Any news about
generating a random value at each call to avoid the query pattern problem?

The patch applies without error.  I wonder if there's any reason for
using pg_lrand48() instead of random(), as there's a port for random()
if the system lacks it.

After some quick checks, I found that auto_explain_sample_counter is
always initialized with the same value.  After some digging, it seems
that pg_lrand48() always returns the same values in the same order, at
least on my computer.  Have I missed something?

Otherwise, after replacing the pg_lrand48() call with a random(), it
works just fine.
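The behaviour described is easy to demonstrate: lrand48() is a deterministic linear congruential generator, so its output sequence is fully determined by the seed, and a process that never calls srand48() starts from the same default state every time. A minimal illustration:

```c
#define _XOPEN_SOURCE 500
#include <stdlib.h>

/* lrand48() draws from a 48-bit linear congruential generator whose
 * state is fully determined by the seed, so the first value after
 * seeding is always the same for a given seed. */
long first_draw_with_seed(long seed)
{
    srand48(seed);
    return lrand48();
}
```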

> ---
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 05/07/2015 18:22, Julien Rouhaud wrote:
> On 03/06/2015 15:00, Craig Ringer wrote:
>>
>>
>> On 3 June 2015 at 20:04, Andres Freund <andres@anarazel.de
>> <mailto:andres@anarazel.de>> wrote:
>>
>>     On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
>>     > OK, here we go.
>>
>>     Hm. Wouldn't random sampling be better than what you do? If your queries
>>     have a pattern to them (e.g. you always issue the same 10 queries in
>>     succession), this will possibly only show a subset of the queries.
>>
>>     I think a formulation in fraction (i.e. a float between 0 and 1) will
>>     also be easier to understand.
>>
>>
>> Could be, yeah. I was thinking about the cost of generating a random
>> number each time, but it's going to vanish in the noise compared to the
>> rest of the costs in query execution.
>>
>
> Hello, I've just reviewed the patch.
>
> I'm not sure if there's a consensus on the sample rate format.  FWIW, I
> also think a fraction would be easier to understand.  Any news about
> generating a random at each call to avoid the query pattern problem ?
>
> The patch applies without error.  I wonder if there's any reason for
> using pg_lrand48() instead of random(), as there's a port for random()
> if the system lacks it.
>
> After some quick checks, I found that auto_explain_sample_counter is
> always initialized with the same value.  After some digging, it seems
> that pg_lrand48() always returns the same values in the same order, at
> least on my computer.  Have I missed something?
>

Well, I obviously missed that pg_srand48() is only used if the system
lacks random/srandom, sorry for the noise.  So yes, random() must be
used instead of pg_lrand48().

I'm attaching a new version of the patch fixing this issue just in case.


> Otherwise, after replacing the pg_lrand48() call with a random(), it
> works just fine.
>
>> ---
>>  Craig Ringer                   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>
>


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment

Re: auto_explain sample rate

From
Craig Ringer
Date:
On 7 July 2015 at 21:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

> Well, I obviously missed that pg_srand48() is only used if the system
> lacks random/srandom, sorry for the noise.  So yes, random() must be
> used instead of pg_lrand48().
>
> I'm attaching a new version of the patch fixing this issue just in case.

Thanks for picking this up. I've been trying to find time to come back
to it but been swamped in priority work.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: auto_explain sample rate

From
Michael Paquier
Date:
On Fri, Jul 17, 2015 at 2:53 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 7 July 2015 at 21:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
>> Well, I obviously missed that pg_srand48() is only used if the system
>> lacks random/srandom, sorry for the noise.  So yes, random() must be
>> used instead of pg_lrand48().
>>
>> I'm attaching a new version of the patch fixing this issue just in case.
>
> Thanks for picking this up. I've been trying to find time to come back
> to it but been swamped in priority work.

For now I am marking that as returned with feedback.
-- 
Michael



Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 25/08/2015 14:45, Michael Paquier wrote:
> On Fri, Jul 17, 2015 at 2:53 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 7 July 2015 at 21:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>>
>>> Well, I obviously missed that pg_srand48() is only used if the system
>>> lacks random/srandom, sorry for the noise.  So yes, random() must be
>>> used instead of pg_lrand48().
>>>
>>> I'm attaching a new version of the patch fixing this issue just in case.
>>
>> Thanks for picking this up. I've been trying to find time to come back
>> to it but been swamped in priority work.
>
> For now I am marking that as returned with feedback.
>

PFA v3 of the patch, rebased on current head. It fixes the last issue
(sample a percentage of queries).

I'm adding it to the next commitfest.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment

Re: auto_explain sample rate

From
Alvaro Herrera
Date:
Julien Rouhaud wrote:

Hijacking this macro is just too obscure:

>  #define auto_explain_enabled() \
>      (auto_explain_log_min_duration >= 0 && \
> -     (nesting_level == 0 || auto_explain_log_nested_statements))
> +     (nesting_level == 0 || auto_explain_log_nested_statements) && \
> +     current_query_sampled)

because it then becomes hard to figure out that assigning to _sampled is
what makes the enabled() check pass or not depending on sampling:

> @@ -191,6 +211,14 @@ _PG_fini(void)
>  static void
>  explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
>  {
> +    /*
> +     * For ratio sampling, randomly choose top-level statement. Either
> +     * all nested statements will be explained or none will.
> +     */
> +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +        current_query_sampled = (random() < auto_explain_sample_ratio *
> +                MAX_RANDOM_VALUE);
> +
>      if (auto_explain_enabled())
>      {

I think it's better to keep the "enabled" macro unmodified, and just add
another conditional to the "if" test there.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 16/02/2016 22:51, Alvaro Herrera wrote:
> Julien Rouhaud wrote:
>
> Hijacking this macro is just too obscure:
>
>>  #define auto_explain_enabled() \
>>      (auto_explain_log_min_duration >= 0 && \
>> -     (nesting_level == 0 || auto_explain_log_nested_statements))
>> +     (nesting_level == 0 || auto_explain_log_nested_statements) && \
>> +     current_query_sampled)
>
> because it then becomes hard to figure out that assigning to _sampled is
> what makes the enabled() check pass or not depending on sampling:
>
>> @@ -191,6 +211,14 @@ _PG_fini(void)
>>  static void
>>  explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
>>  {
>> +    /*
>> +     * For ratio sampling, randomly choose top-level statement. Either
>> +     * all nested statements will be explained or none will.
>> +     */
>> +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
>> +        current_query_sampled = (random() < auto_explain_sample_ratio *
>> +                MAX_RANDOM_VALUE);
>> +
>>      if (auto_explain_enabled())
>>      {
>
> I think it's better to keep the "enabled" macro unmodified, and just add
> another conditional to the "if" test there.
>

Thanks for looking at this!

Agreed, it's too obscure. Attached v4 fixes as you said.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment

Re: auto_explain sample rate

From
Petr Jelinek
Date:
On 17/02/16 01:17, Julien Rouhaud wrote:
>
> Agreed, it's too obscure. Attached v4 fixes as you said.
>

Seems to be a simple enough patch, and it works. However, I would like the
documentation to say that the range is 0 to 1 and represents the fraction of
queries sampled, because right now both the GUC description and the
documentation say it's in percent, but that's not really true, as percent
is 0 to 100.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 10/03/2016 04:37, Petr Jelinek wrote:
> On 17/02/16 01:17, Julien Rouhaud wrote:
>>
>> Agreed, it's too obscure. Attached v4 fixes as you said.
>>
>
> Seems to be simple enough patch and works. However I would like
> documentation to say that the range is 0 to 1 and represents fraction of
> the queries sampled, because right now both the GUC description and the
> documentation say it's in percent but that's not really true as percent
> is 0 to 100.
>

Agreed. v5 attached fixes that.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment

Re: auto_explain sample rate

From
Petr Jelinek
Date:
On 10/03/16 20:59, Julien Rouhaud wrote:
> On 10/03/2016 04:37, Petr Jelinek wrote:
>> On 17/02/16 01:17, Julien Rouhaud wrote:
>>>
>>> Agreed, it's too obscure. Attached v4 fixes as you said.
>>>
>>
>> Seems to be simple enough patch and works. However I would like
>> documentation to say that the range is 0 to 1 and represents fraction of
>> the queries sampled, because right now both the GUC description and the
>> documentation say it's in percent but that's not really true as percent
>> is 0 to 100.
>>
>
> Agreed. v5 attached fixes that.
>

Great, I will test it once more (just because when I don't, bugs suddenly
appear out of nowhere) and mark it ready for committer.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: auto_explain sample rate

From
Magnus Hagander
Date:


On Thu, Mar 10, 2016 at 10:07 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 10/03/16 20:59, Julien Rouhaud wrote:
On 10/03/2016 04:37, Petr Jelinek wrote:
On 17/02/16 01:17, Julien Rouhaud wrote:

Agreed, it's too obscure. Attached v4 fixes as you said.


Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents fraction of
the queries sampled, because right now both the GUC description and the
documentation say it's in percent but that's not really true as percent
is 0 to 100.


Agreed. v5 attached fixes that.


Great, I will test it once more (just because when I don't, bugs suddenly appear out of nowhere) and mark it ready for committer.


Coming back to the previous discussions about random() - AFAICT this patch will introduce the random() call always (in explain_ExecutorStart):

+    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+        current_query_sampled = (random() < auto_explain_sample_ratio *
+                MAX_RANDOM_VALUE);


Not sure what the overhead is, but wouldn't it be better to include a check for current_query_sampled>0 in the if part of that statement? Regardless of performance, that feels cleaner to me. Or am I missing something?


--

Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 11/03/2016 11:45, Magnus Hagander wrote:
> 
> Coming back to the previous discussions about random() - AFAICT this
> patch will introduce the random() call always (in explain_ExecutorStart):
> 
> +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +        current_query_sampled = (random() < auto_explain_sample_ratio *
> +                MAX_RANDOM_VALUE);
> 
> 
> Not sure what the overhead is, but wouldn't it be better to include a
> check for current_query_sampled>0 in the  if part of that statement?
> Regardless of performance, that feels cleaner to me. Or am I missing
> something?

You mean check for auto_explain_sample_ratio > 0 ?

There would be a corner case if a query is sampled
(current_query_sampled is true) and then auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the good way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: auto_explain sample rate

From
Magnus Hagander
Date:


On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
On 11/03/2016 11:45, Magnus Hagander wrote:
>
> Coming back to the previous discussions about random() - AFAICT this
> patch will introduce the random() call always (in explain_ExecutorStart):
>
> +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +        current_query_sampled = (random() < auto_explain_sample_ratio *
> +                MAX_RANDOM_VALUE);
>
>
> Not sure what the overhead is, but wouldn't it be better to include a
> check for current_query_sampled>0 in the  if part of that statement?
> Regardless of performance, that feels cleaner to me. Or am I missing
> something?

You mean check for auto_explain_sample_ratio > 0 ?

I did, but I think what I should have meant is auto_explain_sample_ratio < 1.

 
There would be a corner case if a query is sampled
(current_query_sampled is true) and then auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

There would have to be an else block as well of course, that set it back.
 
 
We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the good way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

I guess in the end it probably is - a general random() call is pretty cheap compared to all the things we're doing. I think my mind was stuck in crypto-random which can be more expensive :)


--

Re: auto_explain sample rate

From
Petr Jelinek
Date:
On 11/03/16 11:45, Magnus Hagander wrote:
> On Thu, Mar 10, 2016 at 10:07 PM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
>     On 10/03/16 20:59, Julien Rouhaud wrote:
>
>         On 10/03/2016 04:37, Petr Jelinek wrote:
>
>             On 17/02/16 01:17, Julien Rouhaud wrote:
>
>
>                 Agreed, it's too obscure. Attached v4 fixes as you said.
>
>
>             Seems to be simple enough patch and works. However I would like
>             documentation to say that the range is 0 to 1 and represents
>             fraction of
>             the queries sampled, because right now both the GUC
>             description and the
>             documentation say it's in percent but that's not really true
>             as percent
>             is 0 to 100.
>
>
>         Agreed. v5 attached fixes that.
>
>
>     Great, I will test it once more (just because when I don't, bugs
>     suddenly appear out of nowhere) and mark it ready for committer.
>
>
> Coming back to the previous discussions about random() - AFAICT this
> patch will introduce the random() call always (in explain_ExecutorStart):
>
> +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +        current_query_sampled = (random() < auto_explain_sample_ratio *
> +                MAX_RANDOM_VALUE);
>

No it doesn't as the documented way to turn off auto_explain is to set 
auto_explain_log_min_duration to -1 which is also the default.

In any case the code for that would have to be something like:

if (auto_explain_sample_ratio == 0)
    current_query_sampled = false;
else if <the original if>

Not sure if I consider that cleaner, but it would definitely remove the
call to random() in case the user has set auto_explain_log_min_duration to
something other than -1 and "turned off" auto_explain by setting
auto_explain_sample_ratio to 0.
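Spelled out, that variant would look roughly like the sketch below; the variable names follow the quoted patch hunk, and MAX_RANDOM_VALUE is defined here only as a stand-in for the PostgreSQL constant of the same name:

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_RANDOM_VALUE 0x7FFFFFFF   /* stand-in for the PostgreSQL constant */

double auto_explain_sample_ratio = 1.0;
int auto_explain_log_min_duration = 0;
int nesting_level = 0;
bool current_query_sampled = false;

/* Decide once per top-level statement whether to sample it. A ratio of
 * exactly 0 short-circuits the random() call and disables sampling. */
void decide_sampling(void)
{
    if (auto_explain_sample_ratio == 0)
        current_query_sampled = false;
    else if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
        current_query_sampled = (random() < auto_explain_sample_ratio *
                                 MAX_RANDOM_VALUE);
}
```

Any ratio strictly between 0 and 1 still pays for the random() call on every top-level statement, which the thread concludes is cheap enough not to matter.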

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: auto_explain sample rate

From
Magnus Hagander
Date:


On Fri, Mar 11, 2016 at 3:03 PM, Magnus Hagander <magnus@hagander.net> wrote:


On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
On 11/03/2016 11:45, Magnus Hagander wrote:
>
> Coming back to the previous discussions about random() - AFAICT this
> patch will introduce the random() call always (in explain_ExecutorStart):
>
> +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +        current_query_sampled = (random() < auto_explain_sample_ratio *
> +            MAX_RANDOM_VALUE);
>
>
> Not sure what the overhead is, but wouldn't it be better to include a
> check for current_query_sampled > 0 in the if part of that statement?
> Regardless of performance, that feels cleaner to me. Or am I missing
> something?

You mean check for auto_explain_sample_ratio > 0 ?

I did, but I think what I should have meant is auto_explain_sample_ratio < 1.

 
There would be a corner case: if a query is sampled
(current_query_sampled is true) and auto_explain_sample_ratio is then
set to 0, all subsequent queries in this backend would be processed.

There would have to be an else block as well of course, that set it back.
 
 
We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the proper way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

I guess in the end it probably is - a general random() call is pretty cheap compared to all the things we're doing. I think my mind was stuck in crypto-random which can be more expensive :)


Applied with a minor word-fix in the documentation and removal of some unrelated whitespace changes. 

--

Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 11/03/2016 15:11, Magnus Hagander wrote:
> 
> 
> On Fri, Mar 11, 2016 at 3:03 PM, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
> 
> 
> 
>     On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud
>     <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
> 
>         On 11/03/2016 11:45, Magnus Hagander wrote:
>         >
>         > Coming back to the previous discussions about random() - AFAICT this
>         > patch will introduce the random() call always (in explain_ExecutorStart):
>         >
>         > +    if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
>         > +        current_query_sampled = (random() < auto_explain_sample_ratio *
>         > +            MAX_RANDOM_VALUE);
>         >
>         >
>         > Not sure what the overhead is, but wouldn't it be better to include a
>         > check for current_query_sampled>0 in the  if part of that statement?
>         > Regardless of performance, that feels cleaner to me. Or am I missing
>         > something?
> 
>         You mean check for auto_explain_sample_ratio > 0 ?
> 
> 
>     I did, but I think what I should have meant is
>     auto_explain_sample_ratio < 1.
> 
>      
> 
>         There would be a corner case if a query is sampled
>         (current_query_sampled is true) and then
>         auto_explain_sample_ratio is
>         set to 0, all subsequent queries in this backend would be processed.
> 
> 
>     There would have to be an else block as well of course, that set it
>     back.
>      
>      
> 
>         We could add a specific test for this case to spare a random()
>         call, but
>         I fear it'd be overkill. Maybe it's better to document that the
>         good way
>         to deactivate auto_explain is to set
>         auto_explain.log_min_duration to -1.
> 
> 
>     I guess in the end it probably is - a general random() call is
>     pretty cheap compared to all the things we're doing. I think my mind
>     was stuck in crypto-random which can be more expensive :)
> 
> 
> Applied with a minor word-fix in the documentation and removal of some
> unrelated whitespace changes. 
> 

Thanks!

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: auto_explain sample rate

From
Tomas Vondra
Date:
Hi,

On Fri, 2016-03-11 at 15:11 +0100, Magnus Hagander wrote:
> 
> > 
> Applied with a minor word-fix in the documentation and removal of
> some unrelated whitespace changes. 

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: auto_explain sample rate

From
Robert Haas
Date:
On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> A bit late, but I think we should rename the GUC variable to
> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
> for the same thing. That'd be more consistent.

I like that idea.  It seems like slightly better terminology.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: auto_explain sample rate

From
Julien Rouhaud
Date:
On 11/03/2016 17:55, Robert Haas wrote:
> On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> A bit late, but I think we should rename the GUC variable to
>> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
>> for the same thing. That'd be more consistent.
>
> I like that idea.  It seems like slightly better terminology.
>

I like it too. I also just noticed that I duplicated the var type by
mistake in the documentation :/

Attached patch fixes both.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment

Re: auto_explain sample rate

From
Magnus Hagander
Date:


On Sat, Mar 12, 2016 at 3:20 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
On 11/03/2016 17:55, Robert Haas wrote:
> On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> A bit late, but I think we should rename the GUC variable to
>> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
>> for the same thing. That'd be more consistent.
>
> I like that idea.  It seems like slightly better terminology.
>

I like it too. I also just noticed that I duplicated the var type by
mistake in the documentation :/

Attached patch fixes both.


I also agree, and thanks for doing the work :) Applied! 

--