Thread: Add parameter jit_warn_above_fraction

Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
This patch adds a configuration parameter jit_warn_above_fraction that
will cause a warning to be logged if the fraction of time spent on
doing JIT is bigger than the specified one. For example, this can be
used to track down those cases where JIT ends up taking 90% of the
query runtime because of bad estimates...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: Add parameter jit_warn_above_fraction

From
Julien Rouhaud
Date:
Hi,

On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:
> This patch adds a configuration parameter jit_warn_above_fraction that
> will cause a warning to be logged if the fraction of time spent on
> doing JIT is bigger than the specified one. For example, this can be
> used to track down those cases where JIT ends up taking 90% of the
> query runtime because of bad estimates...

I think that's tremendously useful, huge +1.

Just a few minor nit:

+         A value of 0 (the default)disables the warning.

missing space

+            ereport(WARNING,
+                    (errmsg("JIT time was %ld ms of %d ms",
+                            jit_time, msecs)));

"JIT time" may a bit obscure for users, how about "JIT total processing time"?"

+            gettext_noop("Sets the fraction of query time spent on JIT before writing"
+                         "a warning to the log."),
+            gettext_noop("Write a message tot he server log if more than this"
+                         "fraction of the query runtime is spent on JIT."
+                         "Zero turns off the warning.")

missing spaces in the concatenated strings.

The rest of the patch looks good to me.



Re: Add parameter jit_warn_above_fraction

From
Andres Freund
Date:
Hi,

On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> This patch adds a configuration parameter jit_warn_above_fraction that
> will cause a warning to be logged if the fraction of time spent on
> doing JIT is bigger than the specified one. For example, this can be
> used to track down those cases where JIT ends up taking 90% of the
> query runtime because of bad estimates...

Hm. Could it make sense to do this as a auto_explain feature?

Greetings,

Andres Freund



Re: Add parameter jit_warn_above_fraction

From
Justin Pryzby
Date:
On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:
> +    {
> +        {"jit_warn_above_fraction", PGC_SUSET, LOGGING_WHEN,
> +            gettext_noop("Sets the fraction of query time spent on JIT before writing"
> +                         "a warning to the log."),
> +            gettext_noop("Write a message tot he server log if more than this"
> +                         "fraction of the query runtime is spent on JIT."
> +                         "Zero turns off the warning.")
> +        },
> +        &jit_warn_above_fraction,
> +        0.0, 0.0, 1.0,
> +        NULL, NULL, NULL
> +    },

Should be PGC_USERSET ?

+                       gettext_noop("Write a message tot he server log if more than this"
                                                                                                          
 

to the

+       if (jit_enabled && jit_warn_above_fraction > 0)
                                                                                                          
 
+       {
                                                                                                          
 
+               int64 jit_time =
                                                                                                          
 
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +
                                                                                                          
 
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +
                                                                                                          
 
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +
                                                                                                          
 
+                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);
                                                                                                          
 
+
                                                                                                          
 
+               if (jit_time > msecs * jit_warn_above_fraction)
                                                                                                          
 
+               {
                                                                                                          
 
+                       ereport(WARNING,
                                                                                                          
 
+                                       (errmsg("JIT time was %ld ms of %d ms",
                                                                                                          
 
+                                                       jit_time, msecs)));
                                                                                                          
 
+               }
                                                                                                          
 
+       }
                                                                                                          
 


I think it should be a NOTICE (or less?)

Is it more useful if this is applied combined with log_min_duration_statement ?

It's easy to imagine a query for which the planner computes a high cost, but
actually runs quickly.  You might get a bunch of WARNINGs that the query took
10 MS and JIT was 75% of that, even if you don't care about queries that take
less than 10 SEC.

I should say that this is already available by processing the output of
autoexplain.

-- 
Justin



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
On Fri, Feb 25, 2022 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> > This patch adds a configuration parameter jit_warn_above_fraction that
> > will cause a warning to be logged if the fraction of time spent on
> > doing JIT is bigger than the specified one. For example, this can be
> > used to track down those cases where JIT ends up taking 90% of the
> > query runtime because of bad estimates...
>
> Hm. Could it make sense to do this as a auto_explain feature?

It could be. But I was looking for something a lot more "light weight"
than having to install an extension. But yes, if we wanted to, we
could certainly change jit_warn_above_fraction to be
auto_explain.log_min_jit_fraction or something like that, and do
basically the same thing. But then, we could also have
log_min_duration_statement be part of auto_explain instead, so it's
all about where to draw the line :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Add parameter jit_warn_above_fraction

From
Andres Freund
Date:
Hi,

On 2022-02-25 17:28:41 +0100, Magnus Hagander wrote:
> On Fri, Feb 25, 2022 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> > > This patch adds a configuration parameter jit_warn_above_fraction that
> > > will cause a warning to be logged if the fraction of time spent on
> > > doing JIT is bigger than the specified one. For example, this can be
> > > used to track down those cases where JIT ends up taking 90% of the
> > > query runtime because of bad estimates...
> >
> > Hm. Could it make sense to do this as a auto_explain feature?
> 
> It could be. But I was looking for something a lot more "light weight"
> than having to install an extension. But yes, if we wanted to, we
> could certainly change jit_warn_above_fraction to be
> auto_explain.log_min_jit_fraction or something like that, and do
> basically the same thing.  But then, we could also have
> log_min_duration_statement be part of auto_explain instead, so it's
> all about where to draw the line :)

I guess it feels a tad on the "too narrow/specific" side of things for the
general code. We don't have log_min_duration_{parsing,planning,execution}
either.  But I also get it. So I just wanted to raise it ;)

Greetings,

Andres Freund



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
On Fri, Feb 25, 2022 at 5:47 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-02-25 17:28:41 +0100, Magnus Hagander wrote:
> > On Fri, Feb 25, 2022 at 5:20 PM Andres Freund <andres@anarazel.de> wrote:
> > > On 2022-02-25 16:16:01 +0100, Magnus Hagander wrote:
> > > > This patch adds a configuration parameter jit_warn_above_fraction that
> > > > will cause a warning to be logged if the fraction of time spent on
> > > > doing JIT is bigger than the specified one. For example, this can be
> > > > used to track down those cases where JIT ends up taking 90% of the
> > > > query runtime because of bad estimates...
> > >
> > > Hm. Could it make sense to do this as a auto_explain feature?
> >
> > It could be. But I was looking for something a lot more "light weight"
> > than having to install an extension. But yes, if we wanted to, we
> > could certainly change jit_warn_above_fraction to be
> > auto_explain.log_min_jit_fraction or something like that, and do
> > basically the same thing.  But then, we could also have
> > log_min_duration_statement be part of auto_explain instead, so it's
> > all about where to draw the line :)
>
> I guess it feels a tad on the "too narrow/specific" side of things for the
> general code. We don't have log_min_duration_{parsing,planning,execution}
> either.  But I also get it. So I just wanted to raise it ;)

Oh it's absolutely a valid issue to raise :) In fact, one could
definitely argue that we should have a parameter for auto_explain *as
well*.

It's true we don't have those -- but it's also in my experience a lot
more common to be an issue with JIT. And unfortunately the current
"solution" people tend to apply is to globally disable JIT, and
there's no similar "globally disable parsing or "globally disable
planning" pattern, for obvious reasons.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:
> > +     {
> > +             {"jit_warn_above_fraction", PGC_SUSET, LOGGING_WHEN,
> > +                     gettext_noop("Sets the fraction of query time spent on JIT before writing"
> > +                                              "a warning to the log."),
> > +                     gettext_noop("Write a message tot he server log if more than this"
> > +                                              "fraction of the query runtime is spent on JIT."
> > +                                              "Zero turns off the warning.")
> > +             },
> > +             &jit_warn_above_fraction,
> > +             0.0, 0.0, 1.0,
> > +             NULL, NULL, NULL
> > +     },
>
> Should be PGC_USERSET ?

Yes. Definitely. Copy/paste thinko.


> +                       gettext_noop("Write a message tot he server log if more than this"
>
> to the

Yup.


> +       if (jit_enabled && jit_warn_above_fraction > 0)
> +       {
> +               int64 jit_time =
> +                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +
> +                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +
> +                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +
> +                       INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);
> +
> +               if (jit_time > msecs * jit_warn_above_fraction)
> +               {
> +                       ereport(WARNING,
> +                                       (errmsg("JIT time was %ld ms of %d ms",
> +                                                       jit_time, msecs)));
> +               }
> +       }
>
>
> I think it should be a NOTICE (or less?)

Hmm. I'm not so sure.

Other similar parameters often use LOG, but the downside of that is
that it won't be sent to the client.

The problem with using NOTICE is that it won't go to the log by
default. It needs to be at least warning to do that.


> Is it more useful if this is applied combined with log_min_duration_statement ?
>
> It's easy to imagine a query for which the planner computes a high cost, but
> actually runs quickly.  You might get a bunch of WARNINGs that the query took
> 10 MS and JIT was 75% of that, even if you don't care about queries that take
> less than 10 SEC.

Yeah, that's definitely a problem. But which way would you want to tie
it to log_min_duration_statement?  That a statement would both have to
take longer than log_min_duration_statement *and* have JIT above a
certain percentage? In my experience that is instead likely to miss
most of the interesting times. Maybe it would need a separate guc for
the timing, but I'd like to avoid that, I'm not sure it's a function
worth *that* much...


> I should say that this is already available by processing the output of
> autoexplain.

True. You just can't trigger it based on it. (and it can be somewhat
of a PITA to parse things out of auto_explain on busy systems, but it
does give a lot of very useful details)


Meanwhile here is an updated based on your other comments above, as
well as those from Julien.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: Add parameter jit_warn_above_fraction

From
Justin Pryzby
Date:
On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:
> On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > I think it should be a NOTICE (or less?)
> 
> Hmm. I'm not so sure.
> 
> Other similar parameters often use LOG, but the downside of that is
> that it won't be sent to the client.
> 
> The problem with using NOTICE is that it won't go to the log by
> default. It needs to be at least warning to do that.

Anyone who uses this parameter is aleady going to be changing GUCs, so it
doesn't need to work by default.  The people who are likely to enable this
already monitor their logs and have probably customized their logging
configuration.

> > Is it more useful if this is applied combined with log_min_duration_statement ?
> >
> > It's easy to imagine a query for which the planner computes a high cost, but
> > actually runs quickly.  You might get a bunch of WARNINGs that the query took
> > 10 MS and JIT was 75% of that, even if you don't care about queries that take
> > less than 10 SEC.
> 
> Yeah, that's definitely a problem. But which way would you want to tie
> it to log_min_duration_statement?  That a statement would both have to
> take longer than log_min_duration_statement *and* have JIT above a
> certain percentage? In my experience that is instead likely to miss
> most of the interesting times.

I don't understand - why doesn't it combine trivially with
log_min_duration_statement?  Are you saying that the default / pre-existing min
duration may not log all of the intended queries ?  I think that just means the
configuration needs to be changed.  The GUC needs to allow/help finding these
JIT issues, but going to require an admin's interaction in any case.  Avoiding
false positives may be important for it to be useful at all.

Hmm .. should it also combine with min_sample_rate ?  Maybe that addresses your
concern.

-- 
Justin



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
On Mon, Mar 7, 2022 at 2:09 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:
> > On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > I think it should be a NOTICE (or less?)
> >
> > Hmm. I'm not so sure.
> >
> > Other similar parameters often use LOG, but the downside of that is
> > that it won't be sent to the client.
> >
> > The problem with using NOTICE is that it won't go to the log by
> > default. It needs to be at least warning to do that.
>
> Anyone who uses this parameter is aleady going to be changing GUCs, so it
> doesn't need to work by default.  The people who are likely to enable this
> already monitor their logs and have probably customized their logging
> configuration.

Sure, but if you set log_min_messagse to NOTICE you are likely going
to flood your logs beyond usefulness. And the whole idea of this
parameter is you can keep it on during "normal times" to catch
outliers.


> > > Is it more useful if this is applied combined with log_min_duration_statement ?
> > >
> > > It's easy to imagine a query for which the planner computes a high cost, but
> > > actually runs quickly.  You might get a bunch of WARNINGs that the query took
> > > 10 MS and JIT was 75% of that, even if you don't care about queries that take
> > > less than 10 SEC.
> >
> > Yeah, that's definitely a problem. But which way would you want to tie
> > it to log_min_duration_statement?  That a statement would both have to
> > take longer than log_min_duration_statement *and* have JIT above a
> > certain percentage? In my experience that is instead likely to miss
> > most of the interesting times.
>
> I don't understand - why doesn't it combine trivially with
> log_min_duration_statement?  Are you saying that the default / pre-existing min
> duration may not log all of the intended queries ?  I think that just means the
> configuration needs to be changed.  The GUC needs to allow/help finding these
> JIT issues, but going to require an admin's interaction in any case.  Avoiding
> false positives may be important for it to be useful at all.
>
> Hmm .. should it also combine with min_sample_rate ?  Maybe that addresses your
> concern.

For one, what if I don't want to log any queries unless this is the
case? I leave log_min_duration_statement at -1...
Or what if I want to log say only queries taking more than 60 seconds.
Now I will miss all queries taking 30 seconds where 99.9% of the time
is spent on JIT which I definitely *would* want.

If all I wanted was to get the JIT information for all queries over a
certain amount of time, then as someone else mentioned as well, I can
just use auto_explain. The point here is I want to trigger the logging
for queries where the runtime is *shorter* than what I have in
log_min_duration_statement, but a large portion of the runtime is
spent on JIT.

I maen. If I have a bunch of queries that take 10ms and 75% is spent
in JIT (per your example above), that is actually a problem and I'd
like to know about it and fix it (by tuning the jit triggers either
globally or locally).  Yes, at 10ms I *may* ignore it, and in general
I'd use pg_stat_statements for that. But at 100ms with 75% JIT it
starts becoming exactly the situation that I intended this patch to
help with.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Add parameter jit_warn_above_fraction

From
Justin Pryzby
Date:
On Mon, Mar 07, 2022 at 04:02:16PM +0100, Magnus Hagander wrote:
> On Mon, Mar 7, 2022 at 2:09 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:
> > > On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > > I think it should be a NOTICE (or less?)
> > >
> > > Hmm. I'm not so sure.
> > >
> > > Other similar parameters often use LOG, but the downside of that is
> > > that it won't be sent to the client.
> > >
> > > The problem with using NOTICE is that it won't go to the log by
> > > default. It needs to be at least warning to do that.
> >
> > Anyone who uses this parameter is aleady going to be changing GUCs, so it
> > doesn't need to work by default.  The people who are likely to enable this
> > already monitor their logs and have probably customized their logging
> > configuration.
> 
> Sure, but if you set log_min_messagse to NOTICE you are likely going
> to flood your logs beyond usefulness. And the whole idea of this
> parameter is you can keep it on during "normal times" to catch
> outliers.

I do set log_min_messages - not to NOTICE, but to INFO ;)

Would NOTICEs really flood most people's logs ?  From what ?  They're aren't
that many, and most seem to be for DDL and utility commands.  We do, uh, more
of that than most people would say is reasonable.

I see a couple hundred of these per day:
| PostGIS: Unable to compute statistics for...
And during deployment, a few hundred more for IF NOT EXIST commands.
That's it.

I still think this GUC should not be WARNING.  If it's a warning, then *this*
will cause previously-quiet logs to be flooded - the opposite problem.  Which
is not desirable for a new guc.

https://www.postgresql.org/docs/current/runtime-config-logging.html

INFO     Provides information implicitly requested by the user, e.g., output from VACUUM VERBOSE.     INFO
INFORMATION
NOTICE     Provides information that might be helpful to users, e.g., notice of truncation of long identifiers.
NOTICE    INFORMATION
 
WARNING Provides warnings of likely problems, e.g., COMMIT outside a transaction block.     NOTICE     WARNING

> > I don't understand - why doesn't it combine trivially with
> > log_min_duration_statement?  Are you saying that the default / pre-existing min
> > duration may not log all of the intended queries ?  I think that just means the
> > configuration needs to be changed.  The GUC needs to allow/help finding these
> > JIT issues, but going to require an admin's interaction in any case.  Avoiding
> > false positives may be important for it to be useful at all.
> >
> > Hmm .. should it also combine with min_sample_rate ?  Maybe that addresses your
> > concern.
> 
> For one, what if I don't want to log any queries unless this is the
> case? I leave log_min_duration_statement at -1...

Then you will conclude that our logging is inadequate for your case.
You need to filter the lines which don't interest you.

> Or what if I want to log say only queries taking more than 60 seconds.
> Now I will miss all queries taking 30 seconds where 99.9% of the time
> is spent on JIT which I definitely *would* want.

If you're unwilling to deal with log entries which are uninteresting, then
clearly you'll need a separate GUC just for log_min_duration_jit_above_fraction
(and maybe another one for jit_above_duration_log_level).  Or implement generic
log filtering based on file/severity/message.

As I see it: most people won't set this GUC at all.  Those who do might enable
it with a high threshold value of log_min_duration_statement (probably a
pre-existing, configured value) to see what there is to see - the worst issues.

A small fraction of people will enable it with a low threshold value for
log_min_duration_statement - maybe only temporarily or per-session.  They'll be
willing to sift through the logs to look for interesting things, like queries
that take 10ms, 75% of which is in JIT, but run hundreds of times per second.

This feature intends to make it easier to identify queries affected by this
problem, but it doesn't need to be possible to do that without logging anything
else.

-- 
Justin



Re: Add parameter jit_warn_above_fraction

From
Andres Freund
Date:
Hi,

On 2022-03-07 13:10:32 +0100, Magnus Hagander wrote:
> Meanwhile here is an updated based on your other comments above, as
> well as those from Julien.

This fails on cfbot, due to compiler warnings:
https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390

Greetings,

Andres Freund



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
On Tue, Mar 22, 2022 at 12:50 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2022-03-07 13:10:32 +0100, Magnus Hagander wrote:
> Meanwhile here is an updated based on your other comments above, as
> well as those from Julien.

This fails on cfbot, due to compiler warnings:
https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390

Huh. That's annoying. I forgot int64 is %d on linux and %lld on Windows :/  

PFA a fix for that.

--
Attachment

Re: Add parameter jit_warn_above_fraction

From
Julien Rouhaud
Date:
Hi,

On Mon, Mar 28, 2022 at 10:11:16PM +0200, Magnus Hagander wrote:
> On Tue, Mar 22, 2022 at 12:50 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > This fails on cfbot, due to compiler warnings:
> > https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390
>
>
> Huh. That's annoying. I forgot int64 is %d on linux and %lld on Windows :/
>
> PFA a fix for that.

The v3 is now stable on all platforms according to the cfbot, and apart from
the log level the patch looks good to me.

The last remaining thing is whether logging at WARNING level is the correct
choice.  I'm personally fine with it, because the people who are going to use
it will probably use the same approach as for log_min_duration_statements:
enable it first with a high value, and check if that just lead to a massive log
spam.  If not, see if there's anything that needs attention and fix it,
otherwise keep lowering it and keep going.

Also, jitting isn't particularly fast in general, so it's probably somewhat
unlikely to have a massive amount of queries that both have a cost higher than
the jit threshold cost and still run fast enough to spam the log without jit
slowing everything down.

I'm not sure what status the patch should be.  I think it's the committer's
decision to chose which log level to use, and that won't affect the patch
enough to justify another review, so I'm switching the patch to ready for
committer.  This doesn't prevent any further discussions for the log level.



Re: Add parameter jit_warn_above_fraction

From
Robert Haas
Date:
On Tue, Mar 29, 2022 at 6:09 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> The last remaining thing is whether logging at WARNING level is the correct
> choice.  I'm personally fine with it, because the people who are going to use
> it will probably use the same approach as for log_min_duration_statements:
> enable it first with a high value, and check if that just lead to a massive log
> spam.  If not, see if there's anything that needs attention and fix it,
> otherwise keep lowering it and keep going.

I think WARNING is fine. After all, the parameter is called
"jit_warn_above_fraction". Yeah, we could also rename the parameter,
but I think "jit_notice_above_fraction" would be harder to understand.
It feels very intuitive to just say "warn me if X thing happens" and I
don't see a reason why we shouldn't just do that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add parameter jit_warn_above_fraction

From
David Rowley
Date:
On Wed, 30 Mar 2022 at 02:38, Robert Haas <robertmhaas@gmail.com> wrote:
> I think WARNING is fine. After all, the parameter is called
> "jit_warn_above_fraction".

I had a think about this patch.  I guess it's a little similar to
checkpoint_warning. The good thing about the checkpoint_warning is
that in the LOG message we give instructions about how the DBA can fix
the issue, i.e increase max_wal_size.

With the proposed patch I see there is no hint about what might be
done to remove/reduce the warnings.  I imagine that's because it's not
all that clear which GUC should be changed. In my view, likely
jit_above_cost is the most relevant but there is also
jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
and jit_expressions which are relevant too.

If we go with this patch,  the problem I see here is that the amount
of work the JIT compiler must do for a given query depends mostly on
the number of expressions that must be compiled in the query (also to
a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
jit_tuple_deforming and jit_expressions). The DBA does not really have
much control over the number of expressions in the query.  All he or
she can do to get rid of the warning is something like increase
jit_above_cost.  After a few iterations of that, the end result is
that jit_above_cost is now high enough that JIT no longer triggers
for, say, that query to that table with 1000 partitions where no
plan-time pruning takes place.  Is that really a good thing? It likely
means that we just rarely JIT anything at all!

I really believe that the main problem here is that JIT only enables
when the *total* plan cost reaches a certain threshold.  The number of
expressions to be compiled is not a factor in the decision at all.
That means that even if the total execution time of a plan was a true
reflection of the total estimated plan cost, then the fraction of time
spent (as is measured by jit_warn_above_fraction) doing JIT would
entirely depend on the number of expressions to compile.   Of course,
the planner's not that good, but does that not indicate that the JIT
costing should really account for the number of expressions and not
just the total plan cost?

Anyway, what I'm trying to indicate here is that JIT is pretty much
impossible to tune properly and I don't really see why adding a
warning about it not being tuned correctly would help anyone.  I think
it would be better to focus on making improvements to how the JIT
costing works.

I did propose a patch to address this in [1]. It does need more work
and I do plan to come back to it for v16.

I'd much rather see us address the costing problem before adding some
warning, especially a warning where it's not clear how to make go
away.

David

[1] https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C+ksKFpSdZg=q6sTbtQ-v=aw@mail.gmail.com



Re: Add parameter jit_warn_above_fraction

From
Peter Geoghegan
Date:
On Tue, Mar 29, 2022 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
> That means that even if the total execution time of a plan was a true
> reflection of the total estimated plan cost, then the fraction of time
> spent (as is measured by jit_warn_above_fraction) doing JIT would
> entirely depend on the number of expressions to compile.   Of course,
> the planner's not that good, but does that not indicate that the JIT
> costing should really account for the number of expressions and not
> just the total plan cost?

That's a good point. The difference between the actual cost of
executing the query with and without JIT'ing is what we care about,
for the most part. Maybe we could do a lot better just by inventing a
fairly crude model that captures the benefits of JIT'ing -- that's
what "counting the number of expressions" sounds like to me. This
model could probably assume that JIT'ing itself was free -- maybe
something this simple would work well.

The planner has traditionally used the cost units to determine the
cheapest plan; it compared total plan cost for plans that were taken
from the universe of possible plans for *one specific query*. That's
completely different to any model that expects plan costs to be
meaningful in an absolute sense. I'm not completely sure how much that
difference matters, but I suspect that the answer is: "it depends, but
often it matters a great deal".

-- 
Peter Geoghegan



Re: Add parameter jit_warn_above_fraction

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Mar 29, 2022 at 1:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> That means that even if the total execution time of a plan was a true
>> reflection of the total estimated plan cost, then the fraction of time
>> spent (as is measured by jit_warn_above_fraction) doing JIT would
>> entirely depend on the number of expressions to compile.   Of course,
>> the planner's not that good, but does that not indicate that the JIT
>> costing should really account for the number of expressions and not
>> just the total plan cost?

> That's a good point.

I think David's questions are sufficiently cogent and difficult
that we should not add jit_warn_above_fraction at this time.
Maybe we'll eventually decide it's the best we can do; but I do
not think we have a problem there that's so pressing that we need
to rush out a partially-baked bandaid.  Especially not at the
tail end of the development cycle, with not much time to gain
experience with it before we ship.

            regards, tom lane



Re: Add parameter jit_warn_above_fraction

From
Peter Geoghegan
Date:
On Tue, Mar 29, 2022 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think David's questions are sufficiently cogent and difficult
> that we should not add jit_warn_above_fraction at this time.

+1

-- 
Peter Geoghegan



Re: Add parameter jit_warn_above_fraction

From
Andres Freund
Date:
Hi,

On 2022-03-30 09:05:44 +1300, David Rowley wrote:
> I really believe that the main problem here is that JIT only enables
> when the *total* plan cost reaches a certain threshold.

Yes, that is/was a clear design mistake. It wasn't quite as bad back when it
was written - partitioning blows the problem up by an order of magnitude or
three. . The costing really needs to at least multiply the number of
to-be-compiled expressions with some cost to decide whether the cost of JITing
is worth it, rather than making "flat" decision.


> I did propose a patch to address this in [1]. It does need more work
> and I do plan to come back to it for v16.

FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
side of a nested loop, just because it's cheap, even though that's where the
bulk of the benefits comes from?

Greetings,

Andres Freund



Re: Add parameter jit_warn_above_fraction

From
David Rowley
Date:
On Wed, 30 Mar 2022 at 12:16, Andres Freund <andres@anarazel.de> wrote:
> > I did propose a patch to address this in [1]. It does need more work
> > and I do plan to come back to it for v16.
>
> FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
> side of a nested loop, just because it's cheap, even though that's where the
> bulk of the benefits comes from?

Yeah, I think the total cost would need to be multiplied by the number
of times we expect to call that part of the plan.  I've not yet sat
down to figure out if that's easy/cheap or hard/costly information to
obtain.

David



Re: Add parameter jit_warn_above_fraction

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 30 Mar 2022 at 12:16, Andres Freund <andres@anarazel.de> wrote:
>> FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
>> side of a nested loop, just because it's cheap, even though that's where the
>> bulk of the benefits comes from?

> Yeah, I think the total cost would need to be multiplied by the number
> of times we expect to call that part of the plan.  I've not yet sat
> down to figure out if that's easy/cheap or hard/costly information to
> obtain.

As long as you don't need the info till the end of planning, it should be
reasonably simple to determine.  I'm not sure if we actually record the
expected number of loops in the plan tree today, but the costing
mechanisms certainly estimate that along the way, so we could store it
if need be.

            regards, tom lane



Re: Add parameter jit_warn_above_fraction

From
Andres Freund
Date:
Hi,

On 2022-03-30 12:41:41 +1300, David Rowley wrote:
> On Wed, 30 Mar 2022 at 12:16, Andres Freund <andres@anarazel.de> wrote:
> > > I did propose a patch to address this in [1]. It does need more work
> > > and I do plan to come back to it for v16.
> >
> > FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
> > side of a nested loop, just because it's cheap, even though that's where the
> > bulk of the benefits comes from?
>
> Yeah, I think the total cost would need to be multiplied by the number
> of times we expect to call that part of the plan.  I've not yet sat
> down to figure out if that's easy/cheap or hard/costly information to
> obtain.

I wonder whether it'd make sense to combine that with awareness of a few plan
types that can lead to large portions of child nodes never being executed. One
the case where the current behaviour is the worst is runtime partition pruning
in append - we compile expressions for whole subtrees that will never be
executed. We should be much more hesitant to compile there compared to a
cheap-ish node that we know will be executed as part of a large expensive part
of the plan tree.

- Andres



Re: Add parameter jit_warn_above_fraction

From
David Rowley
Date:
On Wed, 30 Mar 2022 at 13:20, Andres Freund <andres@anarazel.de> wrote:
> I wonder whether it'd make sense to combine that with awareness of a few plan
> types that can lead to large portions of child nodes never being executed. One
> the case where the current behaviour is the worst is runtime partition pruning
> in append - we compile expressions for whole subtrees that will never be
> executed. We should be much more hesitant to compile there compared to a
> cheap-ish node that we know will be executed as part of a large expensive part
> of the plan tree.

I think that's also a problem but I think that might be better fixed
another way.

There is a patch [1] around that seems to change things to compile JIT
on-demand.  I've not looked at the patch but imagine the overhead
might be kept minimal by initially setting the evalfunc to compile and
run, then set it to just run the compiled Expr for subsequent
executions. Maybe nodes below an Append/MergeAppend with run-time
pruning could compile on-demand and other nodes up-front.  Or maybe
there's no problem with making everything on-demand.

David

[1] https://commitfest.postgresql.org/37/3071/



Re: Add parameter jit_warn_above_fraction

From
Andres Freund
Date:
Hi,

On 2022-03-30 14:30:32 +1300, David Rowley wrote:
> On Wed, 30 Mar 2022 at 13:20, Andres Freund <andres@anarazel.de> wrote:
> > I wonder whether it'd make sense to combine that with awareness of a few plan
> > types that can lead to large portions of child nodes never being executed. One
> > the case where the current behaviour is the worst is runtime partition pruning
> > in append - we compile expressions for whole subtrees that will never be
> > executed. We should be much more hesitant to compile there compared to a
> > cheap-ish node that we know will be executed as part of a large expensive part
> > of the plan tree.
> 
> I think that's also a problem but I think that might be better fixed
> another way.
> 
> There is a patch [1] around that seems to change things to compile JIT
> on-demand.

That's a bad idea idea to do on a per-function basis. Emitting functions one
by one is considerably slower than doing so in larger chunks. Of course that
can be addressed to some degree by something like what you suggest below.


> I've not looked at the patch but imagine the overhead might be kept minimal
> by initially setting the evalfunc to compile and run, then set it to just
> run the compiled Expr for subsequent executions.

That part I'm not worried about, there's such an indirection on the first call
either way IIRC.


> Maybe nodes below an Append/MergeAppend with run-time pruning could compile
> on-demand and other nodes up-front.  Or maybe there's no problem with making
> everything on-demand.

Yea, that could work. The expressions for one "partition query" would still
have to be emitted at once. For each such subtree we should make a separate
costing decision. But I think an additional "will be executed" sub-node is a
different story, the threshold shouldn't be done on a per-node basis.  That
partitioning of the plan tree is kind of what I was trying to get at...

Greetings,

Andres Freund



Re: Add parameter jit_warn_above_fraction

From
David Rowley
Date:
On Wed, 30 Mar 2022 at 14:48, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-03-30 14:30:32 +1300, David Rowley wrote:
> > Maybe nodes below an Append/MergeAppend with run-time pruning could compile
> > on-demand and other nodes up-front.  Or maybe there's no problem with making
> > everything on-demand.
>
> Yea, that could work. The expressions for one "partition query" would still
> have to be emitted at once. For each such subtree we should make a separate
> costing decision. But I think an additional "will be executed" sub-node is a
> different story, the threshold shouldn't be done on a per-node basis.  That
> partitioning of the plan tree is kind of what I was trying to get at...

Maybe this point is moot if we get something like [1]. It might mean
that run-time pruning would happen early enough that we could just JIT
compile non-pruned subnodes.

David

[1] https://commitfest.postgresql.org/37/3478/



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:
On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 30 Mar 2022 at 02:38, Robert Haas <robertmhaas@gmail.com> wrote:
> I think WARNING is fine. After all, the parameter is called
> "jit_warn_above_fraction".

I had a think about this patch.  I guess it's a little similar to
checkpoint_warning. The good thing about the checkpoint_warning is
that in the LOG message we give instructions about how the DBA can fix
the issue, i.e increase max_wal_size.

With the proposed patch I see there is no hint about what might be
done to remove/reduce the warnings.  I imagine that's because it's not
all that clear which GUC should be changed. In my view, likely
jit_above_cost is the most relevant but there is also
jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
and jit_expressions which are relevant too.

If we go with this patch,  the problem I see here is that the amount
of work the JIT compiler must do for a given query depends mostly on
the number of expressions that must be compiled in the query (also to
a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
jit_tuple_deforming and jit_expressions). The DBA does not really have
much control over the number of expressions in the query.  All he or
she can do to get rid of the warning is something like increase
jit_above_cost.  After a few iterations of that, the end result is
that jit_above_cost is now high enough that JIT no longer triggers
for, say, that query to that table with 1000 partitions where no
plan-time pruning takes place.  Is that really a good thing? It likely
means that we just rarely JIT anything at all!

I don't agree with the conclusion of that.

What the parameter would be useful for is to be able to tune those costs (or just turn it off) *for that individual query*. That doesn't mean you "rarely JIT anything atll", it just means you rarely JIT that particular query.

In fact, my goal is to specifically make people do that and *not* just turn off JIT globally.


I'd much rather see us address the costing problem before adding some
warning, especially a warning where it's not clear how to make go
away.

The easiest way would be to add a HINT that says turn off jit for this particular query or something?

I do agree that if we can make  "spending too much time on JIT vs query runtime" go away completely, then there is no need for a parameter like this. 

I still think the warning is useful. And I think it may stay useful even after we have made the JIT costing smarter -- though that's not certain of course. 

--

Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:


On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 30 Mar 2022 at 02:38, Robert Haas <robertmhaas@gmail.com> wrote:
> I think WARNING is fine. After all, the parameter is called
> "jit_warn_above_fraction".

I had a think about this patch.  I guess it's a little similar to
checkpoint_warning. The good thing about the checkpoint_warning is
that in the LOG message we give instructions about how the DBA can fix
the issue, i.e increase max_wal_size.

With the proposed patch I see there is no hint about what might be
done to remove/reduce the warnings.  I imagine that's because it's not
all that clear which GUC should be changed. In my view, likely
jit_above_cost is the most relevant but there is also
jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
and jit_expressions which are relevant too.

If we go with this patch,  the problem I see here is that the amount
of work the JIT compiler must do for a given query depends mostly on
the number of expressions that must be compiled in the query (also to
a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
jit_tuple_deforming and jit_expressions). The DBA does not really have
much control over the number of expressions in the query.  All he or
she can do to get rid of the warning is something like increase
jit_above_cost.  After a few iterations of that, the end result is
that jit_above_cost is now high enough that JIT no longer triggers
for, say, that query to that table with 1000 partitions where no
plan-time pruning takes place.  Is that really a good thing? It likely
means that we just rarely JIT anything at all!

I don't agree with the conclusion of that.

What the parameter would be useful for is to be able to tune those costs (or just turn it off) *for that individual query*. That doesn't mean you "rarely JIT anything atll", it just means you rarely JIT that particular query.

In fact, my goal is to specifically make people do that and *not* just turn off JIT globally.


I'd much rather see us address the costing problem before adding some
warning, especially a warning where it's not clear how to make go
away.

The easiest way would be to add a HINT that says turn off jit for this particular query or something?

I do agree that if we can make  "spending too much time on JIT vs query runtime" go away completely, then there is no need for a parameter like this. 

I still think the warning is useful. And I think it may stay useful even after we have made the JIT costing smarter -- though that's not certain of course. 


This patch is still sitting at "ready for committer".

As an example, I have added such a hint in the attached.

I still stand  by that this patch is better than nothing. Sure, I would love for us to adapt the JIT costing model and algorithm to make this not a problem. And once we've done that, we should remove the parameter again.

It's not on by default, and it's trivial to remove in the future. 


Yes, we're right up at the deadline. I'd still like to get it in, so I'd really appreciate some further voices :)

--
Attachment

Re: Add parameter jit_warn_above_fraction

From
Stephen Frost
Date:
Greetings,

On Fri, Apr 8, 2022 at 07:27 Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 30 Mar 2022 at 02:38, Robert Haas <robertmhaas@gmail.com> wrote:
> I think WARNING is fine. After all, the parameter is called
> "jit_warn_above_fraction".

I had a think about this patch.  I guess it's a little similar to
checkpoint_warning. The good thing about the checkpoint_warning is
that in the LOG message we give instructions about how the DBA can fix
the issue, i.e increase max_wal_size.

With the proposed patch I see there is no hint about what might be
done to remove/reduce the warnings.  I imagine that's because it's not
all that clear which GUC should be changed. In my view, likely
jit_above_cost is the most relevant but there is also
jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
and jit_expressions which are relevant too.

If we go with this patch,  the problem I see here is that the amount
of work the JIT compiler must do for a given query depends mostly on
the number of expressions that must be compiled in the query (also to
a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
jit_tuple_deforming and jit_expressions). The DBA does not really have
much control over the number of expressions in the query.  All he or
she can do to get rid of the warning is something like increase
jit_above_cost.  After a few iterations of that, the end result is
that jit_above_cost is now high enough that JIT no longer triggers
for, say, that query to that table with 1000 partitions where no
plan-time pruning takes place.  Is that really a good thing? It likely
means that we just rarely JIT anything at all!

I don't agree with the conclusion of that.

What the parameter would be useful for is to be able to tune those costs (or just turn it off) *for that individual query*. That doesn't mean you "rarely JIT anything atll", it just means you rarely JIT that particular query.

In fact, my goal is to specifically make people do that and *not* just turn off JIT globally.


I'd much rather see us address the costing problem before adding some
warning, especially a warning where it's not clear how to make go
away.

The easiest way would be to add a HINT that says turn off jit for this particular query or something?

I do agree that if we can make  "spending too much time on JIT vs query runtime" go away completely, then there is no need for a parameter like this. 

I still think the warning is useful. And I think it may stay useful even after we have made the JIT costing smarter -- though that's not certain of course. 


This patch is still sitting at "ready for committer".

As an example, I have added such a hint in the attached.

I still stand  by that this patch is better than nothing. Sure, I would love for us to adapt the JIT costing model and algorithm to make this not a problem. And once we've done that, we should remove the parameter again.

It's not on by default, and it's trivial to remove in the future. 


Yes, we're right up at the deadline. I'd still like to get it in, so I'd really appreciate some further voices :)

Looks reasonable to me, so +1. The default is has it off and so I seriously doubt it’ll cause any issues and it’ll be very handy on large and busy systems with lots of queries finding those that have a serious amount of time being spent in JIT (and hopefully avoid folks just turning jit off across the board, since that’s worse- we need more data on jit and need to work on improving it, not ending up with everyone turning it off). 

Thanks!

Stephen

Re: Add parameter jit_warn_above_fraction

From
David Rowley
Date:
On Fri, 8 Apr 2022 at 23:27, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander <magnus@hagander.net> wrote:
>>
>> On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> If we go with this patch,  the problem I see here is that the amount
>>> of work the JIT compiler must do for a given query depends mostly on
>>> the number of expressions that must be compiled in the query (also to
>>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
>>> jit_tuple_deforming and jit_expressions). The DBA does not really have
>>> much control over the number of expressions in the query.  All he or
>>> she can do to get rid of the warning is something like increase
>>> jit_above_cost.  After a few iterations of that, the end result is
>>> that jit_above_cost is now high enough that JIT no longer triggers
>>> for, say, that query to that table with 1000 partitions where no
>>> plan-time pruning takes place.  Is that really a good thing? It likely
>>> means that we just rarely JIT anything at all!
>>
>>
>> I don't agree with the conclusion of that.
>>
>> What the parameter would be useful for is to be able to tune those costs (or just turn it off) *for that individual
query*.That doesn't mean you "rarely JIT anything atll", it just means you rarely JIT that particular query.
 

I just struggle to imagine that anyone is going to spend much effort
tuning a warning parameter per query.  I imagine they're far more
likely to just ramp it up to only catch some high percentile problems
or just (more likely) just not bother with it.  It seems more likely
that if anyone was to tune anything per query here it would be
jit_above_cost, since that actually might have an affect on the
performance of the query, rather than if it spits out some warning
message or not.  ISTM that if the user knows what to set it to per
query, then there's very little point in having a warning as we'd be
alerting them to something they already know about.

I looked in the -general list to see if we could get some common
explanations to give us an idea of the most common reason for high JIT
compilation time. It seems that the plans were never simple. [1] seems
due to a complex plan. I'm basing that off the "Functions: 167". I
didn't catch the full plan. From what I can tell, [2] seems to be due
to "lots of empty tables", so assuming the clamping at 1 page is
causing issues there.  I think both of those cases could be resolved
by building the costing the way I mentioned.  I admit that 2 cases is
not a very large sample size.

David

[1]
https://www.postgresql.org/message-id/flat/CAPL5KHq8zfWPzueCemXw4c%2BU568PoDfqo3wBDNm3KAyvybdaMQ%40mail.gmail.com#35aca8b42c3862f44b6be5b260c1a109
[2]
https://www.postgresql.org/message-id/flat/CAHOFxGo5xJt02RmwAWrtv2K0jcqqxG-cDiR8FQbvb0WxdKhcgw%40mail.gmail.com#12d91822e869a2e22ca830cb5632f549



Re: Add parameter jit_warn_above_fraction

From
Magnus Hagander
Date:


On Fri, Apr 8, 2022 at 2:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 Apr 2022 at 23:27, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander <magnus@hagander.net> wrote:
>>
>> On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> If we go with this patch,  the problem I see here is that the amount
>>> of work the JIT compiler must do for a given query depends mostly on
>>> the number of expressions that must be compiled in the query (also to
>>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
>>> jit_tuple_deforming and jit_expressions). The DBA does not really have
>>> much control over the number of expressions in the query.  All he or
>>> she can do to get rid of the warning is something like increase
>>> jit_above_cost.  After a few iterations of that, the end result is
>>> that jit_above_cost is now high enough that JIT no longer triggers
>>> for, say, that query to that table with 1000 partitions where no
>>> plan-time pruning takes place.  Is that really a good thing? It likely
>>> means that we just rarely JIT anything at all!
>>
>>
>> I don't agree with the conclusion of that.
>>
>> What the parameter would be useful for is to be able to tune those costs (or just turn it off) *for that individual query*. That doesn't mean you "rarely JIT anything atll", it just means you rarely JIT that particular query.

I just struggle to imagine that anyone is going to spend much effort
tuning a warning parameter per query.  I imagine they're far more
likely to just ramp it up to only catch some high percentile problems
or just (more likely) just not bother with it.  It seems more likely
that if anyone was to tune anything per query here it would be
jit_above_cost, since that actually might have an affect on the
performance of the query, rather than if it spits out some warning
message or not.  ISTM that if the user knows what to set it to per
query, then there's very little point in having a warning as we'd be
alerting them to something they already know about.

I would not expect people to tune the *warning* at a query level. If anything, then ys, they would tune the either jit_above_cost or just jit=off. But the idea being you can do that on a per query level instead of globally.


I looked in the -general list to see if we could get some common
explanations to give us an idea of the most common reason for high JIT
compilation time. It seems that the plans were never simple. [1] seems
due to a complex plan. I'm basing that off the "Functions: 167". I
didn't catch the full plan. From what I can tell, [2] seems to be due
to "lots of empty tables", so assuming the clamping at 1 page is
causing issues there.  I think both of those cases could be resolved
by building the costing the way I mentioned.  I admit that 2 cases is
not a very large sample size.

Again, I am very much for improvements of the costing model. This is in no way intended to be a replacement for that. It's intended to be a stop-gap. 


The bottom line is that people end up with recommendations to turn off JIT globally more or less by default. Because there's no real useful way today to figure out when it causes problems vs when it helps.

The addition to pg_stat_statements I pushed a short while ago would help with that. But I think having a warning like this would also be useful. As a stop-gap measure, yes, but we really don't know when we will have an improved costing model for it. I hope you're right and that we can have it by 16, and then I will definitely advocate for removing the warning again if it works.

--

Re: Add parameter jit_warn_above_fraction

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> On Fri, Apr 8, 2022 at 2:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > On Fri, 8 Apr 2022 at 23:27, Magnus Hagander <magnus@hagander.net> wrote:
> > > On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander <magnus@hagander.net>
> > wrote:
> > >>
> > >> On Tue, Mar 29, 2022 at 10:06 PM David Rowley <dgrowleyml@gmail.com>
> > wrote:
> > >>>
> > >>> If we go with this patch,  the problem I see here is that the amount
> > >>> of work the JIT compiler must do for a given query depends mostly on
> > >>> the number of expressions that must be compiled in the query (also to
> > >>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
> > >>> jit_tuple_deforming and jit_expressions). The DBA does not really have
> > >>> much control over the number of expressions in the query.  All he or
> > >>> she can do to get rid of the warning is something like increase
> > >>> jit_above_cost.  After a few iterations of that, the end result is
> > >>> that jit_above_cost is now high enough that JIT no longer triggers
> > >>> for, say, that query to that table with 1000 partitions where no
> > >>> plan-time pruning takes place.  Is that really a good thing? It likely
> > >>> means that we just rarely JIT anything at all!
> > >>
> > >>
> > >> I don't agree with the conclusion of that.
> > >>
> > >> What the parameter would be useful for is to be able to tune those
> > costs (or just turn it off) *for that individual query*. That doesn't mean
> > you "rarely JIT anything atll", it just means you rarely JIT that
> > particular query.
> >
> > I just struggle to imagine that anyone is going to spend much effort
> > tuning a warning parameter per query.  I imagine they're far more
> > likely to just ramp it up to only catch some high percentile problems
> > or just (more likely) just not bother with it.  It seems more likely
> > that if anyone was to tune anything per query here it would be
> > jit_above_cost, since that actually might have an affect on the
> > performance of the query, rather than if it spits out some warning
> > message or not.  ISTM that if the user knows what to set it to per
> > query, then there's very little point in having a warning as we'd be
> > alerting them to something they already know about.
>
> I would not expect people to tune the *warning* at a query level. If
> anything, then ys, they would tune the either jit_above_cost or just
> jit=off. But the idea being you can do that on a per query level instead of
> globally.

Yeah, exactly, this is about having a busy system and wanting to know
which queries are spending a lot of time doing JIT relative to the query
time, so that you can go adjust your JIT parameters or possibly disable
JIT for those queries (or maybe bring those cases to -hackers and try to
help make our costing better).

> > I looked in the -general list to see if we could get some common
> > explanations to give us an idea of the most common reason for high JIT
> > compilation time. It seems that the plans were never simple. [1] seems
> > due to a complex plan. I'm basing that off the "Functions: 167". I
> > didn't catch the full plan. From what I can tell, [2] seems to be due
> > to "lots of empty tables", so assuming the clamping at 1 page is
> > causing issues there.  I think both of those cases could be resolved
> > by building the costing the way I mentioned.  I admit that 2 cases is
> > not a very large sample size.
>
> Again, I am very much for improvements of the costing model. This is in no
> way intended to be a replacement for that. It's intended to be a stop-gap.

Not sure I'd say it's a 'stop-gap' as it's really very similar, imv
anyway, to log_min_duration_statement- you want to know what queries are
taking a lot of time but you can't log all of them.

> What I see much of today are things like
> https://dba.stackexchange.com/questions/264955/handling-performance-problems-with-jit-in-postgres-12
> or
>
https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postgres-update-led-to-70-failure-on-a-critical-national-service-3f2a
>
> The bottom line is that people end up with recommendations to turn off JIT
> globally more or less by default. Because there's no real useful way today
> to figure out when it causes problems vs when it helps.

Yeah, that's frustrating.

> The addition to pg_stat_statements I pushed a short while ago would help
> with that. But I think having a warning like this would also be useful. As
> a stop-gap measure, yes, but we really don't know when we will have an
> improved costing model for it. I hope you're right and that we can have it
> by 16, and then I will definitely advocate for removing the warning again
> if it works.

Having this in pg_stat_statements is certainly helpful but having a
warning also is.  I don't think we have to address this in only one way.
A lot faster to flip this guc and then look in the logs on a busy system
than to install pg_stat_statements, restart the cluster once you get
permission to do so, and then query it.

Thanks,

Stephen

Attachment

Re: Add parameter jit_warn_above_fraction

From
Julien Rouhaud
Date:
Hi,

On Fri, Apr 08, 2022 at 09:39:18AM -0400, Stephen Frost wrote:
>
> * Magnus Hagander (magnus@hagander.net) wrote:
> > The addition to pg_stat_statements I pushed a short while ago would help
> > with that. But I think having a warning like this would also be useful. As
> > a stop-gap measure, yes, but we really don't know when we will have an
> > improved costing model for it. I hope you're right and that we can have it
> > by 16, and then I will definitely advocate for removing the warning again
> > if it works.
>
> Having this in pg_stat_statements is certainly helpful but having a
> warning also is.  I don't think we have to address this in only one way.
> A lot faster to flip this guc and then look in the logs on a busy system
> than to install pg_stat_statements, restart the cluster once you get
> permission to do so, and then query it.

+1, especially if you otherwise don't really need or want to have
pg_stat_statements enabled, as it's far from being free.  Sure you could enable
it by default with pg_stat_statements.track = none, but that seems a lot more
troublesome than just dynamically enabling a GUC, possibly for a short time
and/or for a specific database/role.



Re: Add parameter jit_warn_above_fraction

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Fri, Apr 08, 2022 at 09:39:18AM -0400, Stephen Frost wrote:
>> Having this in pg_stat_statements is certainly helpful but having a
>> warning also is.  I don't think we have to address this in only one way.
>> A lot faster to flip this guc and then look in the logs on a busy system
>> than to install pg_stat_statements, restart the cluster once you get
>> permission to do so, and then query it.

> +1, especially if you otherwise don't really need or want to have
> pg_stat_statements enabled, as it's far from being free.  Sure you could enable
> it by default with pg_stat_statements.track = none, but that seems a lot more
> troublesome than just dynamically enabling a GUC, possibly for a short time
> and/or for a specific database/role.

The arguments against the GUC were not about whether it's convenient.
They were about whether the information it gives you is actually going
to be of any use.

Also, good luck with "looking in the logs", because by default
WARNING-level messages don't go to the postmaster log.  If that's
the intended use-case then the message ought to appear at LOG
level (which'd also change the desirable name for the GUC).

            regards, tom lane



Re: Add parameter jit_warn_above_fraction

From
Julien Rouhaud
Date:
On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
>
> Also, good luck with "looking in the logs", because by default
> WARNING-level messages don't go to the postmaster log.  If that's
> the intended use-case then the message ought to appear at LOG
> level (which'd also change the desirable name for the GUC).

I must be missing something because by default log_min_messages is WARNING, and
AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
vanilla .psqlrc.



Re: Add parameter jit_warn_above_fraction

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
>> Also, good luck with "looking in the logs", because by default
>> WARNING-level messages don't go to the postmaster log.

> I must be missing something because by default log_min_messages is WARNING, and
> AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
> vanilla .psqlrc.

Ah, sorry, I was looking at the wrong thing namely
log_min_error_statement.  The point still holds though: if we emit this
at level WARNING, what will get logged is just the bare message and not
the statement that triggered it.  How useful do you think that will be?

            regards, tom lane



Re: Add parameter jit_warn_above_fraction

From
Julien Rouhaud
Date:
On Sat, Apr 09, 2022 at 12:31:23PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
> >> Also, good luck with "looking in the logs", because by default
> >> WARNING-level messages don't go to the postmaster log.
> 
> > I must be missing something because by default log_min_messages is WARNING, and
> > AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
> > vanilla .psqlrc.
> 
> Ah, sorry, I was looking at the wrong thing namely
> log_min_error_statement.  The point still holds though: if we emit this
> at level WARNING, what will get logged is just the bare message and not
> the statement that triggered it.  How useful do you think that will be?

Ah right, I did miss that "small detail".  And of course I agree, as-is that
would be entirely useless and it should be LOG, like the rest of similar
features.



Re: Add parameter jit_warn_above_fraction

From
Ibrar Ahmed
Date:


On Sat, Apr 9, 2022 at 8:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Apr 09, 2022 at 12:31:23PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
> >> Also, good luck with "looking in the logs", because by default
> >> WARNING-level messages don't go to the postmaster log.
>
> > I must be missing something because by default log_min_messages is WARNING, and
> > AFAICS I do get WARNING-level messages in some vanilla cluster logs, using
> > vanilla .psqlrc.
>
> Ah, sorry, I was looking at the wrong thing namely
> log_min_error_statement.  The point still holds though: if we emit this
> at level WARNING, what will get logged is just the bare message and not
> the statement that triggered it.  How useful do you think that will be?

Ah right, I did miss that "small detail".  And of course I agree, as-is that
would be entirely useless and it should be LOG, like the rest of similar
features.



The patch does not apply successfully; please rebase the patch.
patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 642.
Hunk #2 FAILED at 3943.
2 out of 2 hunks FAILED -- saving rejects to file src/backend/utils/misc/guc.c.rej
patching file src/backend/utils/misc/postgresql.conf.sample
patching file src/include/jit/jit.h

--
Ibrar Ahmed