Re: percentile value check can be slow - Mailing list pgsql-hackers

From David Fetter
Subject Re: percentile value check can be slow
Date
Msg-id 20171119021009.GJ4411@fetter.org
Whole thread Raw
In response to Re: percentile value check can be slow  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: percentile value check can be slow  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
> Hi,
> 
> On 11/18/2017 10:30 PM, David Fetter wrote:
> > On Sat, Nov 18, 2017 at 10:44:36AM -0500, Tom Lane wrote:
> >> jotpe <jotpe@posteo.de> writes:
> >>> I tried to enter invalid percentile fractions, and was astonished
> >>> that it seems to be checked after many work is done?
> >>
> >> IIRC, only the aggregate's final-function is concerned with direct
> >> arguments, so it's not all that astonishing.
> > 
> > It may not be surprising from the point of view of a systems
> > programmer, but it's pretty surprising that this check is deferred to
> > many seconds in when the system has all the information it need in
> > order to establish this before execution begins.
> > 
> > I'm not sure I see an easy way to do this check early, but it's worth
> > trying on grounds of POLA violation.  I have a couple of ideas on how
> > to do this, one less invasive but hinky, the other a lot more invasive
> > but better overall.
> > 
> > Ugly Hack With Ugly Performance Consequences:
> >     Inject a subtransaction at the start of execution that supplies an
> >     empty input to the final function with the supplied direct
> >     arguments.
> 
> I'm pretty sure you realize this is quite unlikely to get accepted.

I should hope not, but I mentioned it because I'd like to get it on
the record as rejected.

> > Bigger Lift:
> >     Require a separate recognizer function direct arguments and fire
> >     it during post-parse analysis.  Perhaps this could be called
> >     recognizer along with the corresponding mrecognizer.  It's not
> >     clear that it's sane to have separate ones, but I thought I'd
> >     bring it up for completeness.
> > 
> 
> Is 'recognizer' an established definition I should know? Is it the same
> as 'validator' or is it something new/different?

I borrowed it from http://langsec.org/

I'm not entirely sure what you mean by a validator, but a recognizer
is something that gives a quick and sure read as to whether the input
is well-formed. In general, it's along the lines of a tokenizer, a
parser, and something that does very light post-parse analysis for
correctness of form.

For the case that started the thread, a recognizer would check
something along the lines of 
   CHECK('[0,1]' @> ALL(input_array))

> > Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
> >     Allow optional CHECK constraints in CREATE AGGREGATE for direct
> >     arguments.
> > 
> 
> How will any of the approaches deal with something like
> 
>     select percentile_cont((select array_agg(v) from p))
>            within group (order by a) from t;
> 
> In this case the the values are unknown after the parse analysis, so I
> guess it does not really address that.

It doesn't.  Does it make sense to do a one-shot execution for cases
like that?  It might well be worth it to do the aggregate once in
advance as a throw-away if the query execution time is already going
to take awhile.  Of course, you can break that one by making p a JOIN
to yet another thing...

> FWIW while I won't stand in the way of improving this, I wonder if this
> is really worth the additional complexity. If you get errors like this
> with a static list of values, you will fix the list and you're done. If
> the list is dynamic (computed in the query itself), you'll still get the
> error much later during query execution.
> 
> So if you're getting many failures like this for the "delayed error
> reporting" to be an issue, perhaps there's something wrong in you stack
> and you should address that instead?

I'd like to think that getting something to fail quickly and cheaply
when it can will give our end users a better experience.  Here,
"cheaply" refers to their computing resources and time.  Clearly, not
having this happen in this case bothered Johannes enough to wade in
here.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] pg_basebackup --progress output for batch execution
Next
From: 高增琦
Date:
Subject: Patch to add dependency between client executes and static libraries