Re: percentile value check can be slow - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: percentile value check can be slow
Date
Msg-id 0516ec55-cb9b-1415-bfc5-35827f40897e@2ndquadrant.com
Whole thread Raw
In response to Re: percentile value check can be slow  (David Fetter <david@fetter.org>)
Responses Re: percentile value check can be slow  (David Fetter <david@fetter.org>)
List pgsql-hackers
Hi,

On 11/18/2017 10:30 PM, David Fetter wrote:
> On Sat, Nov 18, 2017 at 10:44:36AM -0500, Tom Lane wrote:
>> jotpe <jotpe@posteo.de> writes:
>>> I tried to enter invalid percentile fractions, and was astonished
>>> that it seems to be checked after many work is done?
>>
>> IIRC, only the aggregate's final-function is concerned with direct
>> arguments, so it's not all that astonishing.
> 
> It may not be surprising from the point of view of a systems
> programmer, but it's pretty surprising that this check is deferred to
> many seconds in when the system has all the information it need in
> order to establish this before execution begins.
> 
> I'm not sure I see an easy way to do this check early, but it's worth
> trying on grounds of POLA violation.  I have a couple of ideas on how
> to do this, one less invasive but hinky, the other a lot more invasive
> but better overall.
> 
> Ugly Hack With Ugly Performance Consequences:
>     Inject a subtransaction at the start of execution that supplies an
>     empty input to the final function with the supplied direct
>     arguments.
> 

I'm pretty sure you realize this is quite unlikely to get accepted.

> Bigger Lift:
>     Require a separate recognizer function direct arguments and fire
>     it during post-parse analysis.  Perhaps this could be called
>     recognizer along with the corresponding mrecognizer.  It's not
>     clear that it's sane to have separate ones, but I thought I'd
>     bring it up for completeness.
> 

Is 'recognizer' an established definition I should know? Is it the same
as 'validator' or is it something new/different?

> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
>     Allow optional CHECK constraints in CREATE AGGREGATE for direct
>     arguments.
> 

How will any of the approaches deal with something like
   select percentile_cont((select array_agg(v) from p))          within group (order by a) from t;

In this case the the values are unknown after the parse analysis, so I
guess it does not really address that.


FWIW while I won't stand in the way of improving this, I wonder if this
is really worth the additional complexity. If you get errors like this
with a static list of values, you will fix the list and you're done. If
the list is dynamic (computed in the query itself), you'll still get the
error much later during query execution.

So if you're getting many failures like this for the "delayed error
reporting" to be an issue, perhaps there's something wrong in you stack
and you should address that instead?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Next
From: Tomas Vondra
Date:
Subject: Re: spgist rangetypes compiler warning (gcc 7.2.0)