Re: percentile value check can be slow - Mailing list pgsql-hackers
From: jotpe
Subject: Re: percentile value check can be slow
Msg-id: 535fde33-f51f-739b-6cb8-0707bc733c79@posteo.de
In response to: Re: percentile value check can be slow (David Fetter <david@fetter.org>)
List: pgsql-hackers
On 19.11.2017 18:49, David Fetter wrote:
> On Sun, Nov 19, 2017 at 01:23:42PM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> On 11/19/2017 03:10 AM, David Fetter wrote:
>>> On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
>>>> Hi,
>>>>
>>>> ...
>>>>
>>>> Is 'recognizer' an established definition I should know? Is it the
>>>> same as 'validator' or is it something new/different?
>>>
>>> I borrowed it from http://langsec.org/
>>>
>>> I'm not entirely sure what you mean by a validator, but a recognizer
>>> is something that gives a quick and sure read as to whether the input
>>> is well-formed. In general, it's along the lines of a tokenizer, a
>>> parser, and something that does very light post-parse analysis for
>>> correctness of form.
>>>
>>> For the case that started the thread, a recognizer would check
>>> something along the lines of
>>>
>>>     CHECK('[0,1]' @> ALL(input_array))
>>
>> OK, thanks. From what I understand, a recognizer is more about
>> recognizing whether a string is valid within a given formal language
>> (essentially, whether it's a well-formed program). That may not be the
>> right term for checks on parameter values.
>
> There are two hard problems in computer science: naming things, cache
> coherency, and off-by-one.
>
>> OTOH we already have "validators" in a number of places - functions
>> checking various parameters, e.g. reloptions for FDWs, etc.
>>
>> But I guess the naming can be solved later ...
>
> Indeed.
>
>>>>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
>>>>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
>>>>> arguments.
>>>>
>>>> How will any of the approaches deal with something like
>>>>
>>>>     select percentile_cont((select array_agg(v) from p))
>>>>     within group (order by a) from t;
>>>>
>>>> In this case the values are unknown after parse analysis, so I
>>>> guess it does not really address that.
>>>
>>> It doesn't. Does it make sense to do a one-shot execution for cases
>>> like that? It might well be worth it to do the aggregate once in
>>> advance as a throw-away if the query execution time is already going
>>> to take awhile. Of course, you can break that one by making p a JOIN
>>> to yet another thing...
>>>
>>>> FWIW while I won't stand in the way of improving this, I wonder if
>>>> this is really worth the additional complexity. If you get errors
>>>> like this with a static list of values, you will fix the list and
>>>> you're done. If the list is dynamic (computed in the query itself),
>>>> you'll still get the error much later, during query execution.
>>>>
>>>> So if you're getting failures like this often enough for the
>>>> "delayed error reporting" to be an issue, perhaps there's something
>>>> wrong in your stack and you should address that instead?
>>>
>>> I'd like to think that getting something to fail quickly and cheaply
>>> when it can will give our end users a better experience. Here,
>>> "cheaply" refers to their computing resources and time.
>>
>> The trouble is, this increases execution time for everyone, including
>> people who carefully construct the parameter values. That seems rather
>> undesirable.
>
> I may be wrong, but I'm pretty sure that a check for well-formed direct
> parameters will not impose a significant cost on aggregates.
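For what it's worth, the check itself does look cheap when spelled out
as plain SQL. A minimal sketch of my own, using the built-in numrange
type rather than whatever the patch would actually use:

    -- Sketch only: the '[0,1]' range-containment check as runnable SQL.
    -- '[0,1]'::numrange is a closed range, so both endpoints are allowed.
    SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 0.5, 0.75]::numeric[]);  -- true
    SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 1.5]::numeric[]);        -- false: 1.5 is outside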
>
> It occurs to me that this particular aggregate could take an array of
> a domain defined along the lines of:
>
>     CREATE DOMAIN float4_0_1_closed AS float4
>         NOT NULL
>         CHECK(VALUE >= 0.0 AND VALUE <= 1.0);
>
> Then the check would happen much earlier, without adding a bunch of
> potentially expensive machinery.
>
>>> Clearly, not having this happen in this case bothered Johannes
>>> enough to wade in here.
>>
>> No. He was surprised the error is reported after a significant amount
>> of time, but he does not seem to claim failing faster would be
>> valuable to him. That is your assumption, and I have my doubts about
>> it.
>
> My mistake. I shouldn't have guessed when there was a better
> alternative.

As I already wrote: I did not know about the complexity needed to
precheck the parameters. I thought maybe it could be done easily. If
it's too hard to change, I wouldn't want that improvement.

> Johannes, could you help us understand your thinking in reporting
> this?

When executing this query I wanted to see what happens with wrong
fractions, and I expected the query to fail quickly. But it seems
PostgreSQL has to read a lot of data before it notices (at the very
end) that one parameter was outside the valid range. I thought there
might be technical reasons for doing it that way, or that it could
simply be improved to fail fast.

Best regards.
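PS: If I understand David's domain suggestion above correctly, it could
be tried along these lines (my own sketch; the domain name is made up,
and t and a are the table and column from the example earlier in the
thread):

    CREATE DOMAIN fraction AS float8
        CHECK (VALUE >= 0.0 AND VALUE <= 1.0);

    -- The cast enforces the domain CHECK on the constant array before
    -- any rows of t are read, so a bad fraction should fail fast
    -- instead of after a long scan:
    SELECT percentile_cont(ARRAY[0.25, 1.5]::fraction[])
    WITHIN GROUP (ORDER BY a) FROM t;
    -- ERROR: value for domain fraction violates check constraint ...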