Re: percentile value check can be slow - Mailing list pgsql-hackers
From: jotpe
Subject: Re: percentile value check can be slow
Msg-id: 535fde33-f51f-739b-6cb8-0707bc733c79@posteo.de
In response to: Re: percentile value check can be slow (David Fetter <david@fetter.org>)
List: pgsql-hackers
On 19.11.2017 18:49, David Fetter wrote:
> On Sun, Nov 19, 2017 at 01:23:42PM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> On 11/19/2017 03:10 AM, David Fetter wrote:
>>> On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
>>>> Hi,
>>>>
>>>> ...
>>>>
>>>> Is 'recognizer' an established definition I should know? Is it the
>>>> same as 'validator' or is it something new/different?
>>>
>>> I borrowed it from http://langsec.org/
>>>
>>> I'm not entirely sure what you mean by a validator, but a recognizer
>>> is something that gives a quick and sure read as to whether the input
>>> is well-formed. In general, it's along the lines of a tokenizer, a
>>> parser, and something that does very light post-parse analysis for
>>> correctness of form.
>>>
>>> For the case that started the thread, a recognizer would check
>>> something along the lines of
>>>
>>>     CHECK('[0,1]' @> ALL(input_array))
>>
>> OK, thanks. From what I understand, a recognizer is more about
>> recognizing whether a string is valid within a given formal language
>> (essentially, whether it's a well-formed program). That may not be the
>> right term for checks on parameter values.
>
> There are two hard problems in computer science: naming things, cache
> coherency, and off-by-one.
>
>> OTOH we already have "validators" in a number of places - functions
>> checking various parameters, e.g. reloptions for FDWs, etc.
>>
>> But I guess the naming can be solved later ...
>
> Indeed.
>
>>>>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
>>>>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
>>>>> arguments.
>>>>
>>>> How will any of the approaches deal with something like
>>>>
>>>>     select percentile_cont((select array_agg(v) from p))
>>>>     within group (order by a) from t;
>>>>
>>>> In this case the values are unknown after parse analysis, so I
>>>> guess it does not really address that.
>>>
>>> It doesn't. Does it make sense to do a one-shot execution for cases
>>> like that? It might well be worth it to do the aggregate once in
>>> advance as a throw-away if the query execution time is already going
>>> to take awhile. Of course, you can break that one by making p a JOIN
>>> to yet another thing...
>>>
>>>> FWIW while I won't stand in the way of improving this, I wonder if
>>>> this is really worth the additional complexity. If you get errors
>>>> like this with a static list of values, you will fix the list and
>>>> you're done. If the list is dynamic (computed in the query itself),
>>>> you'll still get the error much later, during query execution.
>>>>
>>>> So if you're getting failures like this often enough for the
>>>> "delayed error reporting" to be an issue, perhaps there's something
>>>> wrong in your stack and you should address that instead?
>>>
>>> I'd like to think that getting something to fail quickly and cheaply
>>> when it can will give our end users a better experience. Here,
>>> "cheaply" refers to their computing resources and time.
>>
>> The trouble is, this increases execution time for everyone, including
>> people who carefully construct the parameter values. That seems rather
>> undesirable.
>
> I may be wrong, but I'm pretty sure that a check for well-formed direct
> parameters will not impose a significant cost on aggregates.
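For what it's worth, the check itself does look cheap when spelled out
as plain SQL. A minimal sketch of my own, using the built-in numrange
type rather than whatever the patch would actually use:

    -- Sketch only: the '[0,1]' range-containment check as runnable SQL.
    -- '[0,1]'::numrange is a closed range, so both endpoints are allowed.
    SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 0.5, 0.75]::numeric[]);  -- true
    SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 1.5]::numeric[]);        -- false: 1.5 is outside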
>
> It occurs to me that this particular aggregate could take an array of
> a domain defined along the lines of:
>
>     CREATE DOMAIN float4_0_1_closed AS float4
>         NOT NULL
>         CHECK(VALUE >= 0.0 AND VALUE <= 1.0);
>
> Then the check would happen much earlier, without adding a bunch of
> potentially expensive machinery.
>
>>> Clearly, not having this happen in this case bothered Johannes
>>> enough to wade in here.
>>
>> No. He was surprised the error is reported after a significant amount
>> of time, but he does not seem to claim failing faster would be
>> valuable to him. That is your assumption, and I have my doubts about
>> it.
>
> My mistake. I shouldn't have guessed when there was a better
> alternative.

As I already wrote: I did not know about the complexity needed to
precheck the parameters. I thought maybe it could be done easily. If
it's too hard to change, I wouldn't want that improvement.

> Johannes, could you help us understand your thinking in reporting
> this?

When executing this query I wanted to see what happens with wrong
fractions, and I expected the query to fail quickly. But it seems
PostgreSQL has to read a lot of data before it notices (at the very
end) that one parameter was outside the valid range. I thought there
might be technical reasons for doing it that way, or that it could
simply be improved to fail fast.

Best regards.
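PS: If I understand David's domain suggestion above correctly, it could
be tried along these lines (my own sketch; the domain name is made up,
and t and a are the table and column from the example earlier in the
thread):

    CREATE DOMAIN fraction AS float8
        CHECK (VALUE >= 0.0 AND VALUE <= 1.0);

    -- The cast enforces the domain CHECK on the constant array before
    -- any rows of t are read, so a bad fraction should fail fast
    -- instead of after a long scan:
    SELECT percentile_cont(ARRAY[0.25, 1.5]::fraction[])
    WITHIN GROUP (ORDER BY a) FROM t;
    -- ERROR: value for domain fraction violates check constraint ...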