Re: percentile value check can be slow - Mailing list pgsql-hackers
From | jotpe |
---|---|
Subject | Re: percentile value check can be slow |
Msg-id | 1650755f-b43e-f1b0-b9f7-b3d1b2414b7e@posteo.de |
In response to | Re: percentile value check can be slow (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
On 19.11.2017 13:23, Tomas Vondra wrote:
> Hi,
>
> On 11/19/2017 03:10 AM, David Fetter wrote:
>> On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
>>> Hi,
>>>
>>> ...
>>>
>>> Is 'recognizer' an established definition I should know? Is it the
>>> same as 'validator' or is it something new/different?
>>
>> I borrowed it from http://langsec.org/
>>
>> I'm not entirely sure what you mean by a validator, but a recognizer
>> is something that gives a quick and sure read as to whether the input
>> is well-formed. In general, it's along the lines of a tokenizer, a
>> parser, and something that does very light post-parse analysis for
>> correctness of form.
>>
>> For the case that started the thread, a recognizer would check
>> something along the lines of
>>
>> CHECK('[0,1]' @> ALL(input_array))
>
> OK, thanks. From what I understand, a recognizer is more about
> recognizing whether a string is valid within a given formal language
> (essentially, whether it's a well-formed program). That may not be the
> right term for checks on parameter values.
>
> OTOH we already have "validators" in a number of places - functions
> checking various parameters, e.g. reloptions for FDWs, etc.
>
> But I guess the naming can be solved later ...
>
>>>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
>>>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
>>>> arguments.
>>>
>>> How will any of the approaches deal with something like
>>>
>>> select percentile_cont((select array_agg(v) from p))
>>> within group (order by a) from t;
>>>
>>> In this case the values are unknown during parse analysis, so I
>>> guess it does not really address that.
>>
>> It doesn't. Does it make sense to do a one-shot execution for cases
>> like that? It might well be worth it to do the aggregate once in
>> advance as a throw-away if the query execution time is already going
>> to take awhile. Of course, you can break that one by making p a JOIN
>> to yet another thing...
>>
>>> FWIW while I won't stand in the way of improving this, I wonder if
>>> this is really worth the additional complexity. If you get errors
>>> like this with a static list of values, you will fix the list and
>>> you're done. If the list is dynamic (computed in the query itself),
>>> you'll still get the error much later during query execution.
>>>
>>> So if you're getting enough failures like this for the "delayed
>>> error reporting" to be an issue, perhaps there's something wrong in
>>> your stack and you should address that instead?
>>
>> I'd like to think that getting something to fail quickly and cheaply
>> when it can will give our end users a better experience. Here,
>> "cheaply" refers to their computing resources and time.
>
> The trouble is, this increases execution time for everyone, including
> people who carefully construct the parameter values. That seems rather
> undesirable.
>
>> Clearly, not having this happen in this case bothered Johannes
>> enough to wade in here.
>
> No. He was surprised the error was reported after a significant amount
> of time, but he does not seem to claim failing faster would be valuable
> to him. That is your assumption, and I have my doubts about it.

I did not know about the complexity needed to pre-check the parameters. I
thought maybe it could be done easily. If it's too hard to change, I
wouldn't want that improvement.

Best Regards
Johannes
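For readers following the thread, the behavior under discussion can be sketched roughly as follows (table and column names are illustrative, not from the thread; the recognizer expression is adapted from David Fetter's `CHECK` sketch, with an explicit cast added so it runs standalone):

```sql
-- Illustrative setup: a table large enough that aggregation takes a while.
CREATE TABLE t (a numeric);
INSERT INTO t SELECT random() FROM generate_series(1, 1000000);

-- The invalid direct argument (1.5) is only rejected at execution time,
-- i.e. after the sort and aggregation work has already been paid for:
SELECT percentile_cont(1.5) WITHIN GROUP (ORDER BY a) FROM t;

-- A recognizer-style pre-check in the spirit of
-- CHECK('[0,1]' @> ALL(input_array)), written as a standalone
-- expression; it is cheap and needs no access to the ordered input:
SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 0.5]::numeric[]);  -- true
SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 1.5]::numeric[]);  -- false
```

As Tomas notes, such a check only helps when the direct argument is known before execution; when it comes from a subquery, the values are not available until run time.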