Thread: proposal: fix corner case of variadic function usage
Hello,

Here is a patch that enables using the VARIADIC parameter modifier with variadic "any" functions. The motivation for this patch is consistent behaviour of the "format" function, but it fixes the behaviour of this whole class of variadic functions.

postgres=> -- check passing a variadic argument
postgres=> select format('%s, %s', variadic array['Hello','World']);
    format
──────────────
 Hello, World
(1 row)

postgres=> -- multidimensional arrays are supported
postgres=> select format('%s, %s', variadic array[['Nazdar','Svete'],['Hello','World']]);
            format
───────────────────────────────
 {Nazdar,Svete}, {Hello,World}
(1 row)

It respects Tom's comments - it is based on short-lived FmgrInfo structures that are created immediately before function invocation.

Note: there is an unsolved issue - reusing transformed arguments - so it is a little suboptimal for variadic set-returning (MultiFuncCall) functions, where we don't need to repeat the unpacking of the array value.

Regards

Pavel
Attachment
On Sun, Nov 4, 2012 at 12:49 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> here is patch, that enables using a variadic parameter modifier for
> variadic "any" function.
>
> Motivation for this patch is consistent behave of "format" function,
> but it fixes behave of all this class variadic functions.
>
> postgres=> -- check pass variadic argument
> postgres=> select format('%s, %s', variadic array['Hello','World']);
>     format
> ──────────────
>  Hello, World
> (1 row)
>
> postgres=> -- multidimensional array is supported
> postgres=> select format('%s, %s', variadic
> array[['Nazdar','Svete'],['Hello','World']]);
>             format
> ───────────────────────────────
>  {Nazdar,Svete}, {Hello,World}
> (1 row)
>
> It respect Tom's comments - it is based on short-lived FmgrInfo
> structures, that are created immediately before function invocation.
>
> Note: there is unsolved issue - reusing transformed arguments - so it
> is little bit suboptimal for VARIADIC RETURNING MultiFuncCall
> functions, where we don't need to repeat a unpacking of array value.
>
> Regards
>
> Pavel

Hi Pavel.

I am trying to review this patch and on my work computer everything compiles and tests perfectly. However, on my laptop, the regression tests don't pass with "cache lookup failed for type XYZ" where XYZ is some number that does not appear to be any type oid.

I don't really know where to go from here. I am asking that other people try this patch to see if they get errors as well.

Vik

PS: I won't be able to answer this thread until Tuesday.
Hello

> Hi Pavel.
>
> I am trying to review this patch and on my work computer everything compiles
> and tests perfectly. However, on my laptop, the regression tests don't pass
> with "cache lookup failed for type XYZ" where XYZ is some number that does
> not appear to be any type oid.
>
> I don't really know where to go from here. I am asking that other people try
> this patch to see if they get errors as well.

Yes, I checked it on x86_64 and I got the same problems.

There was probably more than one issue - I had to fix the creation of the unpacked parameters, and I had an issue with gcc optimization when I used a stack variable for fcinfo.

Now I have fixed these issues and I hope it will work on all platforms.

Regards

Pavel Stehule

> Vik
>
> PS: I won't be able to answer this thread until Tuesday.
Attachment
On 30.11.2012 12:18, Vik Reykja wrote: > I am trying to review this patch and on my work computer everything > compiles and tests perfectly. However, on my laptop, the regression tests > don't pass with "cache lookup failed for type XYZ" where XYZ is some number > that does not appear to be any type oid. > > I don't really know where to go from here. I am asking that other people > try this patch to see if they get errors as well. I get the same error. I'll mark this as "waiting on author" in the commitfest. - Heikki
2012/12/5 Heikki Linnakangas <hlinnakangas@vmware.com>: > On 30.11.2012 12:18, Vik Reykja wrote: >> >> I am trying to review this patch and on my work computer everything >> compiles and tests perfectly. However, on my laptop, the regression tests >> don't pass with "cache lookup failed for type XYZ" where XYZ is some >> number >> that does not appear to be any type oid. >> >> I don't really know where to go from here. I am asking that other people >> try this patch to see if they get errors as well. > > > I get the same error. I'll mark this as "waiting on author" in the > commitfest. it was with new patch? http://archives.postgresql.org/message-id/CAFj8pRDyXfXZcL2FRESLU-dPqaOKOu-ekKY12tBzdO559PW5uQ@mail.gmail.com Regards Pavel > > - Heikki
On 05.12.2012 10:38, Pavel Stehule wrote: > 2012/12/5 Heikki Linnakangas<hlinnakangas@vmware.com>: >> I get the same error. I'll mark this as "waiting on author" in the >> commitfest. > > it was with new patch? > > http://archives.postgresql.org/message-id/CAFj8pRDyXfXZcL2FRESLU-dPqaOKOu-ekKY12tBzdO559PW5uQ@mail.gmail.com Ah, no, sorry. The new patch passes regressions just fine. Never mind.. - Heikki
2012/12/5 Heikki Linnakangas <hlinnakangas@vmware.com>: > On 05.12.2012 10:38, Pavel Stehule wrote: >> >> 2012/12/5 Heikki Linnakangas<hlinnakangas@vmware.com>: >> >>> I get the same error. I'll mark this as "waiting on author" in the >>> commitfest. >> >> >> it was with new patch? >> >> >> http://archives.postgresql.org/message-id/CAFj8pRDyXfXZcL2FRESLU-dPqaOKOu-ekKY12tBzdO559PW5uQ@mail.gmail.com > > > Ah, no, sorry. The new patch passes regressions just fine. Never mind.. :) Pavel > > - Heikki
On Sat, Dec 1, 2012 at 1:14 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> > Hi Pavel.
> >
> > I am trying to review this patch and on my work computer everything compiles
> > and tests perfectly. However, on my laptop, the regression tests don't pass
> > with "cache lookup failed for type XYZ" where XYZ is some number that does
> > not appear to be any type oid.
> >
> > I don't really know where to go from here. I am asking that other people try
> > this patch to see if they get errors as well.
>
> yes, I checked it on x86_64 and I had a same problems
>
> probably there was more than one issue - I had to fix a creating a
> unpacked params and I had a issue with gcc optimalization when I used
> a stack variable for fcinfo.
>
> Now I fixed these issues and I hope so it will work on all platforms

It appears to work a lot better, yes. I played around with it a little bit and wasn't able to break it, so I'm marking it as ready for committer. Some wordsmithing will need to be done on the code comments.
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> Now I fixed these issues and I hope so it will work on all platforms

As mentioned on the commitfest application, this needs documentation. That is not the responsibility of the committer; if you need help, then please ask for it.

I've also done a quick review of it.

The massive if() block being added to execQual.c:ExecMakeFunctionResult really looks like it might be better pulled out into a function of its own.

What does "use element_type depends for dimensions" mean and why is it a ToDo? When will it be done?

Overall, the comments really need to be better. Things like this:

+       /* create short lived copies of fmgr data */
+       fmgr_info_copy(&sfinfo, fcinfo->flinfo, fcinfo->flinfo->fn_mcxt);
+       memcpy(scinfo, fcinfo, sizeof(FunctionCallInfoData));
+       scinfo->flinfo = &sfinfo;

Really don't cut it, imv. *Why* are we creating a copy of the fmgr data? Why does it need to be short lived? And what is going to happen later when you do this?:

        fcinfo = scinfo;
        MemoryContextSwitchTo(oldContext);

Is there any reason to worry about the fact that fcache->fcinfo_data no longer matches the fcinfo that we're working on? What if there are other references made to it in the future? These memcpy's and pointer foolishness really don't strike me as being a well thought out approach.

There are other similar comments throughout which really need to be rewritten to address why we're doing something, not what is being done - you can read the code for that.

Marking this Waiting on Author.

Thanks,

Stephen
Stephen Frost <sfrost@snowman.net> writes: > [ quick review of patch ] On reflection it seems to me that this is probably not a very good approach overall. Our general theory for functions taking ANY has been that the core system just computes the arguments and leaves it to the function to make sense of them. Why should this be different in the one specific case of VARIADIC ANY with a variadic array? The approach is also inherently seriously inefficient. Not only does ExecMakeFunctionResult have to build a separate phony Const node for each array element, but the variadic function itself can have no idea that those Consts are all the same type, which means it's going to do datatype lookup over again for each array element. (format(), for instance, would have to look up the same type output function over and over again.) This might not be noticeable on toy examples with a couple of array entries, but it'll be a large performance problem on large arrays. The patch also breaks any attempt that a variadic function might be making to cache info about its argument types across calls. I suppose that the copying of the FmgrInfo is a hack to avoid crashes due to called functions supposing that their flinfo->fn_extra data is still valid for the new argument set. Of course that's another pretty significant performance hit, compounding the per-element hit. Whereas an ordinary variadic function that is given N arguments in a particular query will probably do N datatype lookups and cache the info for the life of the query, a variadic function called with this approach will do one datatype lookup for each array element in each row of the query; and there is no way to optimize that. But large arrays have a worse problem: the approach flat out does not work for arrays of more than FUNC_MAX_ARGS elements, because there's no place to put their values in the FunctionCallInfo struct. (As submitted, if the array is too big the patch will blithely stomp all over memory with user-supplied data, making it not merely a crash risk but probably an exploitable security hole.) This last problem is, so far as I can see, unfixable within this approach; surely we are not going to accept a feature that only works for small arrays. So I am going to mark the CF item rejected not just RWF. I believe that a workable approach would require having the function itself act differently when the VARIADIC flag is set. Currently there is no way for ANY-taking functions to do this because we don't record the use of the VARIADIC flag in FuncExpr, but it'd be reasonable to change that, and probably then add a function similar to get_fn_expr_rettype to dig it out of the FmgrInfo. I don't know if we could usefully provide any infrastructure beyond that for the case, but it'd be worth thinking about whether there's any common behavior for such functions. regards, tom lane
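A minimal C sketch of the approach Tom outlines above, assuming an accessor named get_fn_expr_arg_variadic() (the name used later in this thread for the proposed get_fn_expr_rettype-style helper); the function name my_variadic_any_func and the processing loops are placeholders only, not code from any submitted patch:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/array.h"
    #include "utils/lsyscache.h"

    /* hypothetical VARIADIC "any" function that handles the VARIADIC flag itself */
    Datum
    my_variadic_any_func(PG_FUNCTION_ARGS)
    {
        if (get_fn_expr_arg_variadic(fcinfo->flinfo))
        {
            /* caller wrote f(VARIADIC array): unpack the elements ourselves */
            ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(PG_NARGS() - 1);
            Oid         elmtype = ARR_ELEMTYPE(arr);
            int16       elmlen;
            bool        elmbyval;
            char        elmalign;
            Datum      *elems;
            bool       *nulls;
            int         nelems;
            int         i;

            get_typlenbyvalalign(elmtype, &elmlen, &elmbyval, &elmalign);
            deconstruct_array(arr, elmtype, elmlen, elmbyval, elmalign,
                              &elems, &nulls, &nelems);

            for (i = 0; i < nelems; i++)
            {
                if (nulls[i])
                    continue;
                /* every element has type elmtype, so type lookups can be done once;
                 * real code would process elems[i] here */
            }
        }
        else
        {
            /* ordinary call: each argument arrives separately in fcinfo */
            int         i;

            for (i = 0; i < PG_NARGS(); i++)
            {
                Oid         argtype = get_fn_expr_argtype(fcinfo->flinfo, i);

                if (PG_ARGISNULL(i))
                    continue;
                /* real code would process PG_GETARG_DATUM(i) according to argtype */
                (void) argtype;
            }
        }
        PG_RETURN_NULL();
    }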
Hello

Let me recapitulate our design of variadic functions (sorry for my English - I really do plan to learn this language better - I am just missing the time).

We have two kinds of variadic functions: VARIADIC "any" and VARIADIC anyarray. These kinds are really different, although that is transparent to the final user.

Our low-level design supports variadic functions - it is based on the FunctionCallInfoData structure. But SQL functions are designed as non-variadic overloaded functions - described by a pg_proc tuple and later by the FmgrInfo structure. That structure contains fn_nargs and fn_expr (a pointer to the related parser node) among others, but fn_nargs and fn_expr are the important ones here. When we want to get the type of a function parameter, we have to use get_fn_expr_argtype, which is based on some magic over parser nodes. This expects a static number of parameters for any function during query execution - FmgrInfo is significantly reused. So variadic SQL functions break some old expectations - but only partially - with some tricks we are able to use variadic functions in SQL space without significant changes:

1) a call of a variadic function is transformed into a call of a function with a fixed number of parameters - the variadic parameters are transformed into an array

2) if you call a variadic function in the non-variadic manner (passing the array directly with the VARIADIC keyword), then it is already a call with a fixed number of parameters and that transformation is not necessary.

Points 1 and 2 are valid for VARIADIC anyarray functions.

We introduced VARIADIC "any" functions. The motivation for this kind of function was bypassing PostgreSQL's coercion rules - and implementing one's own rules for the requested functionality. Some built-in functions do this internally - but it is not possible for custom functions, or not possible without some change in the parser. The same motivation is the reason why "format" is a VARIADIC "any" function. The internal implementation of this function is very simple - just enhanced internal checks for a variable number of parameters - and nothing more - parameters are passed via FunctionCallInfoData, and the necessary expression nodes are created natively. There is an important fact - the number of parameters is immutable per call site - so we can reuse FmgrInfo.

The current limitation of VARIADIC "any" functions is support for calls in the non-variadic manner - originally I didn't propose calls in the non-variadic manner, and later we missed support for this use case. What is worse, we don't handle this use case correctly now - a call in the non-variadic manner is quietly and incorrectly forwarded to a variadic call (the VARIADIC keyword is ignored), so we can do

SELECT myleast(10, 20, 40);
SELECT myleast(VARIADIC array[10, 20, 40]);
SELECT format('%d %d', 10, 20);

but

SELECT format('%d %d', VARIADIC array[10, 20])

returns a wrong result, because it is not implemented.

I proposed several solutions:

a) disabling non-variadic calls for VARIADIC "any" functions and using overloading as the solution for this use case. It was rejected - it has some issues due to possible signature ambiguity.

b) enhancing FunctionCallInfoData with the manner of the call - variadic or non-variadic. It was rejected - it requires fixing all current VARIADIC "any" functions and probably duplicates work that could be handled by a generic routine. I don't defend these proposals too much - the issues are clear.

c) enhancing the executor so it can transform a non-variadic call into a variadic call - just quietly unpack the array of parameters and store the values into the FunctionCallInfoData structure.

There is a new issue here - we cannot reuse the parser's nodes, fn_expr and the FmgrInfo structure - because with this transformation these data become mutable. An example:

CREATE TABLE test(format_str text, params text[]);
INSERT INTO test VALUES('%s', ARRAY['Hello']);
INSERT INTO test VALUES('%s %s', ARRAY['Hello','World']);

SELECT format(format_str, VARIADIC params) FROM test;

Now I have to support two instances of the function - format('%s', 'Hello') and format('%s %s', 'Hello', 'World') - with two different FmgrInfo structures - mainly with two different fake parser nodes. Every call needs a different FmgrInfo - and I am creating it in a short-lived context as a safe design - any forgotten pointer to this mutable FmgrInfo, or more precisely to flinfo->fn_expr, then points to cleaned memory (with asserts enabled).

2013/1/18 Stephen Frost <sfrost@snowman.net>:
> Pavel,
>
> * Pavel Stehule (pavel.stehule@gmail.com) wrote:
>> Now I fixed these issues and I hope so it will work on all platforms
>
> As mentioned on the commitfest application, this needs documentation.
> That is not the responsibility of the committer; if you need help, then
> please ask for it.
>
> I've also done a quick review of it.
>
> The massive if() block being added to execQual.c:ExecMakeFunctionResult
> really looks like it might be better pulled out into a function of its
> own.

Good idea.

> What does "use element_type depends for dimensions" mean and why is it
> a ToDo? When will it be done?

I had to think about that for a while :( - it is a forgotten note - it should be removed.

Actually it should support a multidimensional array as the parameter array - and it does one-dimensional slicing - so passing arrays to a variadic "any" function in the non-variadic manner is possible.

-- multidimensional array is supported
select format('%s, %s', variadic array[['Nazdar','Svete'],['Hello','World']]);
            format
-------------------------------
 {Nazdar,Svete}, {Hello,World}
(1 row)

> Overall, the comments really need to be better. Things like this:
>
> +       /* create short lived copies of fmgr data */
> +       fmgr_info_copy(&sfinfo, fcinfo->flinfo, fcinfo->flinfo->fn_mcxt);
> +       memcpy(scinfo, fcinfo, sizeof(FunctionCallInfoData));
> +       scinfo->flinfo = &sfinfo;
>
> Really don't cut it, imv. *Why* are we creating a copy of the fmgr
> data? Why does it need to be short lived? And what is going to happen
> later when you do this?:

Please see my introduction above - in this use case FmgrInfo is not immutable, which is the reason for the short-lived copy - I use the copied structure instead of repeatedly modifying the original structure.

>        fcinfo = scinfo;
>        MemoryContextSwitchTo(oldContext);
>
> Is there any reason to worry about the fact that fcache->fcinfo_data no
> longer matches the fcinfo that we're working on? What if there are
> other references made to it in the future? These memcpy's and pointer
> foolishness really don't strike me as being a well thought out approach.

fcinfo_data is already used for other purposes - and it can be accessed from the function. The changes that I make are transparent to the variadic function - so I cannot use it.

> There are other similar comments throughout which really need to be
> rewritten to address why we're doing something, not what is being done -
> you can read the code for that.

Thanks for the review.

Regards

Pavel

> Marking this Waiting on Author.
>
> Thanks,
>
> Stephen
Hello

2013/1/18 Tom Lane <tgl@sss.pgh.pa.us>:
> Stephen Frost <sfrost@snowman.net> writes:
>> [ quick review of patch ]
>
> On reflection it seems to me that this is probably not a very good
> approach overall. Our general theory for functions taking ANY has
> been that the core system just computes the arguments and leaves it
> to the function to make sense of them. Why should this be different
> in the one specific case of VARIADIC ANY with a variadic array?

I am not sure I understand the question correctly. The reason why VARIADIC "any" is different from VARIADIC anyarray is simple - we only have single-type arrays - there is no "any"[] - so we use the combination of the FunctionCallInfoData structure for the data and parser expression nodes for the type information.

And why do we use VARIADIC "any" functions at all? Because of our coercion rules - which try to find a common type for all unknown (polymorphic) parameters. This coercion can be acceptable - and then people can use VARIADIC anyarray - or unacceptable - and then people should use VARIADIC "any" - for example, so that we do not lose the type information for boolean values.

The other motivation for VARIADIC "any" is that the implementation is very simple and fast - nobody has to pack and unpack an array - just push the values into the nargs, arg and argnull fields of the FunctionCallInfoData structure. No operations on the data are necessary. There is only one issue - this corner case.

> The approach is also inherently seriously inefficient. Not only
> does ExecMakeFunctionResult have to build a separate phony Const
> node for each array element, but the variadic function itself can
> have no idea that those Consts are all the same type, which means
> it's going to do datatype lookup over again for each array element.
> (format(), for instance, would have to look up the same type output
> function over and over again.) This might not be noticeable on toy
> examples with a couple of array entries, but it'll be a large
> performance problem on large arrays.

Yes, format() actually does that - in all cases. It is not ideal - but most of the overhead is masked by the use of the syscaches.

What is important - for this use case there is a simple and effective optimization: in a non-variadic-manner call of a variadic "any" function, all variadic parameters share the same type. Inside the variadic function I have no explicit information that this is the situation, but I can simply remember the last used type - and reuse it when a parameter's type is the same as the previous parameter's (a sketch of this idea follows below). So there is no performance problem.

> The patch also breaks any attempt that a variadic function might be
> making to cache info about its argument types across calls. I suppose
> that the copying of the FmgrInfo is a hack to avoid crashes due to
> called functions supposing that their flinfo->fn_extra data is still
> valid for the new argument set. Of course that's another pretty
> significant performance hit, compounding the per-element hit. Whereas
> an ordinary variadic function that is given N arguments in a particular
> query will probably do N datatype lookups and cache the info for the
> life of the query, a variadic function called with this approach will
> do one datatype lookup for each array element in each row of the query;
> and there is no way to optimize that.

I believe a cache of the last used data type and related function can be very effective and sufficient for these possible performance issues.

> But large arrays have a worse problem: the approach flat out does
> not work for arrays of more than FUNC_MAX_ARGS elements, because
> there's no place to put their values in the FunctionCallInfo struct.
> (As submitted, if the array is too big the patch will blithely stomp
> all over memory with user-supplied data, making it not merely a crash
> risk but probably an exploitable security hole.)

Agreed - FUNC_MAX_ARGS should be checked - it is a significant security bug and should be fixed.

> This last problem is, so far as I can see, unfixable within this
> approach; surely we are not going to accept a feature that only works
> for small arrays. So I am going to mark the CF item rejected not just
> RWF.

I disagree - a call in the non-variadic manner should not be used to work around the FUNC_MAX_ARGS limit, so a big array should not be passed there. If somebody needs to pass a big array to a function, he should use a usual non-variadic function with a usual array-typed parameter; he should not use a VARIADIC parameter. We didn't design variadic functions for exceeding the FUNC_MAX_ARGS limit. So I strongly disagree with rejection on this ground.

By contrast - the fact that we don't check the array size when a variadic function is not called as a variadic function is a bug. Any function - variadic or not, in any use case - has to have at most FUNC_MAX_ARGS arguments. Our internal variadic functions - which are more or less VARIADIC "any" functions - have the FUNC_MAX_ARGS limit.

> I believe that a workable approach would require having the function
> itself act differently when the VARIADIC flag is set. Currently there
> is no way for ANY-taking functions to do this because we don't record
> the use of the VARIADIC flag in FuncExpr, but it'd be reasonable to
> change that, and probably then add a function similar to
> get_fn_expr_rettype to dig it out of the FmgrInfo. I don't know if
> we could usefully provide any infrastructure beyond that for the case,
> but it'd be worth thinking about whether there's any common behavior
> for such functions.

You now propose something that you rejected four months ago. Your proposal is more or less the same as my second proposal and the implementation that I sent some time ago. I have no problem with it - and I'll rewrite the patch that way - I just want to recapitulate your objections:

a) everybody has to fix any existing variadic "any" function separately - the fix is not general

b) it increases the complexity (a little bit) of this kind of variadic function.

Please, can we agree on a strategy for fixing this issue? I see two important questions:

a) what limits are valid for variadic functions? Now the FUNC_MAX_ARGS limit is sometimes ignored - is that ok?

b) where should the array of variadic parameters of a variadic "any" function be unnested? Two possibilities - in the executor (increasing the complexity of the executor) or inside the variadic function (increasing the complexity of the variadic function).

I wrote three variants of a fix for this issue - all variants have some advantages and some disadvantages - and I believe a fourth and fifth variant would be the same - but all the advantages and disadvantages are relatively well defined now - so we should decide on one way.

No offense meant, Tom :), and sincerely thank you for the review.

Best regards

Pavel

> regards, tom lane
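A minimal sketch of the caching idea mentioned above - remembering the output function of the previous argument's type, so that unpacked array elements, which all share one type, cost only a single syscache lookup. The function name concat_any_values is hypothetical; this is not code from the patch:

    #include "postgres.h"
    #include "fmgr.h"
    #include "lib/stringinfo.h"
    #include "utils/builtins.h"
    #include "utils/lsyscache.h"

    /* hypothetical helper: concatenate the text form of all arguments */
    Datum
    concat_any_values(PG_FUNCTION_ARGS)
    {
        StringInfoData buf;
        Oid         prev_type = InvalidOid;
        Oid         typOutput = InvalidOid;
        bool        typIsVarlena;
        int         i;

        initStringInfo(&buf);

        for (i = 0; i < PG_NARGS(); i++)
        {
            Oid         argtype = get_fn_expr_argtype(fcinfo->flinfo, i);

            if (PG_ARGISNULL(i))
                continue;

            /* redo the output-function lookup only when the argument type changes */
            if (argtype != prev_type)
            {
                getTypeOutputInfo(argtype, &typOutput, &typIsVarlena);
                prev_type = argtype;
            }
            appendStringInfoString(&buf,
                                   OidOutputFunctionCall(typOutput,
                                                         PG_GETARG_DATUM(i)));
        }

        PG_RETURN_TEXT_P(cstring_to_text_with_len(buf.data, buf.len));
    }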
Pavel,

While I certainly appreciate your enthusiasm, I don't think this is going to make it into 9.3, which is what we're currently focused on.

I'd suggest that you put together a wiki page or similar which outlines how this is going to work and be implemented and it can be discussed for 9.4 in a couple months. I don't think writing any more code is going to be helpful for this right now.

Thanks,

Stephen
2013/1/19 Stephen Frost <sfrost@snowman.net>:
> Pavel,
>
> While I certainly appreciate your enthusiasm, I don't think this is
> going to make it into 9.3, which is what we're currently focused on.
>
> I'd suggest that you put together a wiki page or similar which
> outlines how this is going to work and be implemented and it can be
> discussed for 9.4 in a couple months. I don't think writing any more
> code is going to be helpful for this right now.

If we don't define a solution now, then we probably won't define a solution for 9.4 either. Moving it to the next release solves nothing.

Personally, I can live with committing this in 9.4 - people who use it and need it can use the existing patch - but I would like to have a clean proposal for this issue, because I have spent a lot of time on this relatively simple issue and I am not happy with it. So I would like to write down what will (probably) be committed, so that I don't have to return to this open issue again.

Regards

Pavel

> Thanks,
>
> Stephen
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2013/1/18 Tom Lane <tgl@sss.pgh.pa.us>: >> The approach is also inherently seriously inefficient. ... > What is important - for this use case - there is simple and perfect > possible optimization - in this case "non variadic manner call of > variadic "any" function" all variadic parameters will share same type. > Inside variadic function I have not information so this situation is > in this moment, but just I can remember last used type - and I can > reuse it, when parameter type is same like previous parameter. > So there no performance problem. Well, if we have to hack each variadic function to make it work well in this scenario, that greatly weakens the major argument for the proposed patch, namely that it provides a single-point fix for VARIADIC behavior. BTW, I experimented with lobotomizing array_in's caching of I/O function lookup behavior, by deleting the if-test at arrayfuncs.c line 184. That seemed to make it about 30% slower for a simple test involving converting two-element float8 arrays. So while failing to cache this stuff isn't the end of the world, arguing that it's not worth worrying about is simply wrong. >> But large arrays have a worse problem: the approach flat out does >> not work for arrays of more than FUNC_MAX_ARGS elements, because >> there's no place to put their values in the FunctionCallInfo struct. >> This last problem is, so far as I can see, unfixable within this >> approach; surely we are not going to accept a feature that only works >> for small arrays. So I am going to mark the CF item rejected not just >> RWF. > disagree - non variadic manner call should not be used for walk around > FUNC_MAX_ARGS limit. So there should not be passed big array. That's utter nonsense. Why wouldn't people expect concat(), for example, to work for large (or even just moderate-sized) arrays? This problem *is* a show stopper for this patch. I suggested a way you can fix it without having such a limitation. If you don't want to go that way, well, it's not going to happen. I agree the prospect that each variadic-ANY function would have to deal with this case for itself is a tad annoying. But there are only two of them in the existing system, and it's not like a variadic-ANY function isn't a pretty complicated beast anyway. > You propose now something, what you rejected four months ago. Well, at the time it wasn't apparent that this approach wouldn't work. It is now, though. regards, tom lane
2013/1/19 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2013/1/18 Tom Lane <tgl@sss.pgh.pa.us>: >>> The approach is also inherently seriously inefficient. ... > >> What is important - for this use case - there is simple and perfect >> possible optimization - in this case "non variadic manner call of >> variadic "any" function" all variadic parameters will share same type. >> Inside variadic function I have not information so this situation is >> in this moment, but just I can remember last used type - and I can >> reuse it, when parameter type is same like previous parameter. > >> So there no performance problem. > > Well, if we have to hack each variadic function to make it work well in > this scenario, that greatly weakens the major argument for the proposed > patch, namely that it provides a single-point fix for VARIADIC behavior. > > BTW, I experimented with lobotomizing array_in's caching of I/O function > lookup behavior, by deleting the if-test at arrayfuncs.c line 184. That > seemed to make it about 30% slower for a simple test involving > converting two-element float8 arrays. So while failing to cache this > stuff isn't the end of the world, arguing that it's not worth worrying > about is simply wrong. > >>> But large arrays have a worse problem: the approach flat out does >>> not work for arrays of more than FUNC_MAX_ARGS elements, because >>> there's no place to put their values in the FunctionCallInfo struct. >>> This last problem is, so far as I can see, unfixable within this >>> approach; surely we are not going to accept a feature that only works >>> for small arrays. So I am going to mark the CF item rejected not just >>> RWF. > >> disagree - non variadic manner call should not be used for walk around >> FUNC_MAX_ARGS limit. So there should not be passed big array. > > That's utter nonsense. Why wouldn't people expect concat(), for > example, to work for large (or even just moderate-sized) arrays? > > This problem *is* a show stopper for this patch. I suggested a way you > can fix it without having such a limitation. If you don't want to go > that way, well, it's not going to happen. > > I agree the prospect that each variadic-ANY function would have to deal > with this case for itself is a tad annoying. But there are only two of > them in the existing system, and it's not like a variadic-ANY function > isn't a pretty complicated beast anyway. > >> You propose now something, what you rejected four months ago. > > Well, at the time it wasn't apparent that this approach wouldn't work. > It is now, though. I have no problem rewrite patch, I'll send new version early. Regards Pavel > > regards, tom lane
On Sat, Jan 19, 2013 at 11:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > We introduced VARIADIC "any" function. Motivation for this kind of > function was bypassing postgresql's coerce rules - and own rules > implementation for requested functionality. Some builtins function > does it internally - but it is not possible for custom functions or it > is not possible without some change in parser. Same motivation is > reason why "format" function is VARIADIC "any" function. I'd just like to draw the attention of all assembled to the fact that this is another example of the cottage industry we've created in avoiding our own burdensome typecasting rules. I not long ago proposed a patch that went nowhere which would have obviated the need for this sort of nonsense in a much more principled way, which of course went nowhere, despite the design being one which Tom himself proposed. Is there any amount of this which will sway popular opinion to the point of view that the problem is not with the individual cases, but the rules themselves? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jan 19, 2013 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> disagree - non variadic manner call should not be used for walk around >> FUNC_MAX_ARGS limit. So there should not be passed big array. > > That's utter nonsense. Why wouldn't people expect concat(), for > example, to work for large (or even just moderate-sized) arrays? /me blinks. What does that have to do with anything? IIUC, the question isn't whether CONCAT() would work for large arrays, but rather for very large numbers of arrays written out as CONCAT(a1, ..., a10000000). I don't understand why an appropriately-placed check against FUNC_MAX_ARGS does anything other than enforce a limit we already have. Or are we currently not consistently enforcing that limit? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jan 19, 2013 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That's utter nonsense. Why wouldn't people expect concat(), for >> example, to work for large (or even just moderate-sized) arrays? > /me blinks. > What does that have to do with anything? IIUC, the question isn't > whether CONCAT() would work for large arrays, but rather for very > large numbers of arrays written out as CONCAT(a1, ..., a10000000). No, the question is what happens with CONCAT(VARIADIC some-array-here), which currently just returns the array as-is, but which really ought to concat all the array elements as if they'd been separate arguments. Pavel is claiming it's okay for that to fall over if the array has more than 100 elements. I disagree, not only for the specific case of CONCAT(), but with the more general implication that such a limitation is going to be okay for any VARIADIC ANY function that anyone will ever write. regards, tom lane
On 01/20/2013 01:37 PM, Robert Haas wrote:
> On Sat, Jan 19, 2013 at 11:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> We introduced VARIADIC "any" function. Motivation for this kind of
>> function was bypassing postgresql's coerce rules - and own rules
>> implementation for requested functionality. Some builtins function
>> does it internally - but it is not possible for custom functions or it
>> is not possible without some change in parser. Same motivation is
>> reason why "format" function is VARIADIC "any" function.
> I'd just like to draw the attention of all assembled to the fact that
> this is another example of the cottage industry we've created in
> avoiding our own burdensome typecasting rules. I not long ago
> proposed a patch that went nowhere which would have obviated the need
> for this sort of nonsense in a much more principled way, which of
> course went nowhere, despite the design being one which Tom himself
> proposed. Is there any amount of this which will sway popular opinion
> to the point of view that the problem is not with the individual
> cases, but the rules themselves?

Uh, reference please? This doesn't jog my memory despite it being something that's obviously interesting in light of my recent work. (That could be a case of dying synapses too.)

cheers

andrew
Hello

2013/1/20 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jan 19, 2013 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That's utter nonsense. Why wouldn't people expect concat(), for
>>> example, to work for large (or even just moderate-sized) arrays?
>>
>> /me blinks.
>>
>> What does that have to do with anything? IIUC, the question isn't
>> whether CONCAT() would work for large arrays, but rather for very
>> large numbers of arrays written out as CONCAT(a1, ..., a10000000).
>
> No, the question is what happens with CONCAT(VARIADIC some-array-here),
> which currently just returns the array as-is, but which really ought
> to concat all the array elements as if they'd been separate arguments.
>
> Pavel is claiming it's okay for that to fall over if the array has
> more than 100 elements. I disagree, not only for the specific case of
> CONCAT(), but with the more general implication that such a limitation
> is going to be okay for any VARIADIC ANY function that anyone will ever
> write.

I am sending a patch that is based on Tom's last proposal. It is still missing some small fixes for the other variadic "any" functions, concat() and concat_ws() - I'll send those tomorrow.

Regards

Pavel

> regards, tom lane
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jan 19, 2013 at 11:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> We introduced VARIADIC "any" function. Motivation for this kind of >> function was bypassing postgresql's coerce rules - and own rules >> implementation for requested functionality. Some builtins function >> does it internally - but it is not possible for custom functions or it >> is not possible without some change in parser. Same motivation is >> reason why "format" function is VARIADIC "any" function. > I'd just like to draw the attention of all assembled to the fact that > this is another example of the cottage industry we've created in > avoiding our own burdensome typecasting rules. I suppose this complaint is based on the idea that we could have declared format() as format(fmt text, VARIADIC values text[]) if only the argument matching rules were sufficiently permissive. I disagree with that though. For that to be anywhere near equivalent, we would have to allow argument matching to cast any data type to text, even if the corresponding cast were explicit-only. Surely that is too dangerous to consider. Even then it wouldn't be quite equivalent, because format() will work on any type whether or not there is a cast to text at all (since it relies on calling the type I/O functions instead). While VARIADIC ANY functions are surely a bit of a hack, I think they are a better solution than destroying the type system's ability to check function calls at all. At least the risks are confined to those specific functions, and not any function anywhere. regards, tom lane
On Sun, Jan 20, 2013 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Jan 19, 2013 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That's utter nonsense. Why wouldn't people expect concat(), for >>> example, to work for large (or even just moderate-sized) arrays? > >> /me blinks. > >> What does that have to do with anything? IIUC, the question isn't >> whether CONCAT() would work for large arrays, but rather for very >> large numbers of arrays written out as CONCAT(a1, ..., a10000000). > > No, the question is what happens with CONCAT(VARIADIC some-array-here), > which currently just returns the array as-is, but which really ought > to concat all the array elements as if they'd been separate arguments. Wow, that's pretty surprising behavior. > Pavel is claiming it's okay for that to fall over if the array has > more than 100 elements. I disagree, not only for the specific case of > CONCAT(), but with the more general implication that such a limitation > is going to be okay for any VARIADIC ANY function that anyone will ever > write. I don't know - how many of those will there really ever be? I mean, people only write functions as VARIADIC as a notational convenience, don't they? If you actually need to pass more than 100 separate pieces of data to a function, sending over 100+ parameters is almost certainly the Wrong Way To Do It. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jan 20, 2013 at 2:35 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Uh, reference please? This doesn't jog my memory despite it being something > that's obviously interesting in light of my recent work. (That could be a a > case of dying synapses too.) http://www.postgresql.org/message-id/CA+TgmoZLm6Kp77HXEeU_6B_OBGEnWm9TaGrDF4SrPnsvyEvw2A@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jan 20, 2013 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I suppose this complaint is based on the idea that we could have > declared format() as format(fmt text, VARIADIC values text[]) if > only the argument matching rules were sufficiently permissive. > I disagree with that though. For that to be anywhere near equivalent, > we would have to allow argument matching to cast any data type to > text, even if the corresponding cast were explicit-only. Surely > that is too dangerous to consider. Even then it wouldn't be quite > equivalent, because format() will work on any type whether or not > there is a cast to text at all (since it relies on calling the type > I/O functions instead). Well, as previously discussed a number of times, all types are considered to have assignment casts to text via IO whether or not there is an entry in pg_cast. So the only case in which my proposal would have failed to make this work is if someone added an entry in pg_cast and tagged it as an explicit cast. I can't quite imagine what sort of situation might justify such a perplexing step, but if someone does it and it breaks this then I think they're getting what they paid for. So I think it's pretty close to equivalent. > While VARIADIC ANY functions are surely a bit of a hack, I think they > are a better solution than destroying the type system's ability to check > function calls at all. At least the risks are confined to those > specific functions, and not any function anywhere. I think this is hyperbole which ignores the facts on the ground. Over and over again, we've seen examples of users who are perplexed because there's only one function called wumpus() and we won't call it due to some perceived failure of the types to match sufficiently closely. Destroying the type system's ability to needlessly reject *unambiguous* calls is, IMHO, a feature, not a bug. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/1/20 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Jan 20, 2013 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Sat, Jan 19, 2013 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> That's utter nonsense. Why wouldn't people expect concat(), for
>>>> example, to work for large (or even just moderate-sized) arrays?
>>
>>> /me blinks.
>>
>>> What does that have to do with anything? IIUC, the question isn't
>>> whether CONCAT() would work for large arrays, but rather for very
>>> large numbers of arrays written out as CONCAT(a1, ..., a10000000).
>>
>> No, the question is what happens with CONCAT(VARIADIC some-array-here),
>> which currently just returns the array as-is, but which really ought
>> to concat all the array elements as if they'd been separate arguments.
>
> Wow, that's pretty surprising behavior.
>
>> Pavel is claiming it's okay for that to fall over if the array has
>> more than 100 elements. I disagree, not only for the specific case of
>> CONCAT(), but with the more general implication that such a limitation
>> is going to be okay for any VARIADIC ANY function that anyone will ever
>> write.

One correction - I wanted to raise an error if the array is larger than FUNC_MAX_ARGS. But it is true that this limit is synthetic for a VARIADIC call, because the parameters are passed in an array. On the other hand, not enforcing it is an inconsistency.

> I don't know - how many of those will there really ever be? I mean,
> people only write functions as VARIADIC as a notational convenience,
> don't they? If you actually need to pass more than 100 separate
> pieces of data to a function, sending over 100+ parameters is almost
> certainly the Wrong Way To Do It.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
2013/1/20 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Jan 20, 2013 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I suppose this complaint is based on the idea that we could have
>> declared format() as format(fmt text, VARIADIC values text[]) if
>> only the argument matching rules were sufficiently permissive.
>> I disagree with that though. For that to be anywhere near equivalent,
>> we would have to allow argument matching to cast any data type to
>> text, even if the corresponding cast were explicit-only. Surely
>> that is too dangerous to consider. Even then it wouldn't be quite
>> equivalent, because format() will work on any type whether or not
>> there is a cast to text at all (since it relies on calling the type
>> I/O functions instead).
>
> Well, as previously discussed a number of times, all types are
> considered to have assignment casts to text via IO whether or not
> there is an entry in pg_cast. So the only case in which my proposal
> would have failed to make this work is if someone added an entry in
> pg_cast and tagged it as an explicit cast. I can't quite imagine what
> sort of situation might justify such a perplexing step, but if someone
> does it and it breaks this then I think they're getting what they paid
> for. So I think it's pretty close to equivalent.
>
>> While VARIADIC ANY functions are surely a bit of a hack, I think they
>> are a better solution than destroying the type system's ability to check
>> function calls at all. At least the risks are confined to those
>> specific functions, and not any function anywhere.
>
> I think this is hyperbole which ignores the facts on the ground. Over
> and over again, we've seen examples of users who are perplexed because
> there's only one function called wumpus() and we won't call it due to
> some perceived failure of the types to match sufficiently closely.
> Destroying the type system's ability to needlessly reject
> *unambiguous* calls is, IMHO, a feature, not a bug.

I think VARIADIC ANY functions are the only possible solution for functions with more complex coercion rules, like Oracle's DECODE function. Just modifying the coercion rules is not enough - or we would need to think about an extensible parser and analyzer.

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jan 20, 2013 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Pavel is claiming it's okay for that to fall over if the array has >> more than 100 elements. I disagree, not only for the specific case of >> CONCAT(), but with the more general implication that such a limitation >> is going to be okay for any VARIADIC ANY function that anyone will ever >> write. > I don't know - how many of those will there really ever be? I mean, > people only write functions as VARIADIC as a notational convenience, > don't they? If you actually need to pass more than 100 separate > pieces of data to a function, sending over 100+ parameters is almost > certainly the Wrong Way To Do It. Well, not necessarily, if they're reasonably expressed as an array. I would also point out that there is no corresponding limitation on variadic functions that take any type other than ANY. Indeed, despite Pavel's claim to the contrary, I'm pretty sure it's seen as a feature that there's no specific upper limit to how many parameters you can pass to a variadic function when using the "VARIADIC array-value" syntax. It's certainly a feature that you can pass a varying number of parameters that way, thereby "evading" the syntactic fact that you can't pass a varying number of parameters any other way. I don't see how come it isn't a feature that you can "evade" the FUNC_MAX_ARGS limit that way, or why we'd consider it acceptable for variably-sized parameter arrays to have such a small arbitrary limit. regards, tom lane
On Sun, Jan 20, 2013 at 10:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jan 20, 2013 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Pavel is claiming it's okay for that to fall over if the array has >>> more than 100 elements. I disagree, not only for the specific case of >>> CONCAT(), but with the more general implication that such a limitation >>> is going to be okay for any VARIADIC ANY function that anyone will ever >>> write. > >> I don't know - how many of those will there really ever be? I mean, >> people only write functions as VARIADIC as a notational convenience, >> don't they? If you actually need to pass more than 100 separate >> pieces of data to a function, sending over 100+ parameters is almost >> certainly the Wrong Way To Do It. > > Well, not necessarily, if they're reasonably expressed as an array. > I would also point out that there is no corresponding limitation on > variadic functions that take any type other than ANY. Indeed, despite > Pavel's claim to the contrary, I'm pretty sure it's seen as a feature > that there's no specific upper limit to how many parameters you can pass > to a variadic function when using the "VARIADIC array-value" syntax. > It's certainly a feature that you can pass a varying number of > parameters that way, thereby "evading" the syntactic fact that you can't > pass a varying number of parameters any other way. I don't see how > come it isn't a feature that you can "evade" the FUNC_MAX_ARGS limit > that way, or why we'd consider it acceptable for variably-sized > parameter arrays to have such a small arbitrary limit. OK, I see. If people are already counting on there being no fixed limit for variadic functions with a type other than "any", then it would indeed seem weird to make "any" an exception. I'm not sure how much practical use case there is for such a thing, but still. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/1/21 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Jan 20, 2013 at 10:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Sun, Jan 20, 2013 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Pavel is claiming it's okay for that to fall over if the array has
>>>> more than 100 elements. I disagree, not only for the specific case of
>>>> CONCAT(), but with the more general implication that such a limitation
>>>> is going to be okay for any VARIADIC ANY function that anyone will ever
>>>> write.
>>
>>> I don't know - how many of those will there really ever be? I mean,
>>> people only write functions as VARIADIC as a notational convenience,
>>> don't they? If you actually need to pass more than 100 separate
>>> pieces of data to a function, sending over 100+ parameters is almost
>>> certainly the Wrong Way To Do It.
>>
>> Well, not necessarily, if they're reasonably expressed as an array.
>> I would also point out that there is no corresponding limitation on
>> variadic functions that take any type other than ANY. Indeed, despite
>> Pavel's claim to the contrary, I'm pretty sure it's seen as a feature
>> that there's no specific upper limit to how many parameters you can pass
>> to a variadic function when using the "VARIADIC array-value" syntax.
>> It's certainly a feature that you can pass a varying number of
>> parameters that way, thereby "evading" the syntactic fact that you can't
>> pass a varying number of parameters any other way. I don't see how
>> come it isn't a feature that you can "evade" the FUNC_MAX_ARGS limit
>> that way, or why we'd consider it acceptable for variably-sized
>> parameter arrays to have such a small arbitrary limit.
>
> OK, I see. If people are already counting on there being no fixed
> limit for variadic functions with a type other than "any", then it
> would indeed seem weird to make "any" an exception. I'm not sure how
> much practical use case there is for such a thing, but still.

After sleeping on it and thinking about the topic some more - yes, Tom's opinion is correct too - theoretically we can count all the variadic arguments as one. The exception is just VARIADIC "any" when it is called in the usual manner - that only needs to be documented better - I don't see a problem now.

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Hello

2013/1/19 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2013/1/18 Tom Lane <tgl@sss.pgh.pa.us>:
>>> The approach is also inherently seriously inefficient. ...
>
>> What is important - for this use case - there is simple and perfect
>> possible optimization - in this case "non variadic manner call of
>> variadic "any" function" all variadic parameters will share same type.
>> Inside variadic function I have not information so this situation is
>> in this moment, but just I can remember last used type - and I can
>> reuse it, when parameter type is same like previous parameter.
>
>> So there no performance problem.
>
> Well, if we have to hack each variadic function to make it work well in
> this scenario, that greatly weakens the major argument for the proposed
> patch, namely that it provides a single-point fix for VARIADIC behavior.
>
> BTW, I experimented with lobotomizing array_in's caching of I/O function
> lookup behavior, by deleting the if-test at arrayfuncs.c line 184. That
> seemed to make it about 30% slower for a simple test involving
> converting two-element float8 arrays. So while failing to cache this
> stuff isn't the end of the world, arguing that it's not worth worrying
> about is simply wrong.
>
>>> But large arrays have a worse problem: the approach flat out does
>>> not work for arrays of more than FUNC_MAX_ARGS elements, because
>>> there's no place to put their values in the FunctionCallInfo struct.
>>> This last problem is, so far as I can see, unfixable within this
>>> approach; surely we are not going to accept a feature that only works
>>> for small arrays. So I am going to mark the CF item rejected not just
>>> RWF.
>
>> disagree - non variadic manner call should not be used for walk around
>> FUNC_MAX_ARGS limit. So there should not be passed big array.
>
> That's utter nonsense. Why wouldn't people expect concat(), for
> example, to work for large (or even just moderate-sized) arrays?
>
> This problem *is* a show stopper for this patch. I suggested a way you
> can fix it without having such a limitation. If you don't want to go
> that way, well, it's not going to happen.
>
> I agree the prospect that each variadic-ANY function would have to deal
> with this case for itself is a tad annoying. But there are only two of
> them in the existing system, and it's not like a variadic-ANY function
> isn't a pretty complicated beast anyway.
>
>> You propose now something, what you rejected four months ago.
>
> Well, at the time it wasn't apparent that this approach wouldn't work.
> It is now, though.

So here is the rewritten patch:

* there is no limit on the size of the array that holds the variadic arguments
* the iteration over that array is moved into the variadic function's implementation
* there is a new function, get_fn_expr_arg_variadic, that returns true when the last argument carried the VARIADIC label - via the FuncExpr node (a sketch of this accessor follows below)
* all our variadic "any" functions - concat(), concat_ws() and format() - are fixed
* there is a small optimization - remember the last used typOutput function and reuse it when possible
* it doesn't change any previously tested or documented behaviour

I hope that almost all issues and questions are now solved and that there is agreement on the implemented behaviour.

Regards

Pavel

> regards, tom lane
Attachment
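A sketch of what the new accessor described above could look like, assuming FuncExpr carries a boolean field (called funcvariadic here) that records whether the call's last argument had the VARIADIC label. The exact function and field names are whatever the attached patch uses; this only illustrates the get_fn_expr_rettype-style pattern of reading call-site information out of flinfo->fn_expr:

    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/primnodes.h"

    /* hypothetical accessor; the field name funcvariadic is an assumption */
    bool
    get_fn_expr_arg_variadic(FmgrInfo *flinfo)
    {
        Node       *expr;

        /* without FmgrInfo or its expression node we can't know anything */
        if (flinfo == NULL || flinfo->fn_expr == NULL)
            return false;

        expr = (Node *) flinfo->fn_expr;

        if (IsA(expr, FuncExpr))
            return ((FuncExpr *) expr)->funcvariadic;

        return false;
    }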
Pavel Stehule <pavel.stehule@gmail.com> writes: > so here is rewritten patch I've applied the infrastructure parts of this, but not the changes to format() and concat(). Why are the format and concat patches so randomly different? Not only is the internal implementation completely different for no obvious reason, but the user-visible behavior is inconsistent too. You've got one of them iterating over elements and one over slices. That seems pretty bogus. Personally I'd vote for the element-level behavior in both cases, because that's generally what happens in other PG functions when there's no particular reason to pay attention to the structure of a multidimensional array. I certainly don't see a reason why they should be making opposite choices. Also, I'm not sure that it's appropriate to throw an error if the given argument is null or not an array. Previous versions did not throw an error in such cases. Perhaps just fall back to behaving as if it weren't marked VARIADIC? You could possibly make an argument for not-an-array-type being an error, since that's a statically inconsistent type situation, but I really don't like a null value being an error. A function could easily receive a null value for an array parameter that it wants to pass on to format() or concat(). BTW, using array_iterate as a substitute for deconstruct_array is neither efficient nor good style. array_iterate is for processing the values as you scan the array. regards, tom lane
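A short hypothetical skeleton (not taken from the patch) of the fallback Tom asks for here: a VARIADIC NULL argument is treated like a plain, non-VARIADIC call instead of raising an error. get_fn_expr_arg_variadic() is the accessor discussed earlier in the thread:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/array.h"

    /* hypothetical example function demonstrating the NULL fallback */
    Datum
    variadic_null_fallback_example(PG_FUNCTION_ARGS)
    {
        bool        explicit_variadic =
            get_fn_expr_arg_variadic(fcinfo->flinfo) &&
            !PG_ARGISNULL(PG_NARGS() - 1);

        if (explicit_variadic)
        {
            /* unpack the array argument element by element (deconstruct_array) */
            ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(PG_NARGS() - 1);

            (void) arr;
        }
        else
        {
            /* plain call, or VARIADIC NULL: process fcinfo arguments directly */
        }

        PG_RETURN_NULL();
    }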
2013/1/22 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> so here is rewritten patch
>
> I've applied the infrastructure parts of this, but not the changes
> to format() and concat().
>
> Why are the format and concat patches so randomly different?
> Not only is the internal implementation completely different for no
> obvious reason, but the user-visible behavior is inconsistent too.
> You've got one of them iterating over elements and one over slices.
> That seems pretty bogus. Personally I'd vote for the element-level
> behavior in both cases, because that's generally what happens in other
> PG functions when there's no particular reason to pay attention to the
> structure of a multidimensional array. I certainly don't see a reason
> why they should be making opposite choices.

Some months ago there was one argument against this patch (or this feature): the impossibility of passing an array as a single argument to a variadic function. I think the natural answer to that is slicing. Without slicing you cannot pass an array as one argument - so it is a relatively hard limit. But when I think about it, not many people use this with multidimensional arrays, so although I prefer slicing as the natural behavior, I can live without it.

What do you think about limiting this to one-dimensional arrays only? Then we don't need to solve this question now. This behavior is important for format() -- and maybe it is premature optimization - but I don't want to close these doors too early.

>
> Also, I'm not sure that it's appropriate to throw an error if the given
> argument is null or not an array. Previous versions did not throw an
> error in such cases. Perhaps just fall back to behaving as if it
> weren't marked VARIADIC? You could possibly make an argument for
> not-an-array-type being an error, since that's a statically inconsistent
> type situation, but I really don't like a null value being an error.
> A function could easily receive a null value for an array parameter
> that it wants to pass on to format() or concat().

I did a test with a custom variadic function:

CREATE OR REPLACE FUNCTION public.ileast(VARIADIC integer[])
RETURNS integer
LANGUAGE sql
AS $function$ select max(v) from unnest($1) g(v) $function$

postgres=# select ileast(10,20);
 ileast
--------
     20
(1 row)

postgres=# select ileast(variadic null);
 ileast
--------

(1 row)

postgres=# select ileast(variadic 10);
ERROR: function ileast(integer) does not exist
LINE 1: select ileast(variadic 10);
               ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

So NULL should be supported, and an array should be required. The question is about the level - WARNING or ERROR. In the long-term perspective I am sure that ERROR is correct. If we raise only a WARNING, then what should we do - ignore the VARIADIC label like before, or return the correct value? Then I don't see the sense of a WARNING. This is a possible incompatibility :( - but it is better to fix it early than later.

Other opinions?

>
> BTW, using array_iterate as a substitute for deconstruct_array is
> neither efficient nor good style. array_iterate is for processing the
> values as you scan the array.
>

Yes - some deconstruct_sliced_array would be better and cleaner. It depends on how we decide the first question about multidimensional arrays; it would probably not be necessary then.

Regards

Pavel

> regards, tom lane
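Tom's point above about callers forwarding a possibly-NULL array is easy to picture with a small hypothetical wrapper (my_join is not from the thread or the patch, and this assumes the VARIADIC label on a variadic "any" function works as the patch provides); it shows why raising an error for a NULL VARIADIC argument would hurt:

create function my_join(vals text[]) returns text
language sql as $$ select concat_ws(', ', variadic $1) $$;

select my_join(array['a','b','c']); -- 'a, b, c'
select my_join(NULL);               -- should not fail just because the caller's array is NULL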
Hello 2013/1/22 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> so here is rewritten patch > > I've applied the infrastructure parts of this, but not the changes > to format() and concat(). > > Why are the format and concat patches so randomly different? > Not only is the internal implementation completely different for no > obvious reason, but the user-visible behavior is inconsistent too. > You've got one of them iterating over elements and one over slices. > That seems pretty bogus. Personally I'd vote for the element-level > behavior in both cases, because that's generally what happens in other > PG functions when there's no particular reason to pay attention to the > structure of a multidimensional array. I certainly don't see a reason > why they should be making opposite choices. > > Also, I'm not sure that it's appropriate to throw an error if the given > argument is null or not an array. Previous versions did not throw an > error in such cases. Perhaps just fall back to behaving as if it > weren't marked VARIADIC? You could possibly make an argument for > not-an-array-type being an error, since that's a statically inconsistent > type situation, but I really don't like a null value being an error. > A function could easily receive a null value for an array parameter > that it wants to pass on to format() or concat(). > > BTW, using array_iterate as a substitute for deconstruct_array is > neither efficient nor good style. array_iterate is for processing the > values as you scan the array. I updated patch * simplify concat and concat_ws with reuse array_to_text_internal * remove support for slicing (multidimensional arrays) * VARIADIC NULL is allowed Regards Pavel > > regards, tom lane
Attachment
sorry I have to change wrong comment Regards Pavel 2013/1/22 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > > > 2013/1/22 Tom Lane <tgl@sss.pgh.pa.us>: >> Pavel Stehule <pavel.stehule@gmail.com> writes: >>> so here is rewritten patch >> >> I've applied the infrastructure parts of this, but not the changes >> to format() and concat(). >> >> Why are the format and concat patches so randomly different? >> Not only is the internal implementation completely different for no >> obvious reason, but the user-visible behavior is inconsistent too. >> You've got one of them iterating over elements and one over slices. >> That seems pretty bogus. Personally I'd vote for the element-level >> behavior in both cases, because that's generally what happens in other >> PG functions when there's no particular reason to pay attention to the >> structure of a multidimensional array. I certainly don't see a reason >> why they should be making opposite choices. >> >> Also, I'm not sure that it's appropriate to throw an error if the given >> argument is null or not an array. Previous versions did not throw an >> error in such cases. Perhaps just fall back to behaving as if it >> weren't marked VARIADIC? You could possibly make an argument for >> not-an-array-type being an error, since that's a statically inconsistent >> type situation, but I really don't like a null value being an error. >> A function could easily receive a null value for an array parameter >> that it wants to pass on to format() or concat(). >> >> BTW, using array_iterate as a substitute for deconstruct_array is >> neither efficient nor good style. array_iterate is for processing the >> values as you scan the array. > > I updated patch > > * simplify concat and concat_ws with reuse array_to_text_internal > * remove support for slicing (multidimensional arrays) > * VARIADIC NULL is allowed > > Regards > > Pavel > >> >> regards, tom lane
Attachment
Hello

I sent an updated patch, but I am still not sure about one topic:

> Also, I'm not sure that it's appropriate to throw an error if the given
> argument is null or not an array. Previous versions did not throw an
> error in such cases. Perhaps just fall back to behaving as if it
> weren't marked VARIADIC? You could possibly make an argument for
> not-an-array-type being an error, since that's a statically inconsistent
> type situation, but I really don't like a null value being an error.
> A function could easily receive a null value for an array parameter
> that it wants to pass on to format() or concat().

what should be result of concat(variadic NULL::int[])

I enabled this use case, but what should be result?

Usually the concat() function needs at least one parameter and then returns an empty string or some string. But concat(variadic NULL::int[]) is more or less a zero-parameter call. Should the result be an empty string, or NULL?

I vote for returning NULL, and I have only one argument for it: if concat(variadic NULL::int[]) returns NULL, then it returns a different "value" than concat(variadic '{}'::int[]), which is correct. The opposite behavior returns an empty string in both variants, and that does not feel right to me.

Regards

Pavel
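The heart of the question is that a NULL array and an empty array are distinct values at the SQL level, so the choice is whether concat() should preserve that distinction or collapse it; for example (ordinary behavior, unrelated to the patch):

select NULL::int[] is distinct from '{}'::int[]; -- true: these are different values
select array_upper('{}'::int[], 1);              -- NULL: an empty array has no upper bound, so some built-ins already blur the line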
Pavel Stehule <pavel.stehule@gmail.com> writes: > what should be result of concat(variadic NULL::int[]) > I enabled this use case, but what should be result? I think there are two somewhat defensible theories: (1) punt, and return NULL overall. So in this case the variadic function would act as if it were STRICT. That seems a bit weird though if the function is not strict otherwise. (2) Treat the NULL as if it were a zero-length array, giving rise to zero ordinary parameters. This could be problematic if the function can't cope very well with zero parameters ... but on the other hand, if it can't do so, then what will it do with VARIADIC '{}'::int[] ? I lean a little bit towards (2) but it's definitely a judgment call. Anybody have any other arguments one way or the other? regards, tom lane
2013/1/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> what should be result of concat(variadic NULL::int[])
>> I enabled this use case, but what should be result?
>
> I think there are two somewhat defensible theories:
>
> (1) punt, and return NULL overall. So in this case the variadic
> function would act as if it were STRICT. That seems a bit weird though
> if the function is not strict otherwise.
>
> (2) Treat the NULL as if it were a zero-length array, giving rise to
> zero ordinary parameters. This could be problematic if the function
> can't cope very well with zero parameters ... but on the other hand,
> if it can't do so, then what will it do with VARIADIC '{}'::int[] ?

This is the recurring question - how far is NULL equivalent to '{}'?

There is only one precedent, I think:

postgres=# select '>>>' || array_to_string('{}'::int[], '') || '<<<';
 ?column?
----------
 >>><<<
(1 row)

postgres=# select '>>>' || array_to_string(NULL::int[], '') || '<<<';
 ?column?
----------

(1 row)

but this function is STRICT - so there is no precedent :(

>
> I lean a little bit towards (2) but it's definitely a judgment call.
> Anybody have any other arguments one way or the other?
>
> regards, tom lane
2013/1/23 Pavel Stehule <pavel.stehule@gmail.com>:
> 2013/1/23 Tom Lane <tgl@sss.pgh.pa.us>:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> what should be result of concat(variadic NULL::int[])
>>> I enabled this use case, but what should be result?
>>
>> I think there are two somewhat defensible theories:
>>
>> (1) punt, and return NULL overall. So in this case the variadic
>> function would act as if it were STRICT. That seems a bit weird though
>> if the function is not strict otherwise.
>>
>> (2) Treat the NULL as if it were a zero-length array, giving rise to
>> zero ordinary parameters. This could be problematic if the function
>> can't cope very well with zero parameters ... but on the other hand,
>> if it can't do so, then what will it do with VARIADIC '{}'::int[] ?
>
> This is the recurring question - how far is NULL equivalent to '{}'?
>
> There is only one precedent, I think:
>
> postgres=# select '>>>' || array_to_string('{}'::int[], '') || '<<<';
>  ?column?
> ----------
>  >>><<<
> (1 row)
>
> postgres=# select '>>>' || array_to_string(NULL::int[], '') || '<<<';
>  ?column?
> ----------
>
> (1 row)
>
> but this function is STRICT - so there is no precedent :(

next related example

CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[])
RETURNS integer
LANGUAGE sql
AS $function$
select min(v) from unnest($1) g(v)
$function$

postgres=# select myleast(variadic array[]::int[]) is null;
 ?column?
----------
 t
(1 row)

postgres=# select myleast(variadic null::int[]) is null;
 ?column?
----------
 t
(1 row)

So it is close to Tom's (2): concat(VARIADIC NULL::int[]) and concat(VARIADIC '{}'::int[]) should both return NULL. It is a little bit strange, but it is probably the most valid choice.

Regards

Pavel

>
>>
>> I lean a little bit towards (2) but it's definitely a judgment call.
>> Anybody have any other arguments one way or the other?
>>
>
>
>> regards, tom lane
On Wed, Jan 23, 2013 at 2:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> what should be result of concat(variadic NULL::int[]) >> I enabled this use case, but what should be result? > > I think there are two somewhat defensible theories: > > (1) punt, and return NULL overall. So in this case the variadic > function would act as if it were STRICT. That seems a bit weird though > if the function is not strict otherwise. > > (2) Treat the NULL as if it were a zero-length array, giving rise to > zero ordinary parameters. This could be problematic if the function > can't cope very well with zero parameters ... but on the other hand, > if it can't do so, then what will it do with VARIADIC '{}'::int[] ? > > I lean a little bit towards (2) but it's definitely a judgment call. > Anybody have any other arguments one way or the other? I'd like to vote for "it probably doesn't matter very much, so let's just pick whatever makes the code simplest". :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Pavel Stehule <pavel.stehule@gmail.com> writes: > next related example > CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[]) > RETURNS integer > LANGUAGE sql > AS $function$ > select min(v) from unnest($1) g(v) > $function$ The reason you get a null from that is that (1) unnest() produces zero rows out for either a null or empty-array input, and (2) min() over zero rows produces NULL. In a lot of cases, it's not very sane for aggregates over no rows to produce NULL; the best-known example is that SUM() produces NULL, when anyone who'd not suffered brain-damage from sitting on the SQL committee would have made it return zero. So I'm not very comfortable with generalizing from this specific case to decide that NULL is the universally right result. regards, tom lane
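Tom's observation about aggregates over zero rows is easy to check directly; this is ordinary PostgreSQL behavior, independent of the patch:

-- zero input rows: sum() and min() return NULL, while count() returns 0
select sum(x), min(x), count(x) from generate_series(1, 0) as g(x);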
2013/1/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> next related example
>
>> CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[])
>> RETURNS integer
>> LANGUAGE sql
>> AS $function$
>> select min(v) from unnest($1) g(v)
>> $function$
>
> The reason you get a null from that is that (1) unnest() produces zero
> rows out for either a null or empty-array input, and (2) min() over
> zero rows produces NULL.
>
> In a lot of cases, it's not very sane for aggregates over no rows to
> produce NULL; the best-known example is that SUM() produces NULL, when
> anyone who'd not suffered brain-damage from sitting on the SQL committee
> would have made it return zero. So I'm not very comfortable with
> generalizing from this specific case to decide that NULL is the
> universally right result.
>

I have been testing various situations and there is no fully consistent answer - can we talk only about the functions concat and concat_ws?

These functions are really specific - we are talking about a corner use case of a strongly PostgreSQL-proprietary feature, so none of the solutions should be too bad. The difference between the solutions is at most +/- 4 simple rows per function.

Possibilities:

1) A. concat(variadic NULL) => empty string, B. concat(variadic '{}') => empty string -- if we accept @A, then @B is ok
2) A. concat(variadic NULL) => NULL, B. concat(variadic '{}') => NULL -- question - is @B valid?
3) A. concat(variadic NULL) => NULL, B. concat(variadic '{}') => empty string

There is no other possibility. I can live with any variant - we can probably find a precedent for each of them in our code or in ANSI SQL. I like @2 as a general concept for PostgreSQL variadic functions, but when we talk about concat() and concat_ws(), @1 is valid too.

Please, can somebody give an opinion soon?

Regards

Pavel

> regards, tom lane
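Whichever of the three options wins, a caller who prefers the empty-string behavior can normalize the array before applying the VARIADIC label; a small sketch (my_array is just a placeholder column here, and this assumes the VARIADIC label on concat() works as the patch provides):

-- collapse a NULL array to an empty one before the call,
-- so the result is the same as for concat(variadic '{}'::text[])
select concat(variadic coalesce(my_array, '{}'::text[]))
from (values (NULL::text[])) as v(my_array);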
Hello

So here is the updated version:

+ some lines of documentation
+ new regression tests

There was no reply to my previous mail, so I chose:

concat(variadic null) ---> NULL
concat(variadic '{}') ---> empty string

This behavior should not be a precedent for any other variadic "any" function, because concat() and concat_ws() have specific behavior - when called with one parameter they return a string or an empty string.

Regards

Pavel

2013/1/23 Pavel Stehule <pavel.stehule@gmail.com>:
> 2013/1/23 Tom Lane <tgl@sss.pgh.pa.us>:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> next related example
>>
>>> CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[])
>>> RETURNS integer
>>> LANGUAGE sql
>>> AS $function$
>>> select min(v) from unnest($1) g(v)
>>> $function$
>>
>> The reason you get a null from that is that (1) unnest() produces zero
>> rows out for either a null or empty-array input, and (2) min() over
>> zero rows produces NULL.
>>
>> In a lot of cases, it's not very sane for aggregates over no rows to
>> produce NULL; the best-known example is that SUM() produces NULL, when
>> anyone who'd not suffered brain-damage from sitting on the SQL committee
>> would have made it return zero. So I'm not very comfortable with
>> generalizing from this specific case to decide that NULL is the
>> universally right result.
>>
>
> I have been testing various situations and there is no fully consistent
> answer - can we talk only about the functions concat and concat_ws?
>
> These functions are really specific - we are talking about a corner use
> case of a strongly PostgreSQL-proprietary feature, so none of the
> solutions should be too bad. The difference between the solutions is at
> most +/- 4 simple rows per function.
>
> Possibilities:
>
> 1) A. concat(variadic NULL) => empty string, B. concat(variadic '{}')
> => empty string -- if we accept @A, then @B is ok
> 2) A. concat(variadic NULL) => NULL, B. concat(variadic '{}') => NULL
> -- question - is @B valid?
> 3) A. concat(variadic NULL) => NULL, B. concat(variadic '{}') => empty string
>
> There is no other possibility. I can live with any variant - we can
> probably find a precedent for each of them in our code or in ANSI SQL.
>
> I like @2 as a general concept for PostgreSQL variadic functions, but
> when we talk about concat() and concat_ws(), @1 is valid too.
>
> Please, can somebody give an opinion soon?
>
> Regards
>
> Pavel
>
>> regards, tom lane
Attachment
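For anyone skimming the thread, the behavior chosen above can be written out as a quick check; this restates the semantics described in the mail and assumes the updated patch applies as posted:

select concat(variadic NULL::text[]) is null; -- true: a NULL array gives NULL
select concat(variadic '{}'::text[]) = '';    -- true: an empty array gives an empty string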
On 01/25/2013 01:32 AM, Pavel Stehule wrote:
> Hello
>
> So here is the updated version:
>
> + some lines of documentation
> + new regression tests
>
> There was no reply to my previous mail, so I chose:
>
> concat(variadic null) ---> NULL
> concat(variadic '{}') ---> empty string
>
I'd love to see this fix still make it into 9.3.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndQuadrant.com> writes: > I'd love to see this fix still make it into 9.3. Working on it right now, actually. regards, tom lane
2013/1/25 Tom Lane <tgl@sss.pgh.pa.us>:
> Craig Ringer <craig@2ndQuadrant.com> writes:
>> I'd love to see this fix still make it into 9.3.
>
> Working on it right now, actually.

Thank you all for the collaboration.

Best regards

Pavel

>
> regards, tom lane