Thread: Proposal: plpgsql - "for in array" statement
Hello I looked on some constructs that helps with iteration over array in plpgsql. I propose a following syntax: FOR var IN [array variable | array expression] LOOP .. END LOOP var - declared variable - theoretically we can a detect var type from array type, but it needs a early expression an analyze (not used in PL/pgSQL), so var should be declared before. This construct ensure iteration over all items of array. When somebody needs a subscripts from some dimension, then he can use a proposed function "subscripts". so iteration over two dimensional array can be written: DECLARE i integer; j integer; BEGIN FOR i IN subscripts(myarray, 1) LOOP FOR j IN subscripts(myarray, 2) LOOP RAISE NOTICE 'myarray[%,%] = %', i,j, myarray[i,j]; END LOOP; END LOOP; When input array is multidimensional, then this array is flattened - order of iteration is specified by physical store of items inside the array. This construct iterate over all items of input array - it has a same behave as "unnest" function. some examples: DECLARE v integer; a int[] := ARRAY[2,3.4,5]; BEGIN FOR val IN a LOOP RAISE NOTICE '%', val; -- produce 2,3,4,5 END LOOP; FOR val IN subscripts(a, 1) LOOP RAISE NOTICE '%', val; -- produce 1,2,3,4 END LOOP; FOR val IN subscripts(a,1) LOOP RAISE NOTICE '%', a[val]; -- produce 2,3,4,5 END LOOP; END; Comments, ideas? Regards Pavel Stehule
On Tue, Sep 28, 2010 at 3:24 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I looked on some constructs that helps with iteration over array in > plpgsql. I propose a following syntax: > > FOR var IN [array variable | array expression] What is the benefits compared with FOR ... IN SELECT unnest(array) or generate_subscripts(array) ? -- Itagaki Takahiro
2010/9/28 Itagaki Takahiro <itagaki.takahiro@gmail.com>: > On Tue, Sep 28, 2010 at 3:24 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I looked on some constructs that helps with iteration over array in >> plpgsql. I propose a following syntax: >> >> FOR var IN [array variable | array expression] > > What is the benefits compared with > FOR ... IN SELECT unnest(array) or generate_subscripts(array) ? > the speed SELECT unnest() is full query, but array_expression is just simple query and can be evaluated by exec_eval_simple_expr - it can be significantly times faster. CREATE OR REPLACE FUNCTION f1() RETURNS void AS $$ DECLARE a int[] := ARRAY[1,2,3,4]; s int; BEGIN FOR i IN 1..10000 LOOP s := 0; FOR j IN array_lower(a,1)..array_upper(a,1) LOOP s := s + a[j]; END LOOP;END LOOP; END; $$ LANGUAGE plpgsql; take about 255ms CREATE OR REPLACE FUNCTION f1() RETURNS void AS $$ DECLARE a int[] := ARRAY[1,2,3,4]; j int; s int; BEGIN FOR i IN 1..10000 LOOP s := 0; FOR j IN SELECT unnest(a) LOOP s := s + j; END LOOP; END LOOP; END; $$ LANGUAGE plpgsql; it takes abou 1000ms Regards Pavel Stehule > -- > Itagaki Takahiro >
> 2010/9/28 Itagaki Takahiro <itagaki.takahiro@gmail.com>: >> On Tue, Sep 28, 2010 at 3:24 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> I looked on some constructs that helps with iteration over array in >>> plpgsql. I propose a following syntax: >>> >>> FOR var IN [array variable | array expression] >> >> What is the benefits compared with >> FOR ... IN SELECT unnest(array) or generate_subscripts(array) ? >> > > the speed > Not to mention that it's far more aesthetically pleasing, and makes the statement immediately understandable to people unfamiliar with the plpgsql idioms you describe. -- Regards, Peter Geoghegan
Pavel Stehule <pavel.stehule@gmail.com> writes: > I looked on some constructs that helps with iteration over array in > plpgsql. I propose a following syntax: > FOR var IN [array variable | array expression] > LOOP I don't have any opinion about whether the functionality proposed here is worth the trouble, but I do have an opinion about that syntax: it's an awful choice. plpgsql has enough trouble already distinguishing between integer for-loops and query for-loops, not to mention trouble in producing a helpful error message when somebody gets either of those constructs slightly wrong. Providing a variant where a single expression can follow IN will make both of those problems an order of magnitude worse. As an example, is this a for-in-query or a for-in-array? FOR v IN (SELECT arraycol FROM tab) LOOP ... Either answer is plausible depending on whether you assume the parentheses make it a subquery. Pick something less easily confusable with the existing constructs. regards, tom lane
On Tue, Sep 28, 2010 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> I looked on some constructs that helps with iteration over array in >> plpgsql. I propose a following syntax: > >> FOR var IN [array variable | array expression] >> LOOP > > I don't have any opinion about whether the functionality proposed here > is worth the trouble, but I do have an opinion about that syntax: it's > an awful choice. I agree, on both points. It's nice to try to reduce the excess verbosity that is IMHO the biggest usability issue with PL/pgsql, but I can't help wondering whether we're trying to dam the Nile with chicken wire. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sep 28, 2010, at 7:41 AM, Robert Haas wrote: >> I don't have any opinion about whether the functionality proposed here >> is worth the trouble, but I do have an opinion about that syntax: it's >> an awful choice. > > I agree, on both points. > > It's nice to try to reduce the excess verbosity that is IMHO the > biggest usability issue with PL/pgsql, but I can't help wondering > whether we're trying to dam the Nile with chicken wire. I would really like to have this functionality. Iterating over arrays is something I do in PL/pgSQL a *lot*. I see two ways to handle syntax: 1. A new keyword. Some options: + FOREACH (too close to FOR?) + ITERATE (ew) 2. Require the subscripts() function as syntax. Thoughts? Best, David
Robert Haas <robertmhaas@gmail.com> wrote: >>> FOR var IN [array variable | array expression] >>> LOOP >> >> I don't have any opinion about whether the functionality proposed >> here is worth the trouble, but I do have an opinion about that >> syntax: it's an awful choice. > > I agree, on both points. How about distinguishing it this way:? FOR var IN ARRAY array_expression LOOP -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>> FOR var IN [array variable | array expression] >>>> LOOP > How about distinguishing it this way:? > FOR var IN ARRAY array_expression LOOP That occurred to me too, but it's got a small problem: it's not impossible for ARRAY to be the first token of a valid scalar expression. But I guess you could get around that if you had to by putting the ARRAY expression inside parens, and it would be a pretty darn unusual case anyway. So this is probably the best choice. I'm not thrilled with David's suggestions of using FOREACH or ITERATE --- using a different initial keyword makes it awkward to make generic statements about "all types of FOR loop". regards, tom lane
2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> I looked on some constructs that helps with iteration over array in >> plpgsql. I propose a following syntax: > >> FOR var IN [array variable | array expression] >> LOOP > > I don't have any opinion about whether the functionality proposed here > is worth the trouble, but I do have an opinion about that syntax: it's > an awful choice. plpgsql has enough trouble already distinguishing > between integer for-loops and query for-loops, not to mention trouble > in producing a helpful error message when somebody gets either of those > constructs slightly wrong. Providing a variant where a single > expression can follow IN will make both of those problems an order of > magnitude worse. As an example, is this a for-in-query or a > for-in-array? > > FOR v IN (SELECT arraycol FROM tab) LOOP ... > This is a subquery - so it is a for-in-array - should return one row with one column. Similar construct is in SQL/PSM where you can to write SET var = (SELECT ...) You cannot to write just (SELECT ...) anywhere > Either answer is plausible depending on whether you assume the > parentheses make it a subquery. > > Pick something less easily confusable with the existing constructs. It's not simple - FOR i IN array is natural - Original ADA use a very similar construct. FOR i IN ARRAY has problem with constant array - FOR i IN ARRAY ARRAY[1,2,3,] and FOREACH is used in Oracle for absolutely different task. > we have now a for-in-cursor, so there is a precedent. regards Pavel > regards, tom lane >
2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>>> FOR var IN [array variable | array expression] >>>>> LOOP > >> How about distinguishing it this way:? > >> FOR var IN ARRAY array_expression LOOP I see one problem - when you can use a constant array, then you will write two keywords ARRAY FOR var IN ARRAY ARRAY[...] LOOP iteration over cursor is supported now, and you don't write FOR var IN CURSOR cursorvar > > That occurred to me too, but it's got a small problem: it's not > impossible for ARRAY to be the first token of a valid scalar expression. > yes > But I guess you could get around that if you had to by putting the ARRAY > expression inside parens, and it would be a pretty darn unusual case > anyway. So this is probably the best choice. I don't agree - There isn't reason for complicating proposed syntax. The situation is relative simple and we are able to print a correct messages. regards Pavel Stehule I'm not thrilled with > David's suggestions of using FOREACH or ITERATE --- using a different > initial keyword makes it awkward to make generic statements about "all > types of FOR loop". > > regards, tom lane >
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >> But I guess you could get around that if you had to by putting the ARRAY >> expression inside parens, and it would be a pretty darn unusual case >> anyway. So this is probably the best choice. > I don't agree - There isn't reason for complicating proposed syntax. Yes, there is. The syntax you propose is flat out ambiguous: there are two possible legal interpretations of some commands. That's not acceptable, especially not when it's so easily fixed. regards, tom lane
2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >>> But I guess you could get around that if you had to by putting the ARRAY >>> expression inside parens, and it would be a pretty darn unusual case >>> anyway. So this is probably the best choice. > >> I don't agree - There isn't reason for complicating proposed syntax. > > Yes, there is. The syntax you propose is flat out ambiguous: there are > two possible legal interpretations of some commands. That's not > acceptable, especially not when it's so easily fixed. > what are you thinking? The subquery cannot be interpreted different. There are not possible use a isolated subquery as query. And subquery have to return one row, one column. Pavel > regards, tom lane >
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >> As an example, is this a for-in-query or a >> for-in-array? >> >> FOR v IN (SELECT arraycol FROM tab) LOOP ... > This is a subquery - so it is a for-in-array - should return one row > with one column. That's not obvious at all. It's legal right now to write that, and it will be interpreted as for-in-query. Furthermore, there are cases where it's essential to be able to write a left paren before SELECT, so that you can control the precedence of UNION/INTERSECT/EXCEPT constructs. So you're proposing to remove functionality and break existing code in order to have a "simple" syntax for for-in-array. regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >> Yes, there is. The syntax you propose is flat out ambiguous: there are >> two possible legal interpretations of some commands. > what are you thinking? The subquery cannot be interpreted different. Sure it can: it could be a parenthesized top-level query. In fact, that's what plpgsql will assume if you feed it that syntax today. regards, tom lane
2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >>> As an example, is this a for-in-query or a >>> for-in-array? >>> >>> FOR v IN (SELECT arraycol FROM tab) LOOP ... > >> This is a subquery - so it is a for-in-array - should return one row >> with one column. > > That's not obvious at all. It's legal right now to write that, and it > will be interpreted as for-in-query. but it has not a sense. It's based on implementation and I am sure, so this isn't documented. Yes, we are able to write a := 10 FROM tab WHERE y = 10 but it is just more bug then required feature. FOR v IN (SELECT FROM) when select returns more than one row is big inconsistency - and this is bug, when this is allowed Regards Pavel Furthermore, there are cases where > it's essential to be able to write a left paren before SELECT, so that > you can control the precedence of UNION/INTERSECT/EXCEPT constructs. > So you're proposing to remove functionality and break existing code in > order to have a "simple" syntax for for-in-array. > > regards, tom lane >
2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >>> Yes, there is. The syntax you propose is flat out ambiguous: there are >>> two possible legal interpretations of some commands. > >> what are you thinking? The subquery cannot be interpreted different. > > Sure it can: it could be a parenthesized top-level query. In fact, > that's what plpgsql will assume if you feed it that syntax today. no - there are not any legal construct FOR r IN (..) I believe so we can find more than one similar undocumented features, like this - so it means so plpgsql will be a buggy? > > regards, tom lane >
On Tue, Sep 28, 2010 at 4:39 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >> Pavel Stehule <pavel.stehule@gmail.com> writes: >>> 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >>>> As an example, is this a for-in-query or a >>>> for-in-array? >>>> >>>> FOR v IN (SELECT arraycol FROM tab) LOOP ... >> >>> This is a subquery - so it is a for-in-array - should return one row >>> with one column. >> >> That's not obvious at all. It's legal right now to write that, and it >> will be interpreted as for-in-query. > > but it has not a sense. It has a very fine sense. It's completely obvious to me what that means, and you're proposing to break it. In a word: no. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >> Sure it can: it could be a parenthesized top-level query. In fact, >> that's what plpgsql will assume if you feed it that syntax today. > no - there are not any legal construct FOR r IN (..) You are simply wrong, sir, and I suggest that you go read the SQL standard until you realize that. Consider for example for r in (SELECT ... FROM a UNION SELECT ... FROM b) INTERSECT (SELECT ... FROM c) LOOP ... The parentheses here are not merely legal, they are *necessary*, else the semantics of the UNION/INTERSECT operations change. regards, tom lane
<br /><br /><br /> On 09/28/2010 03:41 PM, Pavel Stehule wrote:<br /><br /><blockquote cite="mid:AANLkTi=18UivzKezbi+Eax0xx3LeHGr_CfK24z6xrMp1@mail.gmail.com"type="cite"><br /><pre wrap=""> It's not simple - FOR i IN array is natural - Original ADA use a very similar construct. </pre></blockquote><br /><br /> No it doesn't. In Ada (Note: not ADA) FOR can only iterate over one thing: a discrete subtype(e.g. an integer or enumeration type, or a range of it)[1]. You can say:<br /><blockquote>for i in my_array'rangeloop ...<br /></blockquote> but that iterates over the array's index, not over its values.And there is no ambiguitywith other things you might loop over because there aren't any.<br /><br /> cheers<br /><br /> andrew<br /><br />[1]<a class="moz-txt-link-freetext" href="http://www.adaic.org/standards/05rm/html/RM-5-5.html">http://www.adaic.org/standards/05rm/html/RM-5-5.html</a><br />
Excerpts from Kevin Grittner's message of mar sep 28 12:28:12 -0400 2010: > How about distinguishing it this way:? > > FOR var IN ARRAY array_expression LOOP What about FOR EACH var IN array_expr LOOP ... I think this requires reserving EACH, which could cause a regression for working code. Maybe there's a way to make it work? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > What about > > FOR EACH var IN array_expr LOOP ... > > I think this requires reserving EACH, which could cause a regression for > working code. Maybe there's a way to make it work? Code that quotes all of its identifiers, such as with: FOR EACH "var" IN "array_expr" LOOP ... ... would also gain a significant amount of future-proofing since then the language can add keywords at will, without there being conflicts with user-defined identifiers. Similarly, quoting identifiers also carries present-day advantages as then one can name identifiers whatever is most suitably descriptive for them without worrying whether the language has a pre-defined meaning for the used words. The quoting also has the nice bonus of making them case-sensitive. -- Darren Duncan
Alvaro Herrera wrote: > What about > > FOR EACH var IN array_expr LOOP ... > > I think this requires reserving EACH, which could cause a regression for > working code. Maybe there's a way to make it work? What about saying FOR-EACH instead? A good general solution that I'd expect to not cause regressions is to separate multiple-word keywords with dashes rather than spaces. Since unquoted identifiers don't have dashes, I think, and moreover because the whole FOR-EACH would occupy the first position of the statement rather than the first two, there should be no ambiguity. Parsing should be easier, too, because keywords formatted like this would just be a single term rather than having infinite variations due to embedded whitespace. -- Darren Duncan
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Kevin Grittner's message of mar sep 28 12:28:12 -0400 2010: >> How about distinguishing it this way:? >> FOR var IN ARRAY array_expression LOOP > What about > FOR EACH var IN array_expr LOOP ... That doesn't seem to have any obvious connection to looping over array elements, as opposed to some other kind of iteration. regards, tom lane
On 09/28/2010 09:31 PM, Darren Duncan wrote: > Alvaro Herrera wrote: >> What about >> >> FOR EACH var IN array_expr LOOP ... >> >> I think this requires reserving EACH, which could cause a regression for >> working code. Maybe there's a way to make it work? > > Code that quotes all of its identifiers, such as with: > > FOR EACH "var" IN "array_expr" LOOP ... > > ... would also gain a significant amount of future-proofing since then > the language can add keywords at will, without there being conflicts > with user-defined identifiers. > > Similarly, quoting identifiers also carries present-day advantages as > then one can name identifiers whatever is most suitably descriptive > for them without worrying whether the language has a pre-defined > meaning for the used words. > > The quoting also has the nice bonus of making them case-sensitive. > > This doesn't help in the least if the array is an expression rather than simply a variable - we're not going to start quoting expressions. cheers andrew
On 09/28/2010 09:43 PM, Darren Duncan wrote: > Alvaro Herrera wrote: >> What about >> >> FOR EACH var IN array_expr LOOP ... >> >> I think this requires reserving EACH, which could cause a regression for >> working code. Maybe there's a way to make it work? > > What about saying FOR-EACH instead? > > A good general solution that I'd expect to not cause regressions is to > separate multiple-word keywords with dashes rather than spaces. > > Since unquoted identifiers don't have dashes, I think, and moreover > because the whole FOR-EACH would occupy the first position of the > statement rather than the first two, there should be no ambiguity. > > Parsing should be easier, too, because keywords formatted like this > would just be a single term rather than having infinite variations due > to embedded whitespace. This would actually make the parsing infinitely more ugly, not less. And we are not gong to start introducing non-alphabetic characters into keywords. It is also, as Tom noted about the earlier version, without any obvious connection to array processing. cheers andrew
2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2010/9/28 Tom Lane <tgl@sss.pgh.pa.us>: >>> Sure it can: it could be a parenthesized top-level query. In fact, >>> that's what plpgsql will assume if you feed it that syntax today. > >> no - there are not any legal construct FOR r IN (..) > > You are simply wrong, sir, and I suggest that you go read the SQL > standard until you realize that. Consider for example > > for r in (SELECT ... FROM a UNION SELECT ... FROM b) INTERSECT (SELECT ... FROM c) LOOP ... > > The parentheses here are not merely legal, they are *necessary*, else > the semantics of the UNION/INTERSECT operations change. > ok, then probably one variant is for-in-array array_expr. Is there agreement? Regards Pavel Stehule > regards, tom lane >
Andrew Dunstan wrote: > On 09/28/2010 09:31 PM, Darren Duncan wrote: >> >> Code that quotes all of its identifiers, such as with: >> >> FOR EACH "var" IN "array_expr" LOOP ... > > This doesn't help in the least if the array is an expression rather than > simply a variable - we're not going to start quoting expressions. I wrote that wrong. I should have said "array_var" not "array_expr". I am certainly not advocating quoting expressions, and didn't mean to imply that here. My point was that if a token is always interpreted as a keyword rather than an identifier when there is ambiguity, then quoting would let users name an identifier "each" or "EACH". In any event, I will not push this since it doesn't address the real issue of language changes not breaking the general case of legacy code; it only says how users can insulate themselves. -- Darren Duncan