Thread: array_length(anyarray)
Hi, Attached is a patch to add support for array_length(anyarray), which only works for one-dimensional arrays, returns 0 for empty arrays and complains if the array's lower bound isn't 1. In other words, does the right thing when used with the arrays people use 99% of the time. I'll add this to the next commit fest, but feel free to discuss it before that. Regards, Marko Tiikkaja
Attachment
On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: > Hi, > > Attached is a patch to add support for array_length(anyarray), which > only works for one-dimensional arrays, returns 0 for empty arrays and > complains if the array's lower bound isn't 1. In other words, does > the right thing when used with the arrays people use 99% of the time. Why the heck would it complain if the lower bound isn't 1? cheers andrew
On 2013-12-18 22:13, Andrew Dunstan wrote: > On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: >> Attached is a patch to add support for array_length(anyarray), which >> only works for one-dimensional arrays, returns 0 for empty arrays and >> complains if the array's lower bound isn't 1. In other words, does >> the right thing when used with the arrays people use 99% of the time. > > Why the heck would it complain if the lower bound isn't 1? Because then you're free to assume that the first element is [1] and the last one is [array_length()]. Which is what 99% of the code using array_length(anyarray, int) does anyway. Regards, Marko Tiikkaja
On 2013-12-18 22:19, I wrote: > On 2013-12-18 22:13, Andrew Dunstan wrote: >> On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: >>> Attached is a patch to add support for array_length(anyarray), which >>> only works for one-dimensional arrays, returns 0 for empty arrays and >>> complains if the array's lower bound isn't 1. In other words, does >>> the right thing when used with the arrays people use 99% of the time. >> >> Why the heck would it complain if the lower bound isn't 1? > > Because then you're free to assume that the first element is [1] and the > last one is [array_length()]. Which is what 99% of the code using > array_length(anyarray, int) does anyway. Just to clarify, I mean that array_lower(.., 1) must be 1. Whatever that's called. "The lower bound of the only dimension of the array"? Regards, Marko Tiikkaja
On 12/18/2013 04:19 PM, Marko Tiikkaja wrote: > On 2013-12-18 22:13, Andrew Dunstan wrote: >> On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: >>> Attached is a patch to add support for array_length(anyarray), which >>> only works for one-dimensional arrays, returns 0 for empty arrays and >>> complains if the array's lower bound isn't 1. In other words, does >>> the right thing when used with the arrays people use 99% of the time. >> >> Why the heck would it complain if the lower bound isn't 1? > > Because then you're free to assume that the first element is [1] and > the last one is [array_length()]. Which is what 99% of the code using > array_length(anyarray, int) does anyway. > > > You're not really free to assume it - you'll need an exception handler for the other-than-1 case, or your code might blow up. This seems to be codifying a bad pattern, which should be using array_lower() and array_upper() instead. cheers andrew
On 2013-12-18 22:32, Andrew Dunstan wrote: > You're not really free to assume it - you'll need an exception handler > for the other-than-1 case, or your code might blow up. > > This seems to be codifying a bad pattern, which should be using > array_lower() and array_upper() instead. That's the entire point -- I *want* my code to blow up. If someone passes a multi-dimensional array to a function that assumes its input is one-dimensional and its indexes start from 1, I want it to be obvious that the caller did something wrong. Now I either copy-paste lines and lines of codes to always test for the weird cases or my code breaks in subtle ways. This is no different from an Assert() somewhere -- if the caller breaks the documented interface, it's his problem, not mine. And I don't want to waste my time coding around the fact that this simple thing is so hard to do in PG. Regards, Marko Tiikkaja
Marko Tiikkaja-4 wrote > On 2013-12-18 22:32, Andrew Dunstan wrote: >> You're not really free to assume it - you'll need an exception handler >> for the other-than-1 case, or your code might blow up. >> >> This seems to be codifying a bad pattern, which should be using >> array_lower() and array_upper() instead. > > That's the entire point -- I *want* my code to blow up. If someone > passes a multi-dimensional array to a function that assumes its input is > one-dimensional and its indexes start from 1, I want it to be obvious > that the caller did something wrong. Now I either copy-paste lines and > lines of codes to always test for the weird cases or my code breaks in > subtle ways. > > This is no different from an Assert() somewhere -- if the caller breaks > the documented interface, it's his problem, not mine. And I don't want > to waste my time coding around the fact that this simple thing is so > hard to do in PG. 1) Why cannot we just make the second argument of the current function optional and default to 1? 2) How about providing a function that returns the "1-dim/lower=1" input array or raise/exception if the input array does not conform? <not tested/psuedo-code> CREATE FUNCTION array_normal(arr anyarray) RETURNS anyarray $$ begin if (empty(arr)) return arr; if (ndim(arr) > 1) raise exception; if (array_lower() <> 1) raise exception returnarr; end; $$ I can also see wanting 1-dimensional enforced without having to require the lower-bound to be 1 so maybe a separate function for that. Usage: SELECT array_length(array_normal(input_array)) I could see this being especially useful for a domain and/or column constraint definition and also allowing for a textbook case of separation of concerns. I am torn, but mostly opposed, to making an array_length(anyarray) function with these limitations enforced - especially if other similar functions are not created at the same time. I fully agree that array_length(anyarray) should be a valid call without requiring the user to specify ", 1" by rote. Tangential Question: Is there any way to define a non-1-based array without using array-literal syntax but by using ARRAY[1,2,3] syntax? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/array-length-anyarray-tp5783950p5783972.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 12/19/13, 12:01 AM, David Johnston wrote: > Marko Tiikkaja-4 wrote >> On 2013-12-18 22:32, Andrew Dunstan wrote: >>> You're not really free to assume it - you'll need an exception handler >>> for the other-than-1 case, or your code might blow up. >>> >>> This seems to be codifying a bad pattern, which should be using >>> array_lower() and array_upper() instead. >> >> That's the entire point -- I *want* my code to blow up. If someone >> passes a multi-dimensional array to a function that assumes its input is >> one-dimensional and its indexes start from 1, I want it to be obvious >> that the caller did something wrong. Now I either copy-paste lines and >> lines of codes to always test for the weird cases or my code breaks in >> subtle ways. >> >> This is no different from an Assert() somewhere -- if the caller breaks >> the documented interface, it's his problem, not mine. And I don't want >> to waste my time coding around the fact that this simple thing is so >> hard to do in PG. > > 1) Why cannot we just make the second argument of the current function > optional and default to 1? That still does the wrong thing for the empty array, multidimensional arrays and arrays that don't start from index 1. > 2) How about providing a function that returns the "1-dim/lower=1" input > array or raise/exception if the input array does not conform? > > <not tested/psuedo-code> > CREATE FUNCTION array_normal(arr anyarray) RETURNS anyarray > $$ > begin > if (empty(arr)) return arr; > if (ndim(arr) > 1) raise exception; > if (array_lower() <> 1) raise exception > return arr; > end; > $$ With this, I would still have to do COALESCE(array_length(array_normal($1), 1), 0). That's pretty stupid for the most common use case of arrays, don't you think? > I can also see wanting 1-dimensional enforced without having to require the > lower-bound to be 1 so maybe a separate function for that. I really don't see the point. How often have you ever created a function that doesn't have a lower bound of 1 on purpose? What good did it serve you? > Usage: > > SELECT array_length(array_normal(input_array)) > > I could see this being especially useful for a domain and/or column > constraint definition and also allowing for a textbook case of separation of > concerns. What would array_length() in this case be? With what you suggested above, you would still get NULL for an empty array. > I am torn, but mostly opposed, to making an array_length(anyarray) function > with these limitations enforced - especially if other similar functions are > not created at the same time. I fully agree that array_length(anyarray) > should be a valid call without requiring the user to specify ", 1" by rote. I'm specifically asking for something that is different from array_length(anyarray, int), because I personally think it's too full of caveats. Regards, Marko Tiikkaja
On Wed, Dec 18, 2013 at 09:27:54PM +0100, Marko Tiikkaja wrote: > Hi, > > Attached is a patch to add support for array_length(anyarray), which > only works for one-dimensional arrays, returns 0 for empty arrays > and complains if the array's lower bound isn't 1. In other words, > does the right thing when used with the arrays people use 99% of the > time. +1 for adding this. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2013/12/19 David Fetter <david@fetter.org>
+1
On Wed, Dec 18, 2013 at 09:27:54PM +0100, Marko Tiikkaja wrote:+1 for adding this.
> Hi,
>
> Attached is a patch to add support for array_length(anyarray), which
> only works for one-dimensional arrays, returns 0 for empty arrays
> and complains if the array's lower bound isn't 1. In other words,
> does the right thing when used with the arrays people use 99% of the
> time.
+1
length should be irrelevant to fact so array starts from 1, 0 or anything else
Regards
Pavel
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 19 December 2013 08:05, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > 2013/12/19 David Fetter <david@fetter.org> >> >> On Wed, Dec 18, 2013 at 09:27:54PM +0100, Marko Tiikkaja wrote: >> > Hi, >> > >> > Attached is a patch to add support for array_length(anyarray), which >> > only works for one-dimensional arrays, returns 0 for empty arrays >> > and complains if the array's lower bound isn't 1. In other words, >> > does the right thing when used with the arrays people use 99% of the >> > time. >> >> +1 for adding this. > > > +1 > I think that having 2 functions called array_length() that each behave differently for empty arrays would just lead to confusion. The SQL standard defines a function called cardinality() that returns the number of elements of a collection (array or multiset), so why don't we call it that? > length should be irrelevant to fact so array starts from 1, 0 or anything > else Yes, this should just return the number of elements, and 0 for an empty array. How it should behave for multi-dimensional arrays is less clear, but I'd argue that it should return the total number of elements, i.e. cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it consistent with the choices we've already made for unnest() and ordinality:- cardinality(foo) = (select count(*) from unnest(foo)).- unnest with ordinality would always result in ordinalsin the range [1, cardinality]. Regards, Dean
On Jan9, 2014, at 14:57 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 19 December 2013 08:05, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> length should be irrelevant to fact so array starts from 1, 0 or anything >> else > > Yes, this should just return the number of elements, and 0 for an empty array. +1. Anything that complains about arrays whose lower bound isn't 1 really needs a *way* less generic name than array_length(). > > How it should behave for multi-dimensional arrays is less clear, but > I'd argue that it should return the total number of elements, i.e. > cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it > consistent with the choices we've already made for unnest() and > ordinality: > - cardinality(foo) = (select count(*) from unnest(foo)). > - unnest with ordinality would always result in ordinals in the range > [1, cardinality]. +1 best regards, Florian Pflug
On 1/9/14 5:44 PM, Florian Pflug wrote: > On Jan9, 2014, at 14:57 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 19 December 2013 08:05, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> length should be irrelevant to fact so array starts from 1, 0 or anything >>> else >> >> Yes, this should just return the number of elements, and 0 for an empty array. > > +1. Anything that complains about arrays whose lower bound isn't 1 really > needs a *way* less generic name than array_length(). Problem is, if you're operating on an array which could have a lower bound that isn't 1, why would you look at the length in the first place? You can't access any elements by index, you'd needto look at array_lower(). You can't iterate over the array by index, you'd need to do array_lower() .. array_lower() + array_length(), which doesn't make sense. And then there's the myriad of stuff you can do with unnest() without actually having to look at the length. Same goes for multi-dimensional arrays: you have even less things you can do there with only a length. So if we give up these constraints, we also make this function completely useless. Regards, Marko Tiikkaja
On 1/9/14, 11:08 AM, Marko Tiikkaja wrote: > On 1/9/14 5:44 PM, Florian Pflug wrote: >> On Jan9, 2014, at 14:57 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> On 19 December 2013 08:05, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> length should be irrelevant to fact so array starts from 1, 0 or anything >>>> else >>> >>> Yes, this should just return the number of elements, and 0 for an empty array. >> >> +1. Anything that complains about arrays whose lower bound isn't 1 really >> needs a *way* less generic name than array_length(). > > Problem is, if you're operating on an array which could have a lower bound that isn't 1, why would you look at the lengthin the first place? You can't access any elements by index, you'd need to look at array_lower(). You can't iterateover the array by index, you'd need to do array_lower() .. array_lower() + array_length(), which doesn't make sense. And then there's the myriad of stuff you can do with unnest() without actually having to look at the length. Samegoes for multi-dimensional arrays: you have even less things you can do there with only a length. > > So if we give up these constraints, we also make this function completely useless. I'm generally opposed to creating code that doesn't support the full featureset of something (in this case, array_lower()<>1).But in this case I hope we can all agree that allowing the user to set an arbitrary array lower bound wasan enormous mistake. While we might not be able to ever completely remove that behavior, I find the idea of throwing anerror to be highly enticing. Plus, as Marko said, this function is pretty useless for non-1-based arrays. I do agree that the name is probably too generic for this though. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Jan9, 2014, at 23:26 , Jim Nasby <jim@nasby.net> wrote: > On 1/9/14, 11:08 AM, Marko Tiikkaja wrote: >> On 1/9/14 5:44 PM, Florian Pflug wrote: >>> On Jan9, 2014, at 14:57 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>>> On 19 December 2013 08:05, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> length should be irrelevant to fact so array starts from 1, 0 or anything >>>>> else >>>> >>>> Yes, this should just return the number of elements, and 0 for an empty array. >>> >>> +1. Anything that complains about arrays whose lower bound isn't 1 really >>> needs a *way* less generic name than array_length(). >> >> Problem is, if you're operating on an array which could have a lower bound that isn't 1, why would you look at the lengthin the first place? You can't access any elements by index, you'd need to look at array_lower(). You can't iterateover the array by index, you'd need to do array_lower() .. array_lower() + array_length(), which doesn't make sense. And then there's the myriad of stuff you can do with unnest() without actually having to look at the length. Samegoes for multi-dimensional arrays: you have even less things you can do there with only a length. >> >> So if we give up these constraints, we also make this function completely useless. > > I'm generally opposed to creating code that doesn't support the full featureset of something (in this case, array_lower()<>1).But in this case I hope we can all agree that allowing the user to set an arbitrary array lower bound wasan enormous mistake. No doubt. > While we might not be able to ever completely remove that behavior, I find the idea of throwing an error to be highly enticing. > > Plus, as Marko said, this function is pretty useless for non-1-based arrays. That I doubt, but... > I do agree that the name is probably too generic for this though. this one is actually my main complaint. The name needs to very clearly mark such a function as dealing only with a subsetof all possible arrays. Otherwise we'll just add to the confusion, not avoid it. best regards, Florian Pflug
On Thu, Jan 9, 2014 at 11:08 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 1/9/14 5:44 PM, Florian Pflug wrote: >> >> On Jan9, 2014, at 14:57 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> >>> On 19 December 2013 08:05, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> >>>> length should be irrelevant to fact so array starts from 1, 0 or >>>> anything >>>> else >>> >>> >>> Yes, this should just return the number of elements, and 0 for an empty >>> array. >> >> >> +1. Anything that complains about arrays whose lower bound isn't 1 really >> needs a *way* less generic name than array_length(). > > > Problem is, if you're operating on an array which could have a lower bound > that isn't 1, why would you look at the length in the first place? You > can't access any elements by index, you'd need to look at array_lower(). > You can't iterate over the array by index, you'd need to do array_lower() > .. array_lower() + array_length(), which doesn't make sense. And then > there's the myriad of stuff you can do with unnest() without actually having > to look at the length. Same goes for multi-dimensional arrays: you have > even less things you can do there with only a length. I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. If you want your code to blow up with non 1 based array, that should be checked in userland I think (perhaps with a constraint); the server API function should implement as many reasonable behaviors as possible. merlin
On 1/10/14, 1:20 AM, Merlin Moncure wrote: > I'm piling on: it's not clear at all to me why you've special cased > this to lower_bound=1. First of all, there are other reasons to check > length than iteration. Can you point me to some examples? > the server API function should implement as many > reasonable behaviors as possible. That's exactly what I've done here. :-) Regards, Marko Tiikkaja
On 10 January 2014 00:36, Marko Tiikkaja <marko@joh.to> wrote: > On 1/10/14, 1:20 AM, Merlin Moncure wrote: >> >> I'm piling on: it's not clear at all to me why you've special cased >> this to lower_bound=1. First of all, there are other reasons to check >> length than iteration. > Yes, I agree. A length function that returned 0 for empty arrays would be far from useless. > > Can you point me to some examples? > The example I see all the time is code like if array_length(nodes, 1) < 5 then ... do something ... then you realise (or not as the case may be) that this doesn't work for empty arrays, and have to remember to wrap it in a coalesce call. Simply being able to write if cardinality(nodes) < 5 then ... do something ... is not just shorter, easier to type and easier to read, it is far less likely to be the source of subtle bugs. Regards, Dean
On 1/10/14, 9:04 AM, Dean Rasheed wrote: > On 10 January 2014 00:36, Marko Tiikkaja <marko@joh.to> wrote: >> >> Can you point me to some examples? >> > > The example I see all the time is code like > > if array_length(nodes, 1) < 5 then > ... do something ... > > then you realise (or not as the case may be) that this doesn't work > for empty arrays, and have to remember to wrap it in a coalesce call. > > Simply being able to write > > if cardinality(nodes) < 5 then > ... do something ... > > is not just shorter, easier to type and easier to read, it is far less > likely to be the source of subtle bugs But this is what I don't understand: why do you care whether there's less than 5 elements in the array, but you don't care about how they're organized? '[2:3]={1,2}'::int[] and '{{1},{2}}'::int[] both give the same result when unnest()ed, sure, but why do you want to accept such crap as input if you just want a list of elements? I guess what I truly want is a less generic type that's like an array, but always one-dimensional with a lower bound of 1. There's too much garbage that can be passed to a function taking an array as an input right now. Regards, Marko Tiikkaja
On Fri, Jan 10, 2014 at 2:04 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 10 January 2014 00:36, Marko Tiikkaja <marko@joh.to> wrote: >> On 1/10/14, 1:20 AM, Merlin Moncure wrote: >>> >>> I'm piling on: it's not clear at all to me why you've special cased >>> this to lower_bound=1. First of all, there are other reasons to check >>> length than iteration. >> > > Yes, I agree. A length function that returned 0 for empty arrays would > be far from useless. > >> >> Can you point me to some examples? >> > > The example I see all the time is code like > > if array_length(nodes, 1) < 5 then > ... do something ... > > then you realise (or not as the case may be) that this doesn't work > for empty arrays, and have to remember to wrap it in a coalesce call. > > Simply being able to write > > if cardinality(nodes) < 5 then > ... do something ... > > is not just shorter, easier to type and easier to read, it is far less > likely to be the source of subtle bugs. right -- exactly. or, 'ORDER BY cardinatility(nodes)', etc etc. Furthermore, we already have pretty good support for iteration with arrays via unnest(). What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array on indexes (one per dimsension) -- a generalization of the _pg_expandarray function. Lets' say 'unnest_dims'. 'unnest_dims' is non-trivial to code in user land while 'array_length' is an extremely trivial wrapper to array_upper(). cardinality() (which is much better name for the function IMSNHO) gives a*b*c values say for a 3d array also does something non-trivial *particularly in the case of offset arrays*. On Fri, Jan 10, 2014 at 3:36 AM, Marko Tiikkaja <marko@joh.to> wrote: > I guess what I truly want is a less generic type that's like an array, but always one-dimensional with a lower bound of1. Your function would be the only one in the array API that implemented special behaviors like that. That's suggests to me that the less generic function belongs in user land, not in the core array API. merlin
On 1/10/14, 10:41 AM, Merlin Moncure wrote: > What's needed for better iteration support (IMO) > is a function that does what unnest does but returns an array on > indexes (one per dimsension) -- a generalization of the > _pg_expandarray function. Lets' say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. Regards, Marko Tiikkaja
On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 1/10/14, 10:41 AM, Merlin Moncure wrote: >> >> What's needed for better iteration support (IMO) >> is a function that does what unnest does but returns an array on >> indexes (one per dimsension) -- a generalization of the >> _pg_expandarray function. Lets' say 'unnest_dims'. > > > So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, > '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a > functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get: [1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4 like unnest() it would fully decompose the array do individual elements. what you have above slices the array which is useful,but probably shouldn't live under the 'unnest' name -- perhaps 'slice'. Pavel added it to pl/pgsql under the FOREACH syntax (FYI). merlin
On Jan10, 2014, at 11:00 , Merlin Moncure <mmoncure@gmail.com> wrote: > On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja <marko@joh.to> wrote: >> On 1/10/14, 10:41 AM, Merlin Moncure wrote: >>> >>> What's needed for better iteration support (IMO) >>> is a function that does what unnest does but returns an array on >>> indexes (one per dimsension) -- a generalization of the >>> _pg_expandarray function. Lets' say 'unnest_dims'. >> >> >> So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, >> '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a >> functionality I've considered us to have been missing for a long time. > > not quite. it returns int[], anyelement: so, using your example, you'd get: > > [1,1], 1 > [1,2], 2 > [2,1], 3 > [2,2], 4 Now that we have WITH ORDINALITY, it'd be sufficient to have a variant of array_dims() that returns int[][] instead of text, say array_dimsarray(). Your unnest_dims could then be written as unnest(array_dimsarray(array)) with ordinality best regards, florian pflug
On Fri, Jan 10, 2014 at 6:00 AM, Florian Pflug <fgp@phlo.org> wrote: > On Jan10, 2014, at 11:00 , Merlin Moncure <mmoncure@gmail.com> wrote: >> On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja <marko@joh.to> wrote: >>> On 1/10/14, 10:41 AM, Merlin Moncure wrote: >>>> >>>> What's needed for better iteration support (IMO) >>>> is a function that does what unnest does but returns an array on >>>> indexes (one per dimsension) -- a generalization of the >>>> _pg_expandarray function. Lets' say 'unnest_dims'. >>> >>> >>> So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, >>> '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a >>> functionality I've considered us to have been missing for a long time. >> >> not quite. it returns int[], anyelement: so, using your example, you'd get: >> >> [1,1], 1 >> [1,2], 2 >> [2,1], 3 >> [2,2], 4 > > Now that we have WITH ORDINALITY, it'd be sufficient to have a > variant of array_dims() that returns int[][] instead of text, say > array_dimsarray(). Your unnest_dims could then be written as > > unnest(array_dimsarray(array)) with ordinality hm, not quite following that. maybe an example? my issue with 'WITH ORDINALITY' (while it's pretty neat) is that it doesn't give you the dimension coordinate of each datum so you can't really use it to slice. with unnest_dims(), you an slice, say via: select array_agg(value) from (unnest_dims('{{1,2},{3,4}}'::int[]) group by dims[1]; or select array_agg(value) from (unnest_dims('{{1,2},{3,4}}'::int[]) where dims[1] = 2; not super elegant, but good enough for most uses I think. anyways, getting back on topic, the question on the table is cardinality() vs array_length, right? merlin
On Jan10, 2014, at 15:10 , Merlin Moncure <mmoncure@gmail.com> wrote: > On Fri, Jan 10, 2014 at 6:00 AM, Florian Pflug <fgp@phlo.org> wrote: >> On Jan10, 2014, at 11:00 , Merlin Moncure <mmoncure@gmail.com> wrote: >>> On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja <marko@joh.to> wrote: >>>> On 1/10/14, 10:41 AM, Merlin Moncure wrote: >>>>> >>>>> What's needed for better iteration support (IMO) >>>>> is a function that does what unnest does but returns an array on >>>>> indexes (one per dimsension) -- a generalization of the >>>>> _pg_expandarray function. Lets' say 'unnest_dims'. >>>> >>>> >>>> So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, >>>> '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a >>>> functionality I've considered us to have been missing for a long time. >>> >>> not quite. it returns int[], anyelement: so, using your example, you'd get: >>> >>> [1,1], 1 >>> [1,2], 2 >>> [2,1], 3 >>> [2,2], 4 >> >> Now that we have WITH ORDINALITY, it'd be sufficient to have a >> variant of array_dims() that returns int[][] instead of text, say >> array_dimsarray(). Your unnest_dims could then be written as >> >> unnest(array_dimsarray(array)) with ordinality > > hm, not quite following that. maybe an example? > > my issue with 'WITH ORDINALITY' (while it's pretty neat) is that it > doesn't give you the dimension coordinate of each datum so you can't > really use it to slice. with unnest_dims(), you an slice, say via: Sorry, I misunderstood what you were proposing. I though you intended unnest_dims to returns one row per dimension, containing the index and bounds of that dimension. And yeah, that fact your your mail showed unnest_dims returning *4* rows for a *2*-dimensional array should have tipped me off. best regards, Florian Pflug
On 1/9/14, 2:57 PM, Dean Rasheed wrote: > Yes, this should just return the number of elements, and 0 for an empty array. > > How it should behave for multi-dimensional arrays is less clear, but > I'd argue that it should return the total number of elements, i.e. > cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it > consistent with the choices we've already made for unnest() and > ordinality: > - cardinality(foo) = (select count(*) from unnest(foo)). > - unnest with ordinality would always result in ordinals in the range > [1, cardinality]. Ignoring my proposal, this seems like the most reasonable option. I'll send an updated patch along these lines. Regards, Marko Tiikkaja
2014/1/12 Marko Tiikkaja <marko@joh.to>
+1
On 1/9/14, 2:57 PM, Dean Rasheed wrote:Ignoring my proposal, this seems like the most reasonable option. I'll send an updated patch along these lines.Yes, this should just return the number of elements, and 0 for an empty array.
How it should behave for multi-dimensional arrays is less clear, but
I'd argue that it should return the total number of elements, i.e.
cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it
consistent with the choices we've already made for unnest() and
ordinality:
- cardinality(foo) = (select count(*) from unnest(foo)).
- unnest with ordinality would always result in ordinals in the range
[1, cardinality].
+1
Pavel
Regards,
Marko Tiikkaja
On 1/12/14, 5:53 AM, I wrote: > On 1/9/14, 2:57 PM, Dean Rasheed wrote: >> How it should behave for multi-dimensional arrays is less clear, but >> I'd argue that it should return the total number of elements, i.e. >> cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it >> consistent with the choices we've already made for unnest() and >> ordinality: >> - cardinality(foo) = (select count(*) from unnest(foo)). >> - unnest with ordinality would always result in ordinals in the range >> [1, cardinality]. > > Ignoring my proposal, this seems like the most reasonable option. I'll > send an updated patch along these lines. Here's the patch as promised. Thoughts? Regards, Marko Tiikkaja
Attachment
Hello
I checked it and I got a small issuebash-4.1$ patch -p1 < cardinality.patch
(Stripping trailing CRs from patch.)
patching file doc/src/sgml/array.sgml
(Stripping trailing CRs from patch.)
patching file doc/src/sgml/func.sgml
(Stripping trailing CRs from patch.)
patching file src/backend/utils/adt/arrayfuncs.c
(Stripping trailing CRs from patch.)
patching file src/include/catalog/pg_proc.h
(Stripping trailing CRs from patch.)
patching file src/include/utils/array.h
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/arrays.out
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/arrays.sql
Regards
Pavel
2014/1/18 Marko Tiikkaja <marko@joh.to>
On 1/12/14, 5:53 AM, I wrote:On 1/9/14, 2:57 PM, Dean Rasheed wrote:How it should behave for multi-dimensional arrays is less clear, but
I'd argue that it should return the total number of elements, i.e.
cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it
consistent with the choices we've already made for unnest() and
ordinality:
- cardinality(foo) = (select count(*) from unnest(foo)).
- unnest with ordinality would always result in ordinals in the range
[1, cardinality].
Ignoring my proposal, this seems like the most reasonable option. I'll
send an updated patch along these lines.
Here's the patch as promised. Thoughts?
Regards,
Marko Tiikkaja
On 1/19/14, 12:21 AM, Pavel Stehule wrote: > I checked it and I got a small issue > > bash-4.1$ patch -p1 < cardinality.patch > (Stripping trailing CRs from patch.) > > not sure about source of this problem. I can't reproduce the problem. In fact, I don't see a single CR byte in the patch file on my disk, the file my email clients claims to have sent or a copy of the file I downloaded from the archives. Are you sure this isn't a problem on your end? Regards, Marko Tiikkaja
2014/1/19 Marko Tiikkaja <marko@joh.to>
On 1/19/14, 12:21 AM, Pavel Stehule wrote:I checked it and I got a small issue
bash-4.1$ patch -p1 < cardinality.patch
(Stripping trailing CRs from patch.)not sure about source of this problem.
I can't reproduce the problem. In fact, I don't see a single CR byte in the patch file on my disk, the file my email clients claims to have sent or a copy of the file I downloaded from the archives. Are you sure this isn't a problem on your end?
It can be problem on my side - some strange combination of mime type. I seen this issue before. I will recheck it tomorrow from other computer.
Regards
Pavel
Regards,
Marko Tiikkaja
On 18 January 2014 03:07, Marko Tiikkaja <marko@joh.to> wrote: > On 1/12/14, 5:53 AM, I wrote: >> >> On 1/9/14, 2:57 PM, Dean Rasheed wrote: >>> >>> How it should behave for multi-dimensional arrays is less clear, but >>> I'd argue that it should return the total number of elements, i.e. >>> cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it >>> consistent with the choices we've already made for unnest() and >>> ordinality: >>> - cardinality(foo) = (select count(*) from unnest(foo)). >>> - unnest with ordinality would always result in ordinals in the range >>> [1, cardinality]. >> >> >> Ignoring my proposal, this seems like the most reasonable option. I'll >> send an updated patch along these lines. > > > Here's the patch as promised. Thoughts? > A couple of points: The answer for empty (zero dimensional) arrays is wrong --- you need special case handling for this case to return 0. In fact why not simply use ArrayGetNItems()? In the docs, in the table of array functions, I think it would probably be useful to make the entry for array_length say "see also cardinality", otherwise people might just stop reading there. I suspect that in over 90% of cases, cardinality will be the more appropriate function to use rather than array_length. Regards, Dean
On 1/19/14, 9:12 AM, Dean Rasheed wrote: > On 18 January 2014 03:07, Marko Tiikkaja <marko@joh.to> wrote: >> Here's the patch as promised. Thoughts? >> > > A couple of points: > > The answer for empty (zero dimensional) arrays is wrong --- you need > special case handling for this case to return 0. How embarrassing. I don't know why I removed that check or how I didn't catch the clearly wrong answer in the test output. > In fact why not > simply use ArrayGetNItems()? Even better. Changed. > In the docs, in the table of array functions, I think it would > probably be useful to make the entry for array_length say "see also > cardinality", otherwise people might just stop reading there. I > suspect that in over 90% of cases, cardinality will be the more > appropriate function to use rather than array_length. I don't see this as a huge improvement, but even worse, I don't see a way to naturally fit it into the description. New version attached, without the doc change. Regards, Marko Tiikkaja
Attachment
On 19 January 2014 11:43, Marko Tiikkaja <marko@joh.to> wrote: > > > On 1/19/14, 9:12 AM, Dean Rasheed wrote: >> >> On 18 January 2014 03:07, Marko Tiikkaja <marko@joh.to> wrote: >>> >>> Here's the patch as promised. Thoughts? >>> >> >> A couple of points: >> >> The answer for empty (zero dimensional) arrays is wrong --- you need >> special case handling for this case to return 0. > > > How embarrassing. I don't know why I removed that check or how I didn't > catch the clearly wrong answer in the test output. > > >> In fact why not >> simply use ArrayGetNItems()? > > > Even better. Changed. > > >> In the docs, in the table of array functions, I think it would >> probably be useful to make the entry for array_length say "see also >> cardinality", otherwise people might just stop reading there. I >> suspect that in over 90% of cases, cardinality will be the more >> appropriate function to use rather than array_length. > > > I don't see this as a huge improvement, but even worse, I don't see a way to > naturally fit it into the description. > Hmm. Looking at that page in the docs, it currently doesn't even mention that array_length returns NULL for empty arrays, or more generally for arrays that don't have the requested dimension. To someone unfamiliar with postgresql arrays, that could lead to a nasty surprise. How about having the array_length docs say something like returns the length of the requested array dimension, or NULL if the array is empty or does not have therequested dimension. To get the total number of array elements across all dimensions, use <function>cardinality</>. If we did that, we should probably also add the "or NULL if the array is empty or does not have the requested dimension" clause to the array_upper and array_lower docs, and "or NULL if the array is empty" to the array_dims and array_ndims docs. That might seem overly pedantic, but it's quite annoying when API documentation doesn't fully specify the behaviour, and you're forced to use trial-and-error to find out how the functions behave. Regards, Dean
On 1/19/14, 2:12 PM, Dean Rasheed wrote: > That might seem overly pedantic, but it's quite annoying when API > documentation doesn't fully specify the behaviour, and you're forced > to use trial-and-error to find out how the functions behave. For what it's worth, I was thinking the same thing when I was looking at that table. Nearly *all* of them are completely inadequate at explaining the finer details, and the user has to experiment to figure out what actually happens. I seem to recall other similar examples in our documentation for functions. Personally I would like to see this fixed for all functions, not just array functions. But I think that should be a separate patch. Regards, Marko Tiikkaja
On 19 January 2014 11:43, Marko Tiikkaja <marko@joh.to> wrote: > New version attached, without the doc change. > This looks good to me. - applies cleanly.- compiles with no warnings.- passes a sensible set of new regression tests.- implements the agreed behaviour,per SQL spec.- I can't think of any corner cases to break it. I think this is ready for committer, although I would also like to see the doc changes to make the table of array function descriptions a bit more explicit about corner cases. Also, does this mean that we can now claim full support for SQL feature S091 "Basic array support"? Regards, Dean
On 1/20/14 2:29 PM, Dean Rasheed wrote: > I think this is ready for committer Thanks! > ... although I would also like to see > the doc changes to make the table of array function descriptions a bit > more explicit about corner cases. Hmm. I completely missed the fact that unnest() already uses a structure similar to yours. It looks like e.g. window functions do the same, but JSON functions all have proper capitalization and periods, and some others capitalize but omit periods. I could submit a separate patch to describe array functions in a bit more detail, unless you wanted to do that? I'm not planning on fixing the inconsistencies, though, despite them annoying me. Regards, Marko Tiikkaja
On 20 January 2014 13:47, Marko Tiikkaja <marko@joh.to> wrote: > On 1/20/14 2:29 PM, Dean Rasheed wrote: >> >> I think this is ready for committer > > > Thanks! > >> ... although I would also like to see >> >> the doc changes to make the table of array function descriptions a bit >> more explicit about corner cases. > > > Hmm. I completely missed the fact that unnest() already uses a structure > similar to yours. It looks like e.g. window functions do the same, but JSON > functions all have proper capitalization and periods, and some others > capitalize but omit periods. > > I could submit a separate patch to describe array functions in a bit more > detail, Yes please. I think that would be helpful. > I'm not planning on fixing the > inconsistencies, though, despite them annoying me. > To be honest, I hadn't even noticed those inconsistencies. The main thing is to alert new users to the fact that empty arrays behave in a rather odd way for a couple of those functions. Regards, Dean
On Sun, Jan 19, 2014 at 1:41 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2014/1/19 Marko Tiikkaja <marko@joh.to> >> On 1/19/14, 12:21 AM, Pavel Stehule wrote: >>> I checked it and I got a small issue >>> >>> bash-4.1$ patch -p1 < cardinality.patch >>> (Stripping trailing CRs from patch.) >>> >>> not sure about source of this problem. >> >> >> I can't reproduce the problem. In fact, I don't see a single CR byte in >> the patch file on my disk, the file my email clients claims to have sent or >> a copy of the file I downloaded from the archives. Are you sure this isn't >> a problem on your end? > > It can be problem on my side - some strange combination of mime type. I seen > this issue before. I will recheck it tomorrow from other computer. Doesn't matter anyway. Patch needing to strip trailing CRs doesn't cause any issue. I got the same message, BTW; maybe some kind of gmail weirdness. As it turns out, a function called cardinality() was added in 2009 and ripped back out again. But the one that was committed back then had funny semantics that people weren't sure about, and people seemed to think it should behave like this one does. So I've gone ahead and committed this, crossing my fingers that we won't have to rip it back out again. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/21/14, 6:42 PM, Robert Haas wrote: > On Sun, Jan 19, 2014 at 1:41 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> It can be problem on my side - some strange combination of mime type. I seen >> this issue before. I will recheck it tomorrow from other computer. > > Doesn't matter anyway. Patch needing to strip trailing CRs doesn't > cause any issue. I got the same message, BTW; maybe some kind of > gmail weirdness. Interesting. > As it turns out, a function called cardinality() was added in 2009 and > ripped back out again. But the one that was committed back then had > funny semantics that people weren't sure about, and people seemed to > think it should behave like this one does. So I've gone ahead and > committed this, crossing my fingers that we won't have to rip it back > out again. Thanks! I'll keep my fingers crossed as well. Regards, Marko Tiikkaja