Thread: array functions - request for opinions (was Re: [PATCHES] array support patch phase 1 patch)
array functions - request for opinions (was Re: [PATCHES] array support patch phase 1 patch)
From
Joe Conway
Date:
The discussion below has been taking place on the PATCHES list. I'm forwarding it to HACKERS to get a wider audience. We're looking for opinions on (at a minimum): 1) Which of these functions should exist *and* be included in user documentation 2) Which of these functions are not worth having in the backend at all 3) Specifically WRT array_accum(), is there merit in having a userland function that aggregates row values into arrays,for use in custom aggregates. Note that all of these functions have already been implemented and are either in CVS (all except str_to_array and array_to_str) or in a submitted patch. Some are required to implement the SQL 99 standard syntax, but some are not. Specifically, array_subscript, array_assign, and singleton_array are not needed to implement other functionality and would be the primary candidates for removal. Thanks, Joe -------- Original Message -------- Peter Eisentraut wrote: > Joe Conway writes: >>I personally don't understand why we should hide them from users. If I >>prefer to use array_append(var1, var2) rather than (var1 || var2), >>what's the problem? Its a matter of taste as to which is better. > > The problem is that this approach leads to bloat without bound. Maybe > tomorrow someone prefers append_array(var1, var2) or var1 + var2. The > standard says it's var1 || var2, there is no technical or substantial > aesthetical argument against it, so that's what we should use. I can see your point. But is there any circumstance where the function will work and the standard syntax won't? Until recently, it was not possible to assign individual array elements in PL/pgSQL (I think Tom fixed this). Are there any more examples? Let's go down the list: array_accum(anyarray, anyelement) -- no standard syntax; more on this later array_append(anyarray, anyelement) -- can be easily done with "||" operator; is there any case where "||" won't workthat array_append() will? If not, don't document as a user function. array_assign(anyarray, integer, anyelement) -- can be easily done with "var[n] = X" syntax; is there any case where"var[n] = X" won't work that array_assign() will? If not, don't document as a user function. array_cat(anyarray, anyarray) -- see array_append array_dims(anyarray) -- no standard syntax; should be documented. array_lower(anyarray, integer) -- no standard syntax; should be documented. array_prepend(anyelement, anyarray) -- see array_append array_subscript(anyarray, integer) -- can be easily done with "var[n]" syntax; is there any case where "var[n]" won'twork that array_subscript() will? If not, don't document as a user function. array_to_str(anyarray, text) -- no standard syntax; should be documented. array_upper(anyarray, integer) -- no standard syntax; should be documented. singleton_array(anyelement) -- can be easily done with "array[x]" or "'{x}'" syntax; is there any case where one ofthese won't work that singleton_array() will? If not, don't document as a user function. str_to_array(text, text) -- no standard syntax; should be documented. BTW, should this discussion be on HACKERS or even GENERAL in order to get a wider audience of opinion? >>And in any case, array_accum() is intended to be used for building >>custom aggregates, so that needs to be documented. > Can you give an example of an aggregate or a class of aggregates where > this would be useful? > There are many discussions on the lists over the past few years in which people have requested this kind of functionality. There is even a contrib/intagg that does this for integers only. I don't think there is any question that there is a demand for the feature. Some examples: http://fts.postgresql.org/db/msg.html?mid=1021530 http://fts.postgresql.org/db/msg.html?mid=1096592 http://fts.postgresql.org/db/msg.html?mid=1031700 http://fts.postgresql.org/db/msg.html?mid=1353047 http://fts.postgresql.org/db/msg.html?mid=1063738 http://fts.postgresql.org/db/msg.html?mid=1050837 http://fts.postgresql.org/db/msg.html?mid=1066349 Some people would like to simply aggregate rows of data into arrays for analysis, some want to write custom final functions to post-process the resulting array (e.g. median). With array_accum() and some of the other new functions (array_lower & array_upper specifically come to mind), people have a relatively easy way to create there own custom aggregates using PL/pgSQL. With PL/R, I can use array_accum to create an aggregate for just about any summary statistic that I'm interested in. And while, yes, there is already an implementation of array_accum in PL/R, it isn't reasonable to require anyone who wants to use it to install libR also. And, yes, there are undoubtedly other ways to create those aggregates, but none as simple to use. Again, I'd suggest that if we want to debate this much further, we should move the discussion to the GENERAL and SQL lists so that users can have a chance to chime in. Joe
Joe Conway writes: > >>And in any case, array_accum() is intended to be used for building > >>custom aggregates, so that needs to be documented. > > Can you give an example of an aggregate or a class of aggregates where > > this would be useful? > There are many discussions on the lists over the past few years in which > people have requested this kind of functionality. There is even a > contrib/intagg that does this for integers only. I don't think there is > any question that there is a demand for the feature. Some examples: These applications should use an empty array as initial value. Then the normal array concatenation can be used as transition function. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut wrote: > Joe Conway writes: >>There are many discussions on the lists over the past few years in which >>people have requested this kind of functionality. There is even a >>contrib/intagg that does this for integers only. I don't think there is >>any question that there is a demand for the feature. Some examples: > > These applications should use an empty array as initial value. Then the > normal array concatenation can be used as transition function. How can you create an aggregate using an operator as a transition function? =# CREATE AGGREGATE myagg -# ( (# BASETYPE = text, (# SFUNC = '||', (# STYPE = text, (# INITCOND = '' (# ); ERROR: AggregateCreate: function ||(text, text) does not exist Also (I suppose you could argue this should be fixed, but anyway...), you can't currently add elements to an empty array. regression=# create table t(f float8[]); CREATE TABLE regression=# insert into t values('{}'); INSERT 1865486 1 regression=# update t set f[1] = 1; ERROR: Invalid array subscripts regression=# select array_dims(f) from t; array_dims ------------ (1 row) Joe
Joe Conway wrote: > How can you create an aggregate using an operator as a transition > function? > > =# CREATE AGGREGATE myagg > -# ( > (# BASETYPE = text, > (# SFUNC = '||', > (# STYPE = text, > (# INITCOND = '' > (# ); > ERROR: AggregateCreate: function ||(text, text) does not exist Use the function underlying the operator, in this case textcat. Regards, Andreas
Andreas Pflug wrote: > Joe Conway wrote: > >> How can you create an aggregate using an operator as a transition >> function? >> >> =# CREATE AGGREGATE myagg >> -# ( >> (# BASETYPE = text, >> (# SFUNC = '||', >> (# STYPE = text, >> (# INITCOND = '' >> (# ); >> ERROR: AggregateCreate: function ||(text, text) does not exist > > > Use the function underlying the operator, in this case textcat. > Right, but Peter's desire is to *not document* the underlying function (this was just an example, we're really discussing the new array functions array_accum and/or array_append), and my point is that it needs to be documented if we expect people to use it. Joe
Joe Conway <mail@joeconway.com> writes: > How can you create an aggregate using an operator as a transition function? You'd have to reference the underlying function. I think what Peter is really asking is whether the implementation function for || will serve for this purpose, rather than making an additional one. > Also (I suppose you could argue this should be fixed, but anyway...), > you can't currently add elements to an empty array. That particular manifestation is not relevant; I think you mean this one: regression=# select '{}'::int4[] || 2; ERROR: Arrays greater than one-dimension are not supported In both cases I think it's mostly a matter of defining a defensible behavior, which is probably not that hard, but no one's yet gotten annoyed enough to try to fix it. regards, tom lane
Tom Lane wrote: > You'd have to reference the underlying function. That's what I thought, but I thought maybe I was missing something. > I think what Peter is really asking is whether the implementation > function for || will serve for this purpose, rather than making an > additional one. But, as I said in my other post a moment ago, that then implies (at least to me) that the underlying function needs to be documented as an "end user" function, rather than an "internal" one. > That particular manifestation is not relevant; I think you mean this > one: > > regression=# select '{}'::int4[] || 2; ERROR: Arrays greater than > one-dimension are not supported I just wanted to show that the issue pre-existed the new functions. I think they are both caused by the fact that an empty array has *no* dimensions (clearly the error message in the new array functions could be better). > In both cases I think it's mostly a matter of defining a defensible > behavior, which is probably not that hard, but no one's yet gotten > annoyed enough to try to fix it. It is probably easy enough to work around in the array concatenation functions. Would it be defensible to say that ('{}'::int4[] || 2) should produce ('{2}'::int4[]), i.e. a one-dimensional integer array with one element? Joe
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> In both cases I think it's mostly a matter of defining a defensible >> behavior, which is probably not that hard, but no one's yet gotten >> annoyed enough to try to fix it. > It is probably easy enough to work around in the array concatenation > functions. Would it be defensible to say that ('{}'::int4[] || 2) should > produce ('{2}'::int4[]), i.e. a one-dimensional integer array with one > element? You would also have to assume that the subscript lower bound is one, which doesn't bother me but is an additional bit of state that has to appear out of nowhere. (In the assignment case you don't have to assume that, since the subscript tells you what to do.) regards, tom lane
Tom Lane wrote: > You would also have to assume that the subscript lower bound is one, > which doesn't bother me but is an additional bit of state that has to > appear out of nowhere. (In the assignment case you don't have to assume > that, since the subscript tells you what to do.) I've gotten this working for array concatenation and assignment. Examples: -- empty array concatenated with any element, return one element, -- one-dimensional array, with lower bound set to 1 regression=# select '{}'::int4[] || 1; ?column? ---------- {1} (1 row) regression=# select 0 || '{}'::int4[]; ?column? ---------- {0} (1 row) regression=# select array_dims(0 || '{}'::int4[]); array_dims ------------ [1:1] (1 row) -- empty array concatenated with any non-empty, return the non-empty one regression=# select '{}'::int4[] || array[[[1,2],[3,4]]]; ?column? ----------------- {{{1,2},{3,4}}} (1 row) -- concatenate two empty arrays, return empty array regression=# select '{}'::int4[] || '{}'::int4[]; ?column? ---------- {} (1 row) -- assignment to empty array: determine number -- of dimensions and array subscripts based on those -- given in the assignment statement regression=# create table t(f float8[]); CREATE TABLE regression=# insert into t values('{}'); INSERT 2011035 1 regression=# update t set f[-2:2] = array[1,2,3,4,5]; UPDATE 1 regression=# select * from t; f ------------- {1,2,3,4,5} (1 row) regression=# select array_dims(f) from t; array_dims ------------ [-2:2] (1 row) One question, should this work to create an empty array: regression=# select array[]; ERROR: parser: parse error at or near "]" at character 14 or by analogy to '{}'::int4[] regression=# select array[]::int4[]; ERROR: parser: parse error at or near "]" at character 14 Or is the current '{}'::int4[] syntax all we want/need? Joe
Joe Conway <mail@joeconway.com> writes: > One question, should this work to create an empty array: > regression=# select array[]; > ERROR: parser: parse error at or near "]" at character 14 Only if you can specify what type it has. This seems to get back to our off-list discussion about what to do with array of unknown. regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>regression=# select array[]; >>ERROR: parser: parse error at or near "]" at character 14 > > Only if you can specify what type it has. This seems to get back > to our off-list discussion about what to do with array of unknown. > Yeah, without an unknown[] data type, I think a special case hack would be needed even for the array[]::int4[] example, because the coercion to int4[] happens too late. I guess '{}'::arraytype[] will have to do for now. Joe