Thread: join of array
Hello,

Is it possible to merge two arrays, like

  array[1,2,3] + array[4,5,6] => array[1,2,3,4,5,6]

?

  select array_append(array[1,2,3], array[2,3]);
  ERROR:  function array_append(integer[], integer[]) does not exist

regards
Pavel Stehule
Pavel Stehule <stehule@kix.fsv.cvut.cz> writes:
> Is it possible to merge two arrays, like
> array[1,2,3] + array[4,5,6] => array[1,2,3,4,5,6]

I was about to say that || would do it, but I see that's not quite right:

regression=# SELECT ARRAY[1,2,3] || ARRAY[4,5,6];
     ?column?
-------------------
 {{1,2,3},{4,5,6}}
(1 row)

Offhand, I would think that '{1,2,3,4,5,6}' would be what I'd intuitively expect to get from "concatenating" these arrays.  Joe, do we really have this implemented per spec?

			regards, tom lane
Joe Conway <mail@joeconway.com> writes:
> Hmmm, it made sense to me, at least at some point ;-).  Here's the
> SQL99 guidance (SQL200X doesn't give any more detailed guidance):
>
> 4.11.3.2 Operators that operate on array values and return array values
> <array concatenation> is an operation that returns the array value made
> by joining its array value operands in the order given.

That's about as clear as mud :-( ... but I found a clearer statement in SQL99 6.31:

  2) If <array concatenation> is specified, then:

     a) Let AV1 be the value of <array value expression 1> and let AV2
        be the value of <array value expression 2>.

     b) If either AV1 or AV2 is the null value, then the result of the
        <array concatenate function> is the null value.

     c) Otherwise, the result is the array comprising every element of
        AV1 followed by every element of AV2.

(c) seems to be pretty clearly what Pavel wants for the 1-D case, but it's not immediately clear how to apply it to multidimensional arrays.

> We also have
>   ARRAY[1,2] || 3 == '{1,2,3}'
> and
>   ARRAY[[1,2],[3,4]] || ARRAY[5,6] == '{{1,2},{3,4},{5,6}}'
> and
>   ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{{1,2},{3,4}},{{1,2},{3,4}}}'
>
> I think the first two still make sense. I guess the third case ought to be:
>   ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{1,2},{3,4},{1,2},{3,4}}'
> ?

Probably.  AFAICS this doesn't affect the data copying at all, only the way in which the result's dimension values are computed, right?

Also, we might want to take another look at the rules for selecting the lower bounds of the result array.  In the cases where we're joining N+1-D to N-D (including 1-D to scalar) it still seems to make sense to preserve the subscripts of the higher-dimensional object, so the lower-dimensional one is "pushed" onto one end or the other.  In the N-D to N-D case I can't see any really principled way to do it; for lack of a better idea, I suggest preserving the subscripts of the lefthand input (ie, using its lower bound).

			regards, tom lane
Tom Lane wrote:
> Pavel Stehule <stehule@kix.fsv.cvut.cz> writes:
>> Is it possible to merge two arrays, like
>> array[1,2,3] + array[4,5,6] => array[1,2,3,4,5,6]
>
> I was about to say that || would do it, but I see that's not quite right:
>
> regression=# SELECT ARRAY[1,2,3] || ARRAY[4,5,6];
>      ?column?
> -------------------
>  {{1,2,3},{4,5,6}}
> (1 row)
>
> Offhand, I would think that '{1,2,3,4,5,6}' would be what I'd intuitively expect to get from "concatenating" these arrays.  Joe, do we really have this implemented per spec?

Hmmm, it made sense to me, at least at some point ;-).  Here's the SQL99 guidance (SQL200X doesn't give any more detailed guidance):

4.11.3.2 Operators that operate on array values and return array values
<array concatenation> is an operation that returns the array value made by joining its array value operands in the order given.

So I guess it ought to be changed.

We also have
  ARRAY[1,2] || 3 == '{1,2,3}'
and
  ARRAY[[1,2],[3,4]] || ARRAY[5,6] == '{{1,2},{3,4},{5,6}}'
and
  ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{{1,2},{3,4}},{{1,2},{3,4}}}'

I think the first two still make sense. I guess the third case ought to be:
  ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{1,2},{3,4},{1,2},{3,4}}'
?

If this sounds good, I'll work on a patch for the behavior as well as the docs.

Joe
>> ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{{1,2},{3,4}},{{1,2},{3,4}}}'
>
> I think the first two still make sense. I guess the third case ought to be:
>   ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{1,2},{3,4},{1,2},{3,4}}'
> ?

I do not think this is right.  I think the current behaviour is right.  You are effectively dereferencing or flattening the second array, which changes the definition of the second object.

The ability to do the dereference/flattening is useful, but it is not the || operator.  How about |*, which would flatten one level?  Of course, that begs the question of what about n levels, and I'm not sure about that.

--elein
elein wrote:
>> I think the first two still make sense. I guess the third case ought to be:
>>   ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{1,2},{3,4},{1,2},{3,4}}'
>> ?
>
> I do not think this is right.  I think the current behaviour is right.  You are effectively dereferencing or flattening the second array, which changes the definition of the second object.

It makes sense in analogy to

  ARRAY[1,2] || ARRAY[3,4] == '{1,2,3,4}'

In the case of, e.g., ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8]], '{1,2}', '{3,4}', '{5,6}', and '{7,8}' are "elements" of the higher-level array, just like 1, 2, 3, & 4 are elements of '{1,2,3,4}'.

Joe
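Under that reading, the intended result for the example would be (hypothetical output; this is the behavior being proposed, not what the server returned at the time):

  SELECT ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8]];
           ?column?
  ----------------------------
   {{1,2},{3,4},{5,6},{7,8}}
  (1 row)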
I'd like to summarize what I know (or don't know) since this topic has been hit around a little and I'm new to this.  I'm hoping it will clear things up, at least for me.  You are all the experts; I want to make sure I am singing from the same page.

data sample:

 id | fm | ls | addr | city | st | z | c | start | end
--------------------------------------------------------
191922C,Bob Cobb,D'Obbalina Sr.,312 Elm Street,Yountville,CA,94599,5,062001,082009
339111C,Elma Thelma,Velma,98 Oak Lane,St. Louis,MO,63119-2065,,,

What I wanted to do was to import lots of these from a text file.  In the case where there is an empty string (i.e. no value after a comma) I wanted to define the column in the table in a way that would accept the empty string but replace it with the default value for that column.  I didn't know that the copy command is just some C code that stuffs the data into the db à la foie gras.

What I would really benefit from (and I hope some other new soul would too) is if someone would outline exactly how they would approach this problem.  Maybe provide the correct table definition and the copy command, or, if that just won't work, an alternate approach.  I realize that some of you have done this partially, but there have been too many replies to get into a single cohesive instruction.

Anyway, I suppose my initial frustration in trying to do this may have blinded me from reason.

create table contact (
    id    character(7) NOT NULL,
    fm    character(30) DEFAULT 'xzxzxzxz',
    ls    character(30) DEFAULT 'xzxzxzxz',
    addr  character(30) DEFAULT '123 xzxzxzxz',
    city  character(25) DEFAULT 'xzxzxzxz',
    st    character(2) DEFAULT 'xz',
    c     character(1) DEFAULT 'x',
    start decimal(6) DEFAULT 122038,
    "end" decimal(6) DEFAULT 122038,
    CONSTRAINT handle PRIMARY KEY (id)
) WITHOUT OIDS;
you said we had:

> We also have
>   ARRAY[1,2] || 3 == '{1,2,3}'
> and
>   ARRAY[[1,2],[3,4]] || ARRAY[5,6] == '{{1,2},{3,4},{5,6}}'
> and
>   ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{{1,2},{3,4}},{{1,2},{3,4}}}'
>
> I think the first two still make sense. I guess the third case ought to be:
>   ARRAY[[1,2],[3,4]] || ARRAY[[1,2],[3,4]] == '{{1,2},{3,4},{1,2},{3,4}}'

The second case above makes this case wrong.  Or vice versa.  It is dereferencing the right operand in one case and not in the other.  How exactly do you tell the difference?  The only difference I can tell is the depth of the left-hand array.  This is confusing and unnecessarily complex.

> It makes sense in analogy to
>   ARRAY[1,2] || ARRAY[3,4] == '{1,2,3,4}'

This case should be '{1,2,{3,4}}', unless we have a flattening operator or dereferencing function, or a policy of always flattening the right operand one level (confusing!).

Consistent treatment of the operands is important.  Treating objects (arrays) as they were defined makes the nesting and usage easier, not to mention the implementation.  (Good problem :-)

Elein
elein wrote:
> you said we had:
>
>> We also have
          ^^^^

There are two variants each of two cases.  The first case is what started this discussion.  The newest reading of the SQL99 spec says that we *must* do this:

  1a) ARRAY[1,2] || ARRAY[3,4] == '{1,2,3,4}'

Quoting the paragraph provided by Tom:
  "c) Otherwise, the result is the array comprising every element of
      AV1 followed by every element of AV2."

The variant is that when you have an "array of arrays", i.e. a multidimensional array (which Peter E pointed out earlier is part of SQL99 too), the spec wording implies that we also *must* do this:

  1b) ARRAY[[1],[2]] || ARRAY[[3],[4]] == '{{1},{2},{3},{4}}'

The second case is not directly addressed by the spec as far as I can see, i.e. it is a Postgres extension.  That is:

  2a) ARRAY[1,2] || 3 == '{1,2,3}'

So by analogy the multidimensional variant is:

  2b) ARRAY[[1],[2]] || ARRAY[3] == '{{1},{2},{3}}'

Cases 1a and 1b are currently wrong according to the spec, and that's the change we've been discussing.  Cases 2a and 2b currently work as shown and are correct IMHO (although Tom pointed out a lower-bound index issue that I'll address in my response to him).

Does this help?

Joe
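Taken together, the intended behavior for all four cases (hypothetical results under the proposed semantics):

  SELECT ARRAY[1,2] || ARRAY[3,4];          -- 1a: {1,2,3,4}
  SELECT ARRAY[[1],[2]] || ARRAY[[3],[4]];  -- 1b: {{1},{2},{3},{4}}
  SELECT ARRAY[1,2] || 3;                   -- 2a: {1,2,3}
  SELECT ARRAY[[1],[2]] || ARRAY[3];        -- 2b: {{1},{2},{3}}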
Tom Lane wrote:
> That's about as clear as mud :-( ... but I found a clearer statement in SQL99 6.31: [...]
> (c) seems to be pretty clearly what Pavel wants for the 1-D case, but it's not immediately clear how to apply it to multidimensional arrays.

Thanks -- I found the corresponding paragraph in SQL200x (6.35), and it pretty much reads the same.

> Probably.  AFAICS this doesn't affect the data copying at all, only the way in which the result's dimension values are computed, right?

Looks that way to me.

> Also, we might want to take another look at the rules for selecting the lower bounds of the result array.  In the cases where we're joining N+1-D to N-D (including 1-D to scalar) it still seems to make sense to preserve the subscripts of the higher-dimensional object, so the lower-dimensional one is "pushed" onto one end or the other.

This is mostly the way it currently works:

regression=# create table arr(f1 int[]);
CREATE TABLE
regression=# insert into arr values ('{}');
INSERT 2498103 1
regression=# update arr set f1[-2] = 1;
UPDATE 1
regression=# select array_lower(f1,1) from arr;
 array_lower
-------------
          -2
(1 row)

regression=# select array_lower(f1 || 2, 1) from arr;
 array_lower
-------------
          -2
(1 row)

regression=# select array_lower(0 || f1, 1) from arr;
 array_lower
-------------
          -3
(1 row)

regression=# update arr set f1 = ARRAY[[1,2],[3,4]];
UPDATE 1
regression=# select array_lower(f1,1) from arr;
 array_lower
-------------
           1
(1 row)

regression=# select array_lower(f1 || ARRAY[5,6], 1) from arr;
 array_lower
-------------
           1
(1 row)

regression=# select array_lower(ARRAY[-1,0] || f1, 1) from arr;
 array_lower
-------------
           1
(1 row)

It looks like the only "wrong" case is the last one.  Will fix.

> In the N-D to N-D case I can't see any really principled way to do it; for lack of a better idea, I suggest preserving the subscripts of the lefthand input (ie, using its lower bound).

OK, will do.

Thanks,

Joe
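Applying the rule Tom proposed (preserve the subscripts of the higher-dimensional input, so the 1-D array is "pushed" onto the front), the last query would presumably return 0 once fixed, leaving the rows of f1 at subscripts 1 and 2; hypothetically:

  regression=# select array_lower(ARRAY[-1,0] || f1, 1) from arr;
   array_lower
  -------------
             0
  (1 row)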
Joe Conway <mail@joeconway.com> writes:
> elein wrote:
>> I do not think this is right.  I think the current behaviour is right.  You are effectively dereferencing or flattening the second array, which changes the definition of the second object.
>
> It makes sense in analogy to
>   ARRAY[1,2] || ARRAY[3,4] == '{1,2,3,4}'

I agree with Joe.  The spec is quite clear about what to do in the one-dimensional case: the original arrays lose their separate identity.  In the multi-dimensional case, they should lose their separate identities in the outermost dimension.

I believe the behavior Elein wants can be had by writing

  ARRAY[ n_d_array , n_d_array ]

(Joe, would you confirm that's true, and document it?  I don't think either section 8.10 or section 4.2.8 makes clear that you can build arrays from smaller array values rather than just scalars.)  As long as we have that alternative, it's not necessary that concatenation do the same thing.

Another argument for doing it this way is that it makes array concatenation associative, which is a nice bit of symmetry.  Consider

  ARRAY[[1,1],[2,2]] || ARRAY[[3,3],[4,4]] || ARRAY[[5,5],[6,6]] || ARRAY[[7,7],[8,8]]

Right now, with the default left-to-right association you get

  {{{1,1},{2,2}},{{3,3},{4,4}},{{5,5},{6,6}},{{7,7},{8,8}}}

but if you parenthesize it differently you can get a different answer:

regression=# select (ARRAY[[1,1],[2,2]] || ARRAY[[3,3],[4,4]]) || (ARRAY[[5,5],[6,6]] || ARRAY[[7,7],[8,8]]);
                           ?column?
---------------------------------------------------------------
 {{{{1,1},{2,2}},{{3,3},{4,4}}},{{{5,5},{6,6}},{{7,7},{8,8}}}}
(1 row)

With the flattening approach all the intermediate results will remain 2-D arrays, and so you get the same answer for all parenthesizations, namely {{1,1},{2,2},{3,3},{4,4},{5,5},{6,6},{7,7},{8,8}}.  That strikes me as more nearly the intuitive meaning of "concatenation" than what we've got now.

(Cases involving some N-D and some N+1-D inputs remain non-associative, though, which is a tad annoying.  Maybe it's okay seeing that the inputs are of different kinds.)

			regards, tom lane
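To spell out the associativity claim: under the flattening semantics both groupings would stay 2-D and agree (hypothetical results):

  SELECT ARRAY[[1,1],[2,2]] || ARRAY[[3,3],[4,4]] || ARRAY[[5,5],[6,6]] || ARRAY[[7,7],[8,8]];
  SELECT (ARRAY[[1,1],[2,2]] || ARRAY[[3,3],[4,4]]) || (ARRAY[[5,5],[6,6]] || ARRAY[[7,7],[8,8]]);
  -- both would yield:
  --   {{1,1},{2,2},{3,3},{4,4},{5,5},{6,6},{7,7},{8,8}}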
On Fri, 2003-08-15 at 12:41, Joe Conway wrote:
> Tom Lane wrote:
>> That's about as clear as mud :-( ... but I found a clearer statement in SQL99 6.31: [...]
>
> Thanks -- I found the corresponding paragraph in SQL200x (6.35), and it pretty much reads the same.

Why are arrays even mentioned in the same breath as relational DBMSs?  Aren't they an anathema to all we know and love?

-- 
+---------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net            |
| Jefferson, LA  USA                                             |
|                                                                |
| "Man, I'm pretty.  Hoo Hah!"                                   |
|    Johnny Bravo                                                |
+---------------------------------------------------------------+
I guess I am arguing against the spec. :-)  But given the spec...

The spec is consistent in that it seems to dereference the right operand one level.  However, that would still make 2b inconsistent in the evaluation of the right operand.

> 1a) ARRAY[1,2] || ARRAY[3,4] == '{1,2,3,4}'
> 1b) ARRAY[[1],[2]] || ARRAY[[3],[4]] == '{{1},{2},{3},{4}}'

and

  ARRAY[1,2] || ARRAY[[3],[4]] == '{1,2,{3},{4}}'

> So by analogy the multidimensional variant is:
> 2b) ARRAY[[1],[2]] || ARRAY[3] == '{{1},{2},{3}}'

I would think this would be '{{1},{2},3}' and

  ARRAY[1,2] || ARRAY[[3],[4]] == '{1,2,{3},{4}}'

I do see the analogy you are making, but I respectfully disagree (with the spec?) that the type/structure of the left operand should be taken into account when evaluating the right operand.

elein
Tom Lane wrote:
> I believe the behavior Elein wants can be had by writing
>   ARRAY[ n_d_array , n_d_array ]
> (Joe, would you confirm that's true, and document it?  I don't think either section 8.10 or section 4.2.8 makes clear that you can build arrays from smaller array values rather than just scalars.)  As long as we have that alternative, it's not necessary that concatenation do the same thing.

Well, this works:

regression=# select ARRAY[ARRAY[[1,2],[3,4]],ARRAY[[5,6],[7,8]]];
             array
-------------------------------
 {{{1,2},{3,4}},{{5,6},{7,8}}}
(1 row)

But I was disappointed that this doesn't:

regression=# select ARRAY['{{1,2},{3,4}}'::int[],'{{5,6},{7,8}}'::int[]];
ERROR:  multidimensional ARRAY[] must be built from nested array expressions

Nor does this:

create table arr(f1 int[], f2 int[]);
insert into arr values (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);

regression=# select ARRAY[f1,f2] from arr;
ERROR:  multidimensional ARRAY[] must be built from nested array expressions

It does work for the element-to-array case:

create table els(f1 int, f2 int);
insert into els values (1,2);

regression=# select ARRAY[f1,f2] from els;
 array
-------
 {1,2}
(1 row)

Should I try to make the second and third cases work?

Joe
Joe Conway <mail@joeconway.com> writes:
> But I was disappointed that this doesn't:
>
> regression=# select ARRAY['{{1,2},{3,4}}'::int[],'{{5,6},{7,8}}'::int[]];
> ERROR:  multidimensional ARRAY[] must be built from nested array expressions

Drat, I was assuming that that *would* work.

> Should I try to make the second and third cases work?

Could you look at how big a change it'd be, anyway?  Offhand I think it may just mean that the subscript-checking done in parse_expr.c needs to be done at runtime instead.  Remember parse_expr should only be concerned about determining datatype, and for its purposes all arrays of a given element type are the same --- subscript checking should happen at runtime.  (It seems likely that having an ndims field in ArrayExpr is inappropriate.)

			regards, tom lane
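If the check does move to runtime, the failing cases would presumably come back with the same result as the nested-constructor form; hypothetically:

  regression=# select ARRAY[f1,f2] from arr;
              array
  -------------------------------
   {{{1,2},{3,4}},{{5,6},{7,8}}}
  (1 row)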
PostgreSQL is an ORDBMS, not just an RDBMS.

A column holds a type of value.  Any kind.  The structure and operators define the type; the data defines the value.  This holds true for simple types like an integer or complex types like an array.

The database data is relatively "type blind" in an ORDBMS.  It uses the standard overloaded operators to determine the type of function to perform for all of the usual RDBMS utilities.  Constraints, triggers, sorting, etc. all apply.

That's what the ORDBMS stuff can give you.  Arrays are a natural extension.

Arrays don't necessarily imply denormalization.  It depends on how you use them.  The same rule applies for integers.

elein

On Fri, Aug 15, 2003 at 01:13:52PM -0500, Ron Johnson wrote:
> Why are arrays even mentioned in the same breath as relational DBMSs?  Aren't they an anathema to all we know and love?
On Friday 15 August 2003 01:13 pm, Ron Johnson wrote:
> Why are arrays even mentioned in the same breath as relational DBMSs?  Aren't they an anathema to all we know and love?

This gives rise to a couple of good questions:

When and why would you want to use arrays instead of a relational model?

Thanks,

Andrew Gould
Hi,

Tom Lane wrote:
> Could you look at how big a change it'd be, anyway?  Offhand I think it may just mean that the subscript-checking done in parse_expr.c needs to be done at runtime instead. [...]

Wouldn't it be a good idea to just extend the partner arrays?  Say we concatenate arrays A(Na,...,Xa) || B(Nb,...,Xb).  The resulting array C would be of dimension

  C(Na+Nb, max(Oa,Ob), max(Pa,Pb), ..., max(Xa,Xb))

So concatenation would be an extension and a right-hand append (at the first level).

Regards
Tino Wildenhain
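Concretely (hypothetical, and assuming the shorter rows would be padded, presumably with NULLs, which the proposal doesn't spell out), concatenating a 2x2 array with a 1x3 array would give a 3x3 result:

  SELECT ARRAY[[1,2],[3,4]] || ARRAY[[5,6,7]];
  -- under this proposal: {{1,2,NULL},{3,4,NULL},{5,6,7}}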
On Fri, 2003-08-15 at 13:32, elein wrote:
> PostgreSQL is an ORDBMS, not just an RDBMS.

But y'all are talking about the SQL standard here.

> Arrays don't necessarily imply denormalization.  It depends on how you use them.  The same rule applies for integers.

I dunno 'bout that...

-- 
Ron Johnson, Jr.  (ron.l.johnson@cox.net)
In response to both Andrew Gould and Ron Johnson...

If arrays are not natural in the organization of your data, don't use them.  That is the guideline.  If the array defines something specific, they are very natural.  The confusion could be that arrays are abstract types.

Specific implementations which use arrays might be clearer.  For example, a definition of a polygon is an array of points; points, themselves, are an array.

(The actual PostgreSQL implementation of polygons and points doesn't use the newer, cleaner array abstraction, I think.  But if I were reimplementing them, I would build on top of the new array capabilities.  The point is to show an array-structured object which makes sense in context.)

Of course you can denormalize via arrays, but it tends to make things harder for you.  And I believe the same thing is true for denormalized integer columns.

elein
=============================================================
elein@varlena.com      www.varlena.com
PostgreSQL Consulting & Support
PostgreSQL General Bits  http://www.varlena.com/GeneralBits/
=============================================================
"Free your mind the rest will follow" -- En Vogue
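A sketch of that kind of design (illustrative only: the table and column names are made up, and it assumes the built-in point type with an array over it):

  create table shape (
      id       integer primary key,
      vertices point[]   -- ordered vertex list; order is meaningful
  );

  insert into shape values (1, ARRAY['(0,0)'::point, '(1,0)'::point, '(1,1)'::point]);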
On Friday 15 August 2003 02:56 pm, elein wrote:
> Specific implementations which use arrays might be clearer.  For example, a definition of a polygon is an array of points; points, themselves, are an array.

Thanks, Elein.  The polygon example makes it clearer.  In the books I have here, the examples show how to use arrays, but they use data that I would move to another table.

Best regards,

Andrew
On Fri, 2003-08-15 at 15:36, Andrew L. Gould wrote:
> Thanks, Elein.  The polygon example makes it clearer.  In the books I have here, the examples show how to use arrays, but they use data that I would move to another table.

This is what makes me nervous about db arrays: the tendency for denormalization.

-- 
Ron Johnson, Jr.  (ron.l.johnson@cox.net)
create table import_contact (
    id    character(7) not null primary key,
    fm    character(30),
    ls    character(30),
    addr  character(30),
    city  character(25),
    st    character(2),
    c     character(1),
    start decimal(6),
    "end" decimal(6)
) WITHOUT OIDS;

cat datafile.dat | psql -dthedatabase -c "copy import_contact from stdin delimiter ',' null ''";

echo "insert into contact select
    id,
    case fm when null then 'xzxzxzxz' else fm end,
    case ls when null then 'xzxzxzxz' else ls end,
    case addr when null then '123 xzxzxzxz' else addr end,
    case city when null then 'xzxzxzxz' else city end,
    case st when null then 'xz' else st end,
    case c when null then 'x' else c end,
    case start when null then 122038 else start,
    case end when null then 122038 else end
from import_contact;" | psql -dthedatabase

Could be one way, although it's not atomic.  You can rewrite the copy command to be a copy-from-file command and use the \i command (or redirect to psql from file/stdin).  Simple, but there are many other methods to get this thing to work.  If you don't want to recreate the defaults every time, you could have subselects that reference the pg system tables to extract the default value for the columns you are looking for.

You could also create the insert statements with a script on the outside, or replace any blank (null in reality) fields with the default value and copy that straight to the table.

On Sat, 16 Aug 2003 03:18 am, expect wrote:
> What I would really benefit from (and I hope some other new soul would too) is if someone would outline exactly how they would approach this problem.  Maybe provide the correct table definition and the copy command.
On Sat, 16 Aug 2003 09:33:51 +1000, Jason Godden <jasongodden@optushome.com.au> wrote:

Ahh, thanks for this.  And thanks to all the others that helped me on my way.  Hopefully I'll be able to give something back to the group, although that might be hard with all the experts here.

Perhaps I can document this and provide it for public consumption.

> Could be one way, although it's not atomic.  You can rewrite the copy command to be a copy-from-file command and use the \i command (or redirect to psql from file/stdin).  Simple, but there are many other methods to get this thing to work.
Just a note on that example: I didn't properly end the case commands on the last two fields in the insert, and "end" should probably be quoted.  No, I haven't tested it.  It should be:

echo "insert into contact select
    id,
    case fm when null then 'xzxzxzxz' else fm end,
    case ls when null then 'xzxzxzxz' else ls end,
    case addr when null then '123 xzxzxzxz' else addr end,
    case city when null then 'xzxzxzxz' else city end,
    case st when null then 'xz' else st end,
    case c when null then 'x' else c end,
    case start when null then 122038 else start end,
    case \"end\" when null then 122038 else \"end\" end
from import_contact;" | psql -dthedatabase

Rgds,

Jason

On Sat, 16 Aug 2003 01:10 pm, expect wrote:
> Ahh, thanks for this.  And thanks to all the others that helped me on my way.  Perhaps I can document this and provide it for public consumption.
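One wrinkle the thread doesn't touch: CASE expr WHEN NULL never matches, because the implied comparison (expr = NULL) yields null rather than true, so the CASE forms above would pass the NULLs through instead of substituting the defaults; COALESCE (or CASE WHEN expr IS NULL THEN ...) is the reliable spelling.  Also, the sample lines carry a z (zip) field that neither table defines, so the staging table needs a z column for COPY to accept the file.  A corrected sketch under those assumptions (the z width is invented; everything else follows the thread):

  create table import_contact (
      id    character(7) not null primary key,
      fm    character(30),
      ls    character(30),
      addr  character(30),
      city  character(25),
      st    character(2),
      z     character(10),
      c     character(1),
      start decimal(6),
      "end" decimal(6)
  ) WITHOUT OIDS;

  -- COPY ... null '' turns empty fields into NULLs, which COALESCE then replaces:
  insert into contact
  select id,
         coalesce(fm,   'xzxzxzxz'),
         coalesce(ls,   'xzxzxzxz'),
         coalesce(addr, '123 xzxzxzxz'),
         coalesce(city, 'xzxzxzxz'),
         coalesce(st,   'xz'),
         coalesce(c,    'x'),
         coalesce(start, 122038),
         coalesce("end", 122038)
  from import_contact;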
On Fri, Aug 15, 2003 at 01:37:50PM -0500 I heard the voice of Andrew L. Gould, and lo! it spake thus:
> On Friday 15 August 2003 01:13 pm, Ron Johnson wrote:
>> Why are arrays even mentioned in the same breath as relational DBMSs?  Aren't they an anathema to all we know and love?
>
> This gives rise to a couple of good questions:
>
> When and why would you want to use arrays instead of a relational model?

When it's appropriate 8-}

I've found it to be extremely rare, but it DOES happen.  For instance, in one project, I needed a bit of data in a record which could store between 1 and 7 (integer) values in it.  Never less than 1, never more than 7, and the ordering of them was essential to preserve.

I *COULD* do it in another table, with an order column (which would have to be maintained 'manually' in the code), and, if one were neurotic, a separate constraint to keep from somehow getting too many records for each item in the other table... but it was far easier to just slap it in an array.

Now, any time ordering isn't important, OR a field can grow without realistic bounds, arrays naturally aren't important.  For instance, if I'd needed 15 entries, I might well have gone ahead and referenced another table, just because it would get too irritating dealing with the arrays.

-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
"The only reason I'm burning my candle at both ends, is because I
 haven't figured out how to light the middle yet"
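A sketch of that kind of table (illustrative; the names are invented, and the length check uses array_upper as seen earlier in the thread, relying on the default lower bound of 1):

  create table widget (
      id   integer primary key,
      -- between 1 and 7 ordered values; position in the array is meaningful
      vals integer[] not null,
      check (array_upper(vals, 1) between 1 and 7)
  );

  insert into widget values (1, ARRAY[3,1,4]);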