Thread: MULTISET patch
Hello, I have a free time and I can do a review of your patch. Please, can send a last version and can send a links on documentation that you used? Regards Pavel Stehule
On Mon, Dec 27, 2010 at 02:09, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I have a free time and I can do a review of your patch. Please, can > send a last version and can send a links on documentation that you > used? Thanks! The latest patch attached. I've not written documentation yet, but I used the following site: [The SQL standard] - http://www.wiscorp.com/sql20nn.zip - http://www.wiscorp.com/sqlmultisets.zip [secondary information] - http://farrago.sourceforge.net/design/CollectionTypes.html - http://waelchatila.com/2005/05/18/1116485743467.html [Implementation in Oracle Database] - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm Here are the list of functions in the patch. Note that all of the functions treat arrays as one-dimensional (ex. [N][M] => [N * M]) because there is no multi-dimensional arrays/multiset support in the SQL standard. - [FUNCTION] cardinality(anyarray) => integer - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray - [FUNCTION] array_flatten(anyarray) => anyarray - [FUNCTION] array_sort(anyarray) => anyarray - [FUNCTION] set(anyarray) => anyarray - [SYNTAX] $1 IS [NOT] A SET => boolean - [SYNTAX] $1 [NOT] MEMBER OF $2 => boolean - [SYNTAX] $1 [NOT] SUBMULTISET OF $2 => boolean - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 => anyarray - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $22 => anyarray - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $22 => anyarray - [AGGREGATE] collect(anyelement) => anyarray - [AGGREGATE] fusion(anyarray) => anyarray - [AGGREGATE] intersection(anyarray) => anyarray -- Itagaki Takahiro
Attachment
On Mon, 2010-12-27 at 10:36 +0900, Itagaki Takahiro wrote: > On Mon, Dec 27, 2010 at 02:09, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > I have a free time and I can do a review of your patch. Please, can > > send a last version and can send a links on documentation that you > > used? > > Thanks! The latest patch attached. > I've not written documentation yet, but I used the following site: > > [The SQL standard] > - http://www.wiscorp.com/sql20nn.zip > - http://www.wiscorp.com/sqlmultisets.zip > [secondary information] > - http://farrago.sourceforge.net/design/CollectionTypes.html > - http://waelchatila.com/2005/05/18/1116485743467.html > [Implementation in Oracle Database] > - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm > - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm > > Here are the list of functions in the patch. Note that all of the > functions treat arrays as one-dimensional (ex. [N][M] => [N * M]) > because there is no multi-dimensional arrays/multiset support > in the SQL standard. I think collect() is non-identical to fusion() but the tests don't highlight that, so I think we need more tests to highlight null handling. You mention that SQL standard doesn't handle multi-dim arrays, so I think we need some tests to define and check that behaviour. Other than that, looks like a clean and useful addition. Without any docs the only point of reference to understand the PostgreSQL implementation is by reading the tests. IMHO docs explain a patch and make a review possible; they aren't something that can be written after a review. Perhaps the nag doesn't apply as much when you supply external references, but reviewer shouldn't have to read all of those again. Of course, no need for docs to be perfect, just enough to understand. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Hello some quick notes: * trim_array - you use a deconstruct_array. It unpack all fields and it could not be effective. Can we limit a unpacked array? I searched on net. This function has a little bit unconptual name - DB2 use a synonym for this function array_trim. Can we use this synonym too? Probably there could be a low level optimization - can we limit a detoast processing? (It must not be a part of this patch). * three state boolean - true, false, -1. I am not sure, if this is correct style. Using a second variable can be a more clean * you doesn't a realese a deconstructed array * using a variable name "type" + it has a nice speedup for our array based functions (sort is 3x faster), + patch was applyed without problems + all test was passed Questions: should be a MULTISET, SUBMULTISET, MEMBER a reserved keywords? I am for marking these words as reserved keywords, but it needs a wide agreeement. Without agreement, I don't think so not optimal keyword "OF" in MEMBER operator is significant issue. Regards Pavel Stehule 2010/12/27 Itagaki Takahiro <itagaki.takahiro@gmail.com>: > On Mon, Dec 27, 2010 at 02:09, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I have a free time and I can do a review of your patch. Please, can >> send a last version and can send a links on documentation that you >> used? > > Thanks! The latest patch attached. > I've not written documentation yet, but I used the following site: > > [The SQL standard] > - http://www.wiscorp.com/sql20nn.zip > - http://www.wiscorp.com/sqlmultisets.zip > [secondary information] > - http://farrago.sourceforge.net/design/CollectionTypes.html > - http://waelchatila.com/2005/05/18/1116485743467.html > [Implementation in Oracle Database] > - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm > - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm > > Here are the list of functions in the patch. Note that all of the > functions treat arrays as one-dimensional (ex. [N][M] => [N * M]) > because there is no multi-dimensional arrays/multiset support > in the SQL standard. > > - [FUNCTION] cardinality(anyarray) => integer > - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray > - [FUNCTION] array_flatten(anyarray) => anyarray > - [FUNCTION] array_sort(anyarray) => anyarray > - [FUNCTION] set(anyarray) => anyarray > - [SYNTAX] $1 IS [NOT] A SET => boolean > - [SYNTAX] $1 [NOT] MEMBER OF $2 => boolean > - [SYNTAX] $1 [NOT] SUBMULTISET OF $2 => boolean > - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 => anyarray > - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $22 => anyarray > - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $22 => anyarray > - [AGGREGATE] collect(anyelement) => anyarray > - [AGGREGATE] fusion(anyarray) => anyarray > - [AGGREGATE] intersection(anyarray) => anyarray > > -- > Itagaki Takahiro >
On Mon, Dec 27, 2010 at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote: > * trim_array - you use a deconstruct_array. It unpack all fields and > it could not be effective. Can we limit a unpacked array? Sure, I'll optimize it. > I searched on net. This function has a little bit unconptual name - > DB2 use a synonym for this function array_trim. Can we use this > synonym too? IBM DB2 does use TRIM_ARRAY for the name, no? I believe it's the standard. http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053491.html > Probably there could be a low level optimization - can we limit a > detoast processing? (It must not be a part of this patch). I think we could avoid deconstruct_array() in some spaces, but cannot avoid detoasting. > Questions: > should be a MULTISET, SUBMULTISET, MEMBER a reserved keywords? > I am for marking these words as reserved keywords, but it needs a wide > agreeement. Without agreement, I don't think so not optimal keyword > "OF" in MEMBER operator is significant issue. They are full-reserved keywords in the spec, but I'd like not to reserve them until we can do nothing but do so. To be honest, I cannot fix shift/reduce errors in an optional OF in the syntax even if I marked those variables as reserved keywords. Can I ask for your help about the usage of bison/flex for such case? + /* FIXME: OF is an option in the SQL standard, but I cannot solve + shift/reduce errors without OF. To solve the errors, we might need + to make OF, MEMBER, and/or SUBMULTISET to reserved keywords. They + are reserved keywords in the SQL standard. + */ -- Itagaki Takahiro
2010/12/27 Itagaki Takahiro <itagaki.takahiro@gmail.com>: > On Mon, Dec 27, 2010 at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> * trim_array - you use a deconstruct_array. It unpack all fields and >> it could not be effective. Can we limit a unpacked array? > > Sure, I'll optimize it. > >> I searched on net. This function has a little bit unconptual name - >> DB2 use a synonym for this function array_trim. Can we use this >> synonym too? > > IBM DB2 does use TRIM_ARRAY for the name, no? I believe it's the standard. > http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053491.html > DB2 use a both names. Almost all array functions has a name array_* . Sure, standard is primary target. It's only my idea, so we can have a both names. It's not significant. >> Probably there could be a low level optimization - can we limit a >> detoast processing? (It must not be a part of this patch). > > I think we could avoid deconstruct_array() in some spaces, > but cannot avoid detoasting. > It must not be done this commitfest. I agree, so we cant avoid detoasting, but we can limit it with pg_detoast_datum_slice. Pavel >> Questions: >> should be a MULTISET, SUBMULTISET, MEMBER a reserved keywords? >> I am for marking these words as reserved keywords, but it needs a wide >> agreeement. Without agreement, I don't think so not optimal keyword >> "OF" in MEMBER operator is significant issue. > > They are full-reserved keywords in the spec, but I'd like not to > reserve them until we can do nothing but do so. > > To be honest, I cannot fix shift/reduce errors in an optional OF > in the syntax even if I marked those variables as reserved keywords. > Can I ask for your help about the usage of bison/flex for such case? > > + /* FIXME: OF is an option in the SQL standard, but I cannot solve > + shift/reduce errors without OF. To solve the errors, we might need > + to make OF, MEMBER, and/or SUBMULTISET to reserved keywords. They > + are reserved keywords in the SQL standard. > + */ > > -- > Itagaki Takahiro >
Here is an updated patch for MULTISET functions support. - Fixed a bug in MULTISET EXCEPT ALL. - Fixed a bug of NULL handling in SUBMULTISET OF. - Removed array_flatten() from exported function list. It is still used internally, but I doubt it would be useful in SQL level. On Mon, Dec 27, 2010 at 21:27, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> * trim_array - you use a deconstruct_array. It unpack all fields and >>> it could not be effective. Can we limit a unpacked array? I modified the logic to use existing array_get_slice(), that uses memcpy to make a sliced array without unpacking. > DB2 use a both names. Almost all array functions has a name array_* . > Sure, standard is primary target. It's only my idea, so we can have a > both names. It's not significant. We also use array_xxx for array function names, but TRIM_ARRAY() is the standard... Since slice syntax ARRAY[lo:hi] should be more useful than trim_array(), I don't have a plan to add the alias. > * three state boolean - true, false, -1. I am not sure, if this is > correct style. Using a second variable can be a more clean I modified the code to use two booleans. > * you doesn't a realese a deconstructed array I don't think it needs to be fixed because functions are executed in per-tuple memory context in normal cases. On Mon, Dec 27, 2010 at 17:05, Simon Riggs <simon@2ndquadrant.com> wrote: > I think collect() is non-identical to fusion() but the tests don't > highlight that, so I think we need more tests to highlight null > handling. collect() exactly equals to array_agg() in the current implementation. It should return a MULTISET value instead of ARRAY in the spec, but we don't have MULTISET data type for now. > You mention that SQL standard doesn't handle multi-dim arrays, > so I think we need some tests to define and check that behaviour. I added additional tests for multi-dimensional arrays and NULL elements. > Without any docs the only point of reference to understand the > PostgreSQL implementation is by reading the tests. IMHO docs explain a > patch and make a review possible; they aren't something that can be > written after a review. Perhaps the nag doesn't apply as much when you > supply external references, but reviewer shouldn't have to read all of > those again. Of course, no need for docs to be perfect, just enough to > understand. I added some comments mainly in "Array Functions and Operators" section. The most debatable part is whether arrays should be flattened or not. I wrote the patch to flatten always -- for example, trim_array( [[1,2],[3,4]], 1) => [1,2,3] but we could design it to keep the dimension: trim_array( [[1,2],[3,4]], 1) => [[1,2]] However, it's not so easy for comparison function (sort, set, ...) to decide what we should do. Note that the existing <@ operator also ignored dimensions in arrays: =# SELECT ARRAY[[1,3],[2,5]] <@ ARRAY[[1,2,3],[4,5,6]]; ?column? ---------- t -- Itagaki Takahiro
Attachment
On Thu, January 6, 2011 12:54, Itagaki Takahiro wrote: > Here is an updated patch for MULTISET functions support. There seem to be some faulty line-endings in arrays.out that break the arrays test (on x86_64 Centos 5.4). make, make check were OK after I removed these (3 lines, from line 1370). *** /var/data1/pg_stuff/pg_sandbox/pgsql.multiset/src/test/regress/expected/arrays.out 2011-01-06 17:05:33.000000000 +0100 --- /var/data1/pg_stuff/pg_sandbox/pgsql.multiset/src/test/regress/results/arrays.out 2011-01-06 17:08:47.000000000 +0100 *************** *** 1367,1375 **** SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL], ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL]; ! submultiset_of | submultiset_of ^M ! ----------------+----------------^M ! | f^M (1 row) SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL]; --- 1367,1375 ---- SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL], ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL]; ! submultiset_of | submultiset_of ! ----------------+---------------- ! | f (1 row) SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL]; ====================================================================== Erik Rijkers