Thread: MULTISET patch

MULTISET patch

From
Pavel Stehule
Date:
Hello,

I have a free time and I can do a review of your patch. Please, can
send a last version and can send a links on documentation that you
used?

Regards

Pavel Stehule


Re: MULTISET patch

From
Itagaki Takahiro
Date:
On Mon, Dec 27, 2010 at 02:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I have a free time and I can do a review of your patch. Please, can
> send a last version and can send a links on documentation that you
> used?

Thanks! The latest patch attached.
I've not written documentation yet, but I used the following site:

[The SQL standard]
- http://www.wiscorp.com/sql20nn.zip
- http://www.wiscorp.com/sqlmultisets.zip
[secondary information]
- http://farrago.sourceforge.net/design/CollectionTypes.html
- http://waelchatila.com/2005/05/18/1116485743467.html
[Implementation in Oracle Database]
- http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm
- http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm

Here are the list of functions in the patch. Note that all of the
functions treat arrays as one-dimensional (ex. [N][M] => [N * M])
because there is no multi-dimensional arrays/multiset support
in the SQL standard.

 - [FUNCTION] cardinality(anyarray) => integer
 - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
 - [FUNCTION] array_flatten(anyarray) => anyarray
 - [FUNCTION] array_sort(anyarray) => anyarray
 - [FUNCTION] set(anyarray) => anyarray
 - [SYNTAX] $1 IS [NOT] A SET => boolean
 - [SYNTAX] $1 [NOT] MEMBER OF $2 => boolean
 - [SYNTAX] $1 [NOT] SUBMULTISET OF $2 => boolean
 - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 => anyarray
 - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $22 => anyarray
 - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $22 => anyarray
 - [AGGREGATE] collect(anyelement) => anyarray
 - [AGGREGATE] fusion(anyarray) => anyarray
 - [AGGREGATE] intersection(anyarray) => anyarray

--
Itagaki Takahiro

Attachment

Re: MULTISET patch

From
Simon Riggs
Date:
On Mon, 2010-12-27 at 10:36 +0900, Itagaki Takahiro wrote:
> On Mon, Dec 27, 2010 at 02:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > I have a free time and I can do a review of your patch. Please, can
> > send a last version and can send a links on documentation that you
> > used?
> 
> Thanks! The latest patch attached.
> I've not written documentation yet, but I used the following site:
> 
> [The SQL standard]
> - http://www.wiscorp.com/sql20nn.zip
> - http://www.wiscorp.com/sqlmultisets.zip
> [secondary information]
> - http://farrago.sourceforge.net/design/CollectionTypes.html
> - http://waelchatila.com/2005/05/18/1116485743467.html
> [Implementation in Oracle Database]
> - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm
> - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm
> 
> Here are the list of functions in the patch. Note that all of the
> functions treat arrays as one-dimensional (ex. [N][M] => [N * M])
> because there is no multi-dimensional arrays/multiset support
> in the SQL standard.

I think collect() is non-identical to fusion() but the tests don't
highlight that, so I think we need more tests to highlight null
handling. You mention that SQL standard doesn't handle multi-dim arrays,
so I think we need some tests to define and check that behaviour.

Other than that, looks like a clean and useful addition.

Without any docs the only point of reference to understand the
PostgreSQL implementation is by reading the tests. IMHO docs explain a
patch and make a review possible; they aren't something that can be
written after a review. Perhaps the nag doesn't apply as much when you
supply external references, but reviewer shouldn't have to read all of
those again. Of course, no need for docs to be perfect, just enough to
understand.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: MULTISET patch

From
Pavel Stehule
Date:
Hello

some quick notes:

* trim_array - you use a deconstruct_array. It unpack  all fields and
it could not be effective. Can we limit a unpacked array?

I searched on net. This function has a little bit unconptual name -
DB2 use a synonym for this function array_trim. Can we use this
synonym too?

Probably there could be a low level optimization - can we limit a
detoast processing? (It must not be a part of this patch).

* three state boolean - true, false, -1. I am not sure, if this is
correct style. Using a second variable can be a more clean
* you doesn't a realese a deconstructed array
* using a variable name "type"

+ it has a nice speedup for our array based functions (sort is 3x faster),
+ patch was applyed without problems
+ all test was passed

Questions:

should be a MULTISET, SUBMULTISET, MEMBER a reserved keywords?

I am for marking these words as reserved keywords, but it needs a wide
agreeement. Without agreement, I don't think so not optimal keyword
"OF" in MEMBER operator is significant issue.

Regards

Pavel Stehule




2010/12/27 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Mon, Dec 27, 2010 at 02:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I have a free time and I can do a review of your patch. Please, can
>> send a last version and can send a links on documentation that you
>> used?
>
> Thanks! The latest patch attached.
> I've not written documentation yet, but I used the following site:
>
> [The SQL standard]
> - http://www.wiscorp.com/sql20nn.zip
> - http://www.wiscorp.com/sqlmultisets.zip
> [secondary information]
> - http://farrago.sourceforge.net/design/CollectionTypes.html
> - http://waelchatila.com/2005/05/18/1116485743467.html
> [Implementation in Oracle Database]
> - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm
> - http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm
>
> Here are the list of functions in the patch. Note that all of the
> functions treat arrays as one-dimensional (ex. [N][M] => [N * M])
> because there is no multi-dimensional arrays/multiset support
> in the SQL standard.
>
>  - [FUNCTION] cardinality(anyarray) => integer
>  - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
>  - [FUNCTION] array_flatten(anyarray) => anyarray
>  - [FUNCTION] array_sort(anyarray) => anyarray
>  - [FUNCTION] set(anyarray) => anyarray
>  - [SYNTAX] $1 IS [NOT] A SET => boolean
>  - [SYNTAX] $1 [NOT] MEMBER OF $2 => boolean
>  - [SYNTAX] $1 [NOT] SUBMULTISET OF $2 => boolean
>  - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 => anyarray
>  - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $22 => anyarray
>  - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $22 => anyarray
>  - [AGGREGATE] collect(anyelement) => anyarray
>  - [AGGREGATE] fusion(anyarray) => anyarray
>  - [AGGREGATE] intersection(anyarray) => anyarray
>
> --
> Itagaki Takahiro
>


Re: MULTISET patch

From
Itagaki Takahiro
Date:
On Mon, Dec 27, 2010 at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> * trim_array - you use a deconstruct_array. It unpack  all fields and
> it could not be effective. Can we limit a unpacked array?

Sure, I'll optimize it.

> I searched on net. This function has a little bit unconptual name -
> DB2 use a synonym for this function array_trim. Can we use this
> synonym too?

IBM DB2 does use TRIM_ARRAY for the name, no? I believe it's the standard.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053491.html

> Probably there could be a low level optimization - can we limit a
> detoast processing? (It must not be a part of this patch).

I think we could avoid deconstruct_array() in some spaces,
but cannot avoid detoasting.

> Questions:
> should be a MULTISET, SUBMULTISET, MEMBER a reserved keywords?
> I am for marking these words as reserved keywords, but it needs a wide
> agreeement. Without agreement, I don't think so not optimal keyword
> "OF" in MEMBER operator is significant issue.

They are full-reserved keywords in the spec, but I'd like not to
reserve them until we can do nothing but do so.

To be honest, I cannot fix shift/reduce errors in an optional OF
in the syntax even if I marked those variables as reserved keywords.
Can I ask for your help about the usage of bison/flex for such case?

+ /*    FIXME: OF is an option in the SQL standard, but I cannot solve
+     shift/reduce errors without OF. To solve the errors, we might need
+     to make OF, MEMBER, and/or SUBMULTISET to reserved keywords. They
+     are reserved keywords in the SQL standard.
+ */

--
Itagaki Takahiro


Re: MULTISET patch

From
Pavel Stehule
Date:
2010/12/27 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Mon, Dec 27, 2010 at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> * trim_array - you use a deconstruct_array. It unpack  all fields and
>> it could not be effective. Can we limit a unpacked array?
>
> Sure, I'll optimize it.
>
>> I searched on net. This function has a little bit unconptual name -
>> DB2 use a synonym for this function array_trim. Can we use this
>> synonym too?
>
> IBM DB2 does use TRIM_ARRAY for the name, no? I believe it's the standard.
>
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053491.html
>

DB2 use a both names.  Almost all array functions has a name array_* .
Sure, standard is primary target. It's only my idea, so we can have a
both names. It's not significant.

>> Probably there could be a low level optimization - can we limit a
>> detoast processing? (It must not be a part of this patch).
>
> I think we could avoid deconstruct_array() in some spaces,
> but cannot avoid detoasting.
>

It must not be done this commitfest. I agree, so we cant avoid
detoasting, but we can limit it with pg_detoast_datum_slice.

Pavel

>> Questions:
>> should be a MULTISET, SUBMULTISET, MEMBER a reserved keywords?
>> I am for marking these words as reserved keywords, but it needs a wide
>> agreeement. Without agreement, I don't think so not optimal keyword
>> "OF" in MEMBER operator is significant issue.
>
> They are full-reserved keywords in the spec, but I'd like not to
> reserve them until we can do nothing but do so.
>
> To be honest, I cannot fix shift/reduce errors in an optional OF
> in the syntax even if I marked those variables as reserved keywords.
> Can I ask for your help about the usage of bison/flex for such case?
>
> + /*    FIXME: OF is an option in the SQL standard, but I cannot solve
> +       shift/reduce errors without OF. To solve the errors, we might need
> +       to make OF, MEMBER, and/or SUBMULTISET to reserved keywords. They
> +       are reserved keywords in the SQL standard.
> + */
>
> --
> Itagaki Takahiro
>


Re: MULTISET patch

From
Itagaki Takahiro
Date:
Here is an updated patch for MULTISET functions support.
- Fixed a bug in MULTISET EXCEPT ALL.
- Fixed a bug of NULL handling in SUBMULTISET OF.
- Removed array_flatten() from exported function list. It is still
  used internally, but I doubt it would be useful in SQL level.

On Mon, Dec 27, 2010 at 21:27, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> * trim_array - you use a deconstruct_array. It unpack  all fields and
>>> it could not be effective. Can we limit a unpacked array?

I modified the logic to use existing array_get_slice(),
that uses memcpy to make a sliced array without unpacking.

> DB2 use a both names.  Almost all array functions has a name array_* .
> Sure, standard is primary target. It's only my idea, so we can have a
> both names. It's not significant.

We also use array_xxx for array function names, but TRIM_ARRAY()
is the standard... Since slice syntax ARRAY[lo:hi] should be more
useful  than trim_array(), I don't have a plan to add the alias.

> * three state boolean - true, false, -1. I am not sure, if this is
> correct style. Using a second variable can be a more clean

I modified the code to use two booleans.

> * you doesn't a realese a deconstructed array

I don't think it needs to be fixed because functions are executed
in per-tuple memory context in normal cases.


On Mon, Dec 27, 2010 at 17:05, Simon Riggs <simon@2ndquadrant.com> wrote:
> I think collect() is non-identical to fusion() but the tests don't
> highlight that, so I think we need more tests to highlight null
> handling.

collect() exactly equals to array_agg() in the current implementation.
It should return a MULTISET value instead of ARRAY in the spec,
but we don't have MULTISET data type for now.

> You mention that SQL standard doesn't handle multi-dim arrays,
> so I think we need some tests to define and check that behaviour.

I added additional tests for multi-dimensional arrays and NULL elements.

> Without any docs the only point of reference to understand the
> PostgreSQL implementation is by reading the tests. IMHO docs explain a
> patch and make a review possible; they aren't something that can be
> written after a review. Perhaps the nag doesn't apply as much when you
> supply external references, but reviewer shouldn't have to read all of
> those again. Of course, no need for docs to be perfect, just enough to
> understand.

I added some comments mainly in "Array Functions and Operators" section.
The most debatable part is whether arrays should be flattened or not.
I wrote the patch to flatten always -- for example,
  trim_array( [[1,2],[3,4]], 1) => [1,2,3]
but we could design it to keep the dimension:
  trim_array( [[1,2],[3,4]], 1) => [[1,2]]

However, it's not so easy for comparison function (sort, set, ...) to
decide what we should do. Note that the existing <@ operator also ignored
dimensions in arrays:

=# SELECT ARRAY[[1,3],[2,5]] <@ ARRAY[[1,2,3],[4,5,6]];
 ?column?
----------
 t

--
Itagaki Takahiro

Attachment

Re: MULTISET patch

From
"Erik Rijkers"
Date:
On Thu, January 6, 2011 12:54, Itagaki Takahiro wrote:
> Here is an updated patch for MULTISET functions support.

There seem to be some faulty line-endings in arrays.out that break the arrays test (on x86_64
Centos 5.4).  make, make check were OK after I removed these (3 lines, from line 1370).


*** /var/data1/pg_stuff/pg_sandbox/pgsql.multiset/src/test/regress/expected/arrays.out  2011-01-06
17:05:33.000000000 +0100
--- /var/data1/pg_stuff/pg_sandbox/pgsql.multiset/src/test/regress/results/arrays.out   2011-01-06
17:08:47.000000000 +0100
***************
*** 1367,1375 ****
 SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],        ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
!  submultiset_of | submultiset_of ^M
! ----------------+----------------^M
!                 | f^M (1 row)
 SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];
--- 1367,1375 ----
 SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],        ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
!  submultiset_of | submultiset_of
! ----------------+----------------
!                 | f (1 row)
 SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];

======================================================================


Erik Rijkers