Thread: Function array_agg(array)

Function array_agg(array)

From
Ali Akbar
Date:
Greetings,

While looking for easier items in PostgreSQL Wiki's Todo List (this will be my 3rd patch), i found this TODO:

Add a built-in array_agg(anyarray) or similar, that can aggregate 1-dimensional arrays into a 2-dimensional array.

I've stumbled by this lack of array_agg(anyarray) sometimes ago in my work, so i decided to explore this.

Currently, if we try array_agg(anyarray), PostgreSQL behaves like this:

# select array_agg('{1,2}'::int[]);
ERROR:  could not find array type for data type integer[]

Reading implementation of array_agg, it looks like the array_agg function is generic, and can process any input. The error comes from PostgreSQL not finding array type for int[] (_int4 in pg_proc).

In PostgreSQL, any array is multidimensional, array type for any array is the same:
- the type of {1,2} is int[]
- {{1,2}, {3,4}} is int[]
- {{{1},{2}, {3} ,{4}}} is still int[]

So, can't we just set the typarray of array types to its self oid? (patch attached). So far:
- the array_agg is working and returning correct types:

backend> select array_agg('{1,2}'::int[]);
     1: array_agg    (typeid = 1007, len = -1, typmod = -1, byval = f)
    ----
     1: array_agg = "{"{1,2}"}"    (typeid = 1007, len = -1, typmod = -1, byval = f)
    ----

select array_agg('{''a'',''b''}'::varchar[]);
     1: array_agg    (typeid = 1015, len = -1, typmod = -1, byval = f)
    ----
     1: array_agg = "{"{'a','b'}"}"    (typeid = 1015, len = -1, typmod = -1, byval = f)
    ----


- Regression tests passed except for the pg_type sanity check while checking typelem relation with typarray:

SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
       p2.typelem, p2.typlen
FROM   pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid)
WHERE  p1.typarray <> 0 AND
       (p2.oid IS NULL OR p2.typelem <> p1.oid OR p2.typlen <> -1);
!  oid  |    basetype    |   arraytype    | typelem | typlen
! ------+----------------+----------------+---------+--------
!   143 | _xml           | _xml           |     142 |     -1
!   199 | _json          | _json          |     114 |     -1
!   629 | _line          | _line          |     628 |     -1
!   719 | _circle        | _circle        |     718 |     -1
... (cut)


Aside from the sanity check complaints, I don't see any problems in the resulting array operations.

So, back to the question: Can't we just set the typarray of array types to its self oid?

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Tom Lane
Date:
Ali Akbar <the.apaan@gmail.com> writes:
> So, can't we just set the typarray of array types to its self oid?

Seems dangerous as heck; certainly it would have side-effects far more
wide-ranging than just making this particular function work.

A safer answer is to split array_agg into two functions,array_agg(anynonarray) -> anyarrayarray_agg(anyarray) ->
anyarray

I rather imagine you should do that anyway, because I really doubt
that this hack is operating quite as intended.  I suspect you are
producing arrays containing arrays as elements, not true 2-D arrays.
That's not a direction we want to go in I think; certainly there are
no other operations that produce such things.
        regards, tom lane



Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-11 22:28 GMT+07:00 Tom Lane <tgl@sss.pgh.pa.us>:
Seems dangerous as heck; certainly it would have side-effects far more
wide-ranging than just making this particular function work.

A safer answer is to split array_agg into two functions,
        array_agg(anynonarray) -> anyarray
        array_agg(anyarray) -> anyarray

I rather imagine you should do that anyway, because I really doubt
that this hack is operating quite as intended.  I suspect you are
producing arrays containing arrays as elements, not true 2-D arrays.
That's not a direction we want to go in I think; certainly there are
no other operations that produce such things.

Thanks for the review. Yes, it looks like the patch produced array as the elements. So, all array operations behaves wierdly.

In this quick & dirty patch, I am trying to implement the array_agg(anyarray), introducing two new functions:
- array_agg_anyarray_transfn
- array_agg_anyarray_finalfn

At first, i want to use accumArrayResult and makeMdArrayResult, but it's complicated to work with multi-dimensional arrays with those two functions. So i combined array_cat with those function.

Currently, it cannot handle NULL arrays:
backend> select array_agg(a) from (values(null::int[])) a(a);
     1: array_agg    (typeid = 1007, len = -1, typmod = -1, byval = f)
    ----
ERROR:  cannot aggregate null arrays

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-12 19:37 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
Currently, it cannot handle NULL arrays:
backend> select array_agg(a) from (values(null::int[])) a(a);
     1: array_agg    (typeid = 1007, len = -1, typmod = -1, byval = f)
    ----
ERROR:  cannot aggregate null arrays

While thinking about the function behavior if its input is NULL array (e.g: NULL:int[]), i've found:
- currentpatch doesn't handle empty array correctly:
    - when there is only one array to aggregate, the resulting array is wrong
    - when the first array is empty array, and the second array is also empty array, it segfaulted
- if we see NULL array as NULL values, the resulting array cannot be differentiated from array of null ints:
    - SELECT array_agg(NULL::int[]) FROM generate_series(1,2); ---> {NULL, NULL} with type int[]
    - SELECT array_agg(NULL::int) FROM generate_series(1,2); --> {NULL, NULL} with type int[]

Also i've found that handling NULL array is listed as BUG in TODO. The discussion in the thread is still not finished, with last email from Tom Lane (http://www.postgresql.org/message-id/18866.1226025853@sss.pgh.pa.us):

> array_lower raise exception if array is empty (there are no dimensions
> to inquire about)
> array_upper raise exception if array is empty (there are no dimensions
> to inquire about)

Well, these beg the question: is an empty array zero-dimensional, or
is it a one-dimensional array of no elements, or perhaps both of those
as well as higher-dimensional cases where any axis has zero elements,
or ???

It's really all kind of messy ... we need to trade off simplicity of
definition, ease of use, backwards compatibility, and standards
compliance (though the standard has only 1-D arrays so it's of just
limited help here).

So, is there any idea how we will handle NULL and empty array in array_agg(anyarray)?
I propose we just reject those input because the output will make no sense:
- array_agg(NULL::int[]) --> the result will be indistinguished from array_agg of NULL ints.
- array_agg('{}'::int[]) --> how we determine the dimension of the result? is it 0? Or the result will be just an empty array {} ?

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
So, is there any idea how we will handle NULL and empty array in array_agg(anyarray)? 
I propose we just reject those input because the output will make no sense:
- array_agg(NULL::int[]) --> the result will be indistinguished from array_agg of NULL ints.
- array_agg('{}'::int[]) --> how we determine the dimension of the result? is it 0? Or the result will be just an empty array {} ?

This updated patch rejects NULL and {} arrays as noted above.
 
Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:
Hi

2014-10-19 8:02 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
So, is there any idea how we will handle NULL and empty array in array_agg(anyarray)? 
I propose we just reject those input because the output will make no sense:
- array_agg(NULL::int[]) --> the result will be indistinguished from array_agg of NULL ints.
- array_agg('{}'::int[]) --> how we determine the dimension of the result? is it 0? Or the result will be just an empty array {} ?

This updated patch rejects NULL and {} arrays as noted above.
 

I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.

2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.

3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

Regards

Pavel

 
Regards,
--
Ali Akbar


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Function array_agg(array)

From
Ali Akbar
Date:
Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test 
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

 
4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test 
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?
 

In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword
 

 
4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger.





Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test 
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

Found it. Thanks.

On a side note in postgres array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider to change it so array type of anyarray is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109

if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}

to implement array(subselect) you pointed above, we must specially checks for array types. Quick search on get_array_type returns 10 places.

I'll think more about this later. For this patch, i'll go without changes in pg_type.h.

4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger.

Ok.


Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-23 4:00 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test 
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

Found it. Thanks.

On a side note in postgres array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider to change it so array type of anyarray is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109

yes, it is true - this is really big change and maybe needs separate discuss - ***if we allow cycle there. I am not sure about possible side effects***.

Maybe this change is not necessary, you can fix a check only ... "if type is not array or if get_array_type is null raise a error"
 

if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}

to implement array(subselect) you pointed above, we must specially checks for array types. Quick search on get_array_type returns 10 places.

attention: probably we don't would to allow arrays everywhere.
 

I'll think more about this later. For this patch, i'll go without changes in pg_type.h.

4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger.

Ok.


Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
 
i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for array(subselect) defined?
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

implemented it by modifying ArrayBuildState to ArrayBuildStateArray and ArrayBuildStateScalar (do you have any idea for better struct naming?)
 
In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

attention: probably we don't would to allow arrays everywhere. 

I've changed the places where i think it's appropriate.
 
4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger. 

this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 * number of items in array.

Regards,
--
Ali Akbar
 
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:

2014-10-24 10:24 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
 
i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for array(subselect) defined?
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

implemented it by modifying ArrayBuildState to ArrayBuildStateArray and ArrayBuildStateScalar (do you have any idea for better struct naming?)
 
In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

attention: probably we don't would to allow arrays everywhere. 

I've changed the places where i think it's appropriate.
 
4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger. 

this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 * number of items in array.

Regards,
--
Ali Akbar
 

Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-24 15:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:

doc updated with additional example for array(subselect). patch attached.

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;


2014-10-24 11:24 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-24 15:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:

doc updated with additional example for array(subselect). patch attached.

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:

2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;

Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost.

--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Michael Paquier
Date:


On Fri, Oct 24, 2014 at 11:43 AM, Ali Akbar <the.apaan@gmail.com> wrote:

2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;

Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost.
That's not surprising, sometimes filterdiff misses the shot.
--
Michael

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;

Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost.

last version is compileable, but some is still broken

postgres=# select array_agg(array[i, i+1, i-1])
            from generate_series(1,2) a(i);
ERROR:  could not find array type for data type integer[]

but array(subselect) works

postgres=# select array(select a from xx);
       array      
-------------------
 {{1,2,3},{1,2,3}}
(1 row)

Regards

Pavel

 

--
Ali Akbar

Re: Function array_agg(array)

From
Pavel Stehule
Date:
Hi

I did some performance tests and it is interesting:

it is about 15% faster than original implementation.

Regards

Pavel




2014-10-24 13:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;

Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost.

last version is compileable, but some is still broken

postgres=# select array_agg(array[i, i+1, i-1])
            from generate_series(1,2) a(i);
ERROR:  could not find array type for data type integer[]

but array(subselect) works

postgres=# select array(select a from xx);
       array      
-------------------
 {{1,2,3},{1,2,3}}
(1 row)

Regards

Pavel

 

--
Ali Akbar


Re: Function array_agg(array)

From
Alvaro Herrera
Date:
Michael Paquier wrote:

> That's not surprising, sometimes filterdiff misses the shot.

Really?  Wow, that's bad news.  I've been using it to submit patches
from time to time ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-24 13:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;

Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost.

last version is compileable, but some is still broken

postgres=# select array_agg(array[i, i+1, i-1])
            from generate_series(1,2) a(i);
ERROR:  could not find array type for data type integer[]

I am sorry, it works - I had a problem with broken database

I fixed small issue in regress tests and I enhanced tests for varlena types and null values.

Regards

Pavel
 

but array(subselect) works

postgres=# select array(select a from xx);
       array      
-------------------
 {{1,2,3},{1,2,3}}
(1 row)

Regards

Pavel

 

--
Ali Akbar


Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:
Hi Ali

I checked a code. I am thinking so code organization is not good. accumArrayResult is too long now. makeMdArrayResult will not work, when arrays was joined (it is not consistent now). I don't like a usage of state->is_array_accum in array_userfunc.c -- it is signal of wrong wrapping.

next question: there is function array_append(anyarray, anyelement). Isn't time to define array_append(anyarray, anyarray) now?

Regards

Pavel

2014-10-24 15:05 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-24 13:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-24 11:43 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-24 16:26 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

some in last patch is wrong, I cannot to compile it:

arrayfuncs.c: In function ‘accumArrayResult’:
arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->alen = 64;  /* arbitrary starting array size */
         ^
arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
         ^
arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dvalues = (Datum *) palloc(astate->alen * sizeof(Datum));
                                            ^
arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
         ^
arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
   astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
                                          ^
arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
   astate->nelems = 0;
         ^
arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
   if (astate->nelems >= astate->alen)
             ^
arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
   if (astate->nelems >= astate->alen)
                               ^
arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
    astate->alen *= 2;

Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost.

last version is compileable, but some is still broken

postgres=# select array_agg(array[i, i+1, i-1])
            from generate_series(1,2) a(i);
ERROR:  could not find array type for data type integer[]

I am sorry, it works - I had a problem with broken database

I fixed small issue in regress tests and I enhanced tests for varlena types and null values.

Regards

Pavel
 

but array(subselect) works

postgres=# select array(select a from xx);
       array      
-------------------
 {{1,2,3},{1,2,3}}
(1 row)

Regards

Pavel

 

--
Ali Akbar



Re: Function array_agg(array)

From
Ali Akbar
Date:
I fixed small issue in regress tests and I enhanced tests for varlena types and null values.
Thanks.

it is about 15% faster than original implementation.
15% faster than array_agg(scalar)? I haven't verify the performance, but because the internal array data and null bitmap is copied as-is, that will be faster.

2014-10-25 1:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi Ali 

I checked a code. I am thinking so code organization is not good. accumArrayResult is too long now. makeMdArrayResult will not work, when arrays was joined (it is not consistent now). I don't like a usage of state->is_array_accum in array_userfunc.c -- it is signal of wrong wrapping.
Yes, i was thinking the same. Attached WIP patch to reorganizate the code. makeMdArrayResult works now, with supplied arguments act as override from default values calculated from ArrayBuildStateArray.

In array_userfunc.c, because we don't want to override ndims, dims and lbs, i copied array_agg_finalfn and only change the call to makeMdArrayResult (we don't uses makeArrayResult because we want to set release to false). Another alternative is to create new makeArrayResult-like function that has parameter bool release.
 
next question: there is function array_append(anyarray, anyelement). Isn't time to define array_append(anyarray, anyarray) now? 

There is array_cat(anyarray, anyarray):
/*-----------------------------------------------------------------------------
 * array_cat :
 * concatenate two nD arrays to form an nD array, or
 * push an (n-1)D array onto the end of an nD array
 *----------------------------------------------------------------------------
 */

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-25 10:29 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
I fixed small issue in regress tests and I enhanced tests for varlena types and null values.
Thanks.

it is about 15% faster than original implementation.
15% faster than array_agg(scalar)? I haven't verify the performance, but because the internal array data and null bitmap is copied as-is, that will be faster.

2014-10-25 1:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi Ali 

I checked a code. I am thinking so code organization is not good. accumArrayResult is too long now. makeMdArrayResult will not work, when arrays was joined (it is not consistent now). I don't like a usage of state->is_array_accum in array_userfunc.c -- it is signal of wrong wrapping.
Yes, i was thinking the same. Attached WIP patch to reorganizate the code. makeMdArrayResult works now, with supplied arguments act as override from default values calculated from ArrayBuildStateArray.

In array_userfunc.c, because we don't want to override ndims, dims and lbs, i copied array_agg_finalfn and only change the call to makeMdArrayResult (we don't uses makeArrayResult because we want to set release to false). Another alternative is to create new makeArrayResult-like function that has parameter bool release.

adding makeArrayResult1 (do you have better name for this?), that accepts bool release parameter. array_agg_finalfn becomes more clean, and no duplicate code in array_agg_anyarray_finalfn.
 
next question: there is function array_append(anyarray, anyelement). Isn't time to define array_append(anyarray, anyarray) now? 

There is array_cat(anyarray, anyarray):
/*-----------------------------------------------------------------------------
 * array_cat :
 * concatenate two nD arrays to form an nD array, or
 * push an (n-1)D array onto the end of an nD array
 *----------------------------------------------------------------------------
 */

Regards, 
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-25 8:19 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-25 10:29 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
I fixed small issue in regress tests and I enhanced tests for varlena types and null values.
Thanks.

it is about 15% faster than original implementation.
15% faster than array_agg(scalar)? I haven't verify the performance, but because the internal array data and null bitmap is copied as-is, that will be faster.

2014-10-25 1:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi Ali 

I checked a code. I am thinking so code organization is not good. accumArrayResult is too long now. makeMdArrayResult will not work, when arrays was joined (it is not consistent now). I don't like a usage of state->is_array_accum in array_userfunc.c -- it is signal of wrong wrapping.
Yes, i was thinking the same. Attached WIP patch to reorganizate the code. makeMdArrayResult works now, with supplied arguments act as override from default values calculated from ArrayBuildStateArray.

In array_userfunc.c, because we don't want to override ndims, dims and lbs, i copied array_agg_finalfn and only change the call to makeMdArrayResult (we don't uses makeArrayResult because we want to set release to false). Another alternative is to create new makeArrayResult-like function that has parameter bool release.

adding makeArrayResult1 (do you have better name for this?), that accepts bool release parameter. array_agg_finalfn becomes more clean, and no duplicate code in array_agg_anyarray_finalfn.

 makeArrayResult1 - I have no better name now

I found one next minor detail.

you reuse a array_agg_transfn function. Inside is a message "array_agg_transfn called in non-aggregate context". It is not correct for array_agg_anyarray_transfn

probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too.

makeMdArrayResultArray and appendArrayDatum  should be marked as "static"

 
next question: there is function array_append(anyarray, anyelement). Isn't time to define array_append(anyarray, anyarray) now? 

There is array_cat(anyarray, anyarray):
/*-----------------------------------------------------------------------------
 * array_cat :
 * concatenate two nD arrays to form an nD array, or
 * push an (n-1)D array onto the end of an nD array
 *----------------------------------------------------------------------------
 */

Regards, 
--
Ali Akbar

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-25 8:33 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-25 8:19 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-25 10:29 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
I fixed small issue in regress tests and I enhanced tests for varlena types and null values.
Thanks.

it is about 15% faster than original implementation.
15% faster than array_agg(scalar)? I haven't verify the performance, but because the internal array data and null bitmap is copied as-is, that will be faster.

2014-10-25 1:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi Ali 

I checked a code. I am thinking so code organization is not good. accumArrayResult is too long now. makeMdArrayResult will not work, when arrays was joined (it is not consistent now). I don't like a usage of state->is_array_accum in array_userfunc.c -- it is signal of wrong wrapping.
Yes, i was thinking the same. Attached WIP patch to reorganizate the code. makeMdArrayResult works now, with supplied arguments act as override from default values calculated from ArrayBuildStateArray.

In array_userfunc.c, because we don't want to override ndims, dims and lbs, i copied array_agg_finalfn and only change the call to makeMdArrayResult (we don't uses makeArrayResult because we want to set release to false). Another alternative is to create new makeArrayResult-like function that has parameter bool release.

adding makeArrayResult1 (do you have better name for this?), that accepts bool release parameter. array_agg_finalfn becomes more clean, and no duplicate code in array_agg_anyarray_finalfn.

 makeArrayResult1 - I have no better name now

I found one next minor detail.

you reuse a array_agg_transfn function. Inside is a message "array_agg_transfn called in non-aggregate context". It is not correct for array_agg_anyarray_transfn

probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too.

makeMdArrayResultArray and appendArrayDatum  should be marked as "static"

I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming.

Regards

Pavel

 

 
next question: there is function array_append(anyarray, anyelement). Isn't time to define array_append(anyarray, anyarray) now? 

There is array_cat(anyarray, anyarray):
/*-----------------------------------------------------------------------------
 * array_cat :
 * concatenate two nD arrays to form an nD array, or
 * push an (n-1)D array onto the end of an nD array
 *----------------------------------------------------------------------------
 */

Regards, 
--
Ali Akbar


Re: Function array_agg(array)

From
Ali Akbar
Date:
 makeArrayResult1 - I have no better name now

I found one next minor detail.

you reuse a array_agg_transfn function. Inside is a message "array_agg_transfn called in non-aggregate context". It is not correct for array_agg_anyarray_transfn

Fixed.
 
probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too.

makeMdArrayResultArray and appendArrayDatum  should be marked as "static"

ok, marked all of that as static.
 
I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming. 

I'm thinking it like this:
- if we want to accumulate array normally, use accumArrayResult and makeArrayResult. If we accumulate scalar the result will be 1D array. if we accumulate array, the resulting dimension is incremented by 1.
- if we want, somehow to affect the normal behavior, and change the dimensions, use makeMdArrayResult.

Searching through the postgres' code, other than internal use in arrayfuncs.c, makeMdArrayResult is used only in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl array to postgres array.

So if somehow we will accumulate array other than in array_agg, i think the most natural way is using accumArrayResult and then makeArrayResult.

CMIIW

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-25 10:16 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
 makeArrayResult1 - I have no better name now

I found one next minor detail.

you reuse a array_agg_transfn function. Inside is a message "array_agg_transfn called in non-aggregate context". It is not correct for array_agg_anyarray_transfn

Fixed.
 
probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too.

makeMdArrayResultArray and appendArrayDatum  should be marked as "static"

ok, marked all of that as static.
 
I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming. 

I'm thinking it like this:
- if we want to accumulate array normally, use accumArrayResult and makeArrayResult. If we accumulate scalar the result will be 1D array. if we accumulate array, the resulting dimension is incremented by 1.
- if we want, somehow to affect the normal behavior, and change the dimensions, use makeMdArrayResult.

Searching through the postgres' code, other than internal use in arrayfuncs.c, makeMdArrayResult is used only in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl array to postgres array.

So if somehow we will accumulate array other than in array_agg, i think the most natural way is using accumArrayResult and then makeArrayResult.


ok, there is more variants and I can't to decide. But I am not satisfied with this API. We do some wrong in structure. makeMdArrayResult is now ugly.


 

CMIIW

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-25 15:43 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-25 10:16 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
 makeArrayResult1 - I have no better name now

I found one next minor detail.

you reuse a array_agg_transfn function. Inside is a message "array_agg_transfn called in non-aggregate context". It is not correct for array_agg_anyarray_transfn

Fixed.
 
probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too.

makeMdArrayResultArray and appendArrayDatum  should be marked as "static"

ok, marked all of that as static.
 
I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming. 

I'm thinking it like this:
- if we want to accumulate array normally, use accumArrayResult and makeArrayResult. If we accumulate scalar the result will be 1D array. if we accumulate array, the resulting dimension is incremented by 1.
- if we want, somehow to affect the normal behavior, and change the dimensions, use makeMdArrayResult.

Searching through the postgres' code, other than internal use in arrayfuncs.c, makeMdArrayResult is used only in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl array to postgres array.

So if somehow we will accumulate array other than in array_agg, i think the most natural way is using accumArrayResult and then makeArrayResult.

ok, there is more variants and I can't to decide. But I am not satisfied with this API. We do some wrong in structure. makeMdArrayResult is now ugly.

One approach that i can think is we cleanly separate the structures and API. We don't touch existing ArrayBuildState, accumArrayResult, makeArrayResult and makeMdArrayResult, and we create new functions: ArrayBuildStateArray, accumArrayResultArray, makeArrayResultArray and makeArrayResultArrayCustom.

But if we do that, the array(subselect) implementation will not be as simple as current patch. We must also change all the ARRAY_SUBLINK-related accumArrayResult call.

Regarding current patch implementation, the specific typedefs are declared as:
+typedef struct ArrayBuildStateScalar
+{
+ ArrayBuildState astate;
...
+} ArrayBuildStateArray;

so it necessities rather ugly code like this:
+ result->elemtype = astate->astate.element_type;

Can we use C11 feature of unnamed struct like this? :
+typedef struct ArrayBuildStateScalar
+{
+ ArrayBuildState;
...
+} ArrayBuildStateArray;

so the code can be a little less ugly by doing it like this:
+ result->elemtype = astate->element_type;

I don't know whether all currently supported compilers implements this feature..


Can you suggest a better structure for this?

If we can't settle on a better structure, i think i'll reimplement array_agg without the performance improvement, using deconstruct_array and unchanged accumArrayResult & makeMdArrayResult.

Thanks,
--
Ali Akbar

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-25 12:20 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-25 15:43 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:


2014-10-25 10:16 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
 makeArrayResult1 - I have no better name now

I found one next minor detail.

you reuse a array_agg_transfn function. Inside is a message "array_agg_transfn called in non-aggregate context". It is not correct for array_agg_anyarray_transfn

Fixed.
 
probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too.

makeMdArrayResultArray and appendArrayDatum  should be marked as "static"

ok, marked all of that as static.
 
I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming. 

I'm thinking it like this:
- if we want to accumulate array normally, use accumArrayResult and makeArrayResult. If we accumulate scalar the result will be 1D array. if we accumulate array, the resulting dimension is incremented by 1.
- if we want, somehow to affect the normal behavior, and change the dimensions, use makeMdArrayResult.

Searching through the postgres' code, other than internal use in arrayfuncs.c, makeMdArrayResult is used only in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl array to postgres array.

So if somehow we will accumulate array other than in array_agg, i think the most natural way is using accumArrayResult and then makeArrayResult.

ok, there is more variants and I can't to decide. But I am not satisfied with this API. We do some wrong in structure. makeMdArrayResult is now ugly.

One approach that i can think is we cleanly separate the structures and API. We don't touch existing ArrayBuildState, accumArrayResult, makeArrayResult and makeMdArrayResult, and we create new functions: ArrayBuildStateArray, accumArrayResultArray, makeArrayResultArray and makeArrayResultArrayCustom.

yes, I am thinking about separatate path, that will be joined in constructMdArray

In reality, there are two different array builders - with different API and better to respect it.
 

But if we do that, the array(subselect) implementation will not be as simple as current patch. We must also change all the ARRAY_SUBLINK-related accumArrayResult call.

Regarding current patch implementation, the specific typedefs are declared as:
+typedef struct ArrayBuildStateScalar
+{
+ ArrayBuildState astate;
...
+} ArrayBuildStateArray;

so it necessities rather ugly code like this:
+ result->elemtype = astate->astate.element_type;

Can we use C11 feature of unnamed struct like this? :

no, what I know a C11 is prohibited
 
+typedef struct ArrayBuildStateScalar
+{
+ ArrayBuildState;
...
+} ArrayBuildStateArray;

so the code can be a little less ugly by doing it like this:
+ result->elemtype = astate->element_type;

I don't know whether all currently supported compilers implements this feature..


Can you suggest a better structure for this?

If we can't settle on a better structure, i think i'll reimplement array_agg without the performance improvement, using deconstruct_array and unchanged accumArrayResult & makeMdArrayResult.

you can check it? We can test, how performance lost we get. As second benefit we can get numbers for introduction new optimized array builder

Regards

Pavel
 

Thanks,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
you can check it? We can test, how performance lost we get. As second benefit we can get numbers for introduction new optimized array builder

array_agg(anyarray) with deconstruct_array, unchanged accumArrayResult and makeMdArrayResult:
INSERT 0 1
Time: 852,527 ms
INSERT 0 1
Time: 844,275 ms
INSERT 0 1
Time: 858,855 ms
INSERT 0 1
Time: 861,072 ms
INSERT 0 1
Time: 952,006 ms
INSERT 0 1
Time: 953,918 ms
INSERT 0 1
Time: 926,945 ms
INSERT 0 1
Time: 923,692 ms
INSERT 0 1
Time: 940,916 ms
INSERT 0 1
Time: 948,700 ms
INSERT 0 1
Time: 933,333 ms
INSERT 0 1
Time: 948,869 ms
INSERT 0 1
Time: 847,113 ms
INSERT 0 1
Time: 908,572 ms 
 
Total: 12776.83
Avg: 912,63

with last patch (v10):
INSERT 0 1
Time: 643,339 ms
INSERT 0 1
Time: 608,010 ms
INSERT 0 1
Time: 610,465 ms
INSERT 0 1
Time: 613,931 ms
INSERT 0 1
Time: 616,466 ms
INSERT 0 1
Time: 634,754 ms
INSERT 0 1
Time: 683,566 ms
INSERT 0 1
Time: 656,665 ms
INSERT 0 1
Time: 630,096 ms
INSERT 0 1
Time: 607,564 ms
INSERT 0 1
Time: 610,353 ms
INSERT 0 1
Time: 626,816 ms
INSERT 0 1
Time: 610,450 ms
INSERT 0 1
Time: 614,342 ms
 
Total: 8842,7
Avg: 631,6

It's 30% faster (i tried varlena element - text). I tried several times and it's consistent in +/- 30%.


quick & dirty non-optimized patch and the test script attached.

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:
Hi

My idea is using new ArrayBuilder optimized for building multidimensional arrays with own State type. I think so casting to ArrayBuildState is base of our problems, so I don't would to do. Code in array_agg_* is simple, little bit more complex code is in nodeSubplan.c. Some schematic changes are in attachments.

Regards

Pavel



2014-10-25 15:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
you can check it? We can test, how performance lost we get. As second benefit we can get numbers for introduction new optimized array builder

array_agg(anyarray) with deconstruct_array, unchanged accumArrayResult and makeMdArrayResult:
INSERT 0 1
Time: 852,527 ms
INSERT 0 1
Time: 844,275 ms
INSERT 0 1
Time: 858,855 ms
INSERT 0 1
Time: 861,072 ms
INSERT 0 1
Time: 952,006 ms
INSERT 0 1
Time: 953,918 ms
INSERT 0 1
Time: 926,945 ms
INSERT 0 1
Time: 923,692 ms
INSERT 0 1
Time: 940,916 ms
INSERT 0 1
Time: 948,700 ms
INSERT 0 1
Time: 933,333 ms
INSERT 0 1
Time: 948,869 ms
INSERT 0 1
Time: 847,113 ms
INSERT 0 1
Time: 908,572 ms 
 
Total: 12776.83
Avg: 912,63

with last patch (v10):
INSERT 0 1
Time: 643,339 ms
INSERT 0 1
Time: 608,010 ms
INSERT 0 1
Time: 610,465 ms
INSERT 0 1
Time: 613,931 ms
INSERT 0 1
Time: 616,466 ms
INSERT 0 1
Time: 634,754 ms
INSERT 0 1
Time: 683,566 ms
INSERT 0 1
Time: 656,665 ms
INSERT 0 1
Time: 630,096 ms
INSERT 0 1
Time: 607,564 ms
INSERT 0 1
Time: 610,353 ms
INSERT 0 1
Time: 626,816 ms
INSERT 0 1
Time: 610,450 ms
INSERT 0 1
Time: 614,342 ms
 
Total: 8842,7
Avg: 631,6

It's 30% faster (i tried varlena element - text). I tried several times and it's consistent in +/- 30%.


quick & dirty non-optimized patch and the test script attached.

Regards,
--
Ali Akbar

Attachment

Re: Function array_agg(array)

From
Ali Akbar
Date:

2014-10-27 1:38 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

My idea is using new ArrayBuilder optimized for building multidimensional arrays with own State type. I think so casting to ArrayBuildState is base of our problems, so I don't would to do. Code in array_agg_* is simple, little bit more complex code is in nodeSubplan.c. Some schematic changes are in attachments.

Thanks! The structure looks clear, and thanks for the example on nodeSubplan.c. I will restructure the v10 of the patch to this structure.

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-10-27 9:11 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-27 1:38 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

My idea is using new ArrayBuilder optimized for building multidimensional arrays with own State type. I think so casting to ArrayBuildState is base of our problems, so I don't would to do. Code in array_agg_* is simple, little bit more complex code is in nodeSubplan.c. Some schematic changes are in attachments.

Thanks! The structure looks clear, and thanks for the example on nodeSubplan.c. I will restructure the v10 of the patch to this structure.

Patch attached.

Regards, 
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:
Hi

I did some minor changes in code

* move tests of old or new builder style for array sublink out of main cycles
* some API simplification of new builder - we should not to create identical API, mainly it has no sense

Regards

Pavel Stehule


2014-10-27 8:12 GMT+01:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-27 9:11 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-27 1:38 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

My idea is using new ArrayBuilder optimized for building multidimensional arrays with own State type. I think so casting to ArrayBuildState is base of our problems, so I don't would to do. Code in array_agg_* is simple, little bit more complex code is in nodeSubplan.c. Some schematic changes are in attachments.

Thanks! The structure looks clear, and thanks for the example on nodeSubplan.c. I will restructure the v10 of the patch to this structure.

Patch attached.

Regards, 
--
Ali Akbar

Attachment

Re: Function array_agg(array)

From
Ali Akbar
Date:

2014-10-27 16:15 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

I did some minor changes in code

* move tests of old or new builder style for array sublink out of main cycles
* some API simplification of new builder - we should not to create identical API, mainly it has no sense

minor changes in the patch:
* remove array_agg_finalfn_internal declaration in top of array_userfuncs.c
* fix comment of makeMdArray
* fix return of makeMdArray
* remove unnecesary changes to array_agg_finalfn

Regards,
--
Ali Akbar
Attachment

Re: Function array_agg(array)

From
Pavel Stehule
Date:


2014-10-27 11:20 GMT+01:00 Ali Akbar <the.apaan@gmail.com>:

2014-10-27 16:15 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

I did some minor changes in code

* move tests of old or new builder style for array sublink out of main cycles
* some API simplification of new builder - we should not to create identical API, mainly it has no sense

minor changes in the patch:
* remove array_agg_finalfn_internal declaration in top of array_userfuncs.c
* fix comment of makeMdArray
* fix return of makeMdArray
* remove unnecesary changes to array_agg_finalfn

super

I tested last version and I have not any objections.

1. We would to have this feature - it is long time number of our ToDo List

2. Proposal and design of multidimensional aggregation is clean and nobody has objection here.

3. There is zero impact on current implementation. From performance reasons it uses new array optimized aggregator - 30% faster for this purpose than current (scalar) array aggregator

4. Code is clean and respect PostgreSQL coding rules

5. There are documentation and necessary regress tests

6. Patching and compilation is clean without warnings.

This patch is ready for commit

Thank you for patch

Regards

Pavel
 

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Ali Akbar
Date:
super

I tested last version and I have not any objections.

1. We would to have this feature - it is long time number of our ToDo List

2. Proposal and design of multidimensional aggregation is clean and nobody has objection here.

3. There is zero impact on current implementation. From performance reasons it uses new array optimized aggregator - 30% faster for this purpose than current (scalar) array aggregator

4. Code is clean and respect PostgreSQL coding rules

5. There are documentation and necessary regress tests

6. Patching and compilation is clean without warnings.

This patch is ready for commit

Thank you for patch

Thank you for the thorough review process.
 
Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2014-10-27 11:20 GMT+01:00 Ali Akbar <the.apaan@gmail.com>:
>> [ array_agg_anyarray-13.patch ]

> This patch is ready for commit

I've committed this after some significant modifications.

I did not like the API chosen, specifically the business about callers
needing to deal with both accumArrayResult and accumMdArray, because that
seemed pretty messy and error-prone.  There was in fact at least one
calling-logic error, namely that the empty array created by nodeSubplan.c
when there were zero inputs would have the wrong element type for the
array-input case.

It seemed to me that the best fix for that would be to push the empty
array creation special case into the accumArrayResult function family.
After some thought about how to do that, I settled on adding an "init"
function that can create an ArrayBuildState with no data yet put into
it; the main purpose is just to save the info about the input datatype.
Then, if makeArrayResult receives the ArrayBuildState with still no
data in it, it can create the appropriate empty array.  (This turned
out to be a better idea than I first realized, as both xml.c and plperl.c
had coding patterns that could be made significantly less ugly with a
convention that the ArrayBuildState is always there.)

After doing that, I adjusted your new functions to have similar APIs,
and then added a switching layer on top for use by callers that want
to support both the scalar and array cases.  This made the adjustments
for array subplans more or less one-liner changes in the relevant
places.  We could possibly have unified the array_agg support functions
as well, but I didn't see much point in that given that we need two
sets of pg_proc entries to make type resolution work properly.

There were a number of other smaller issues too, like not being cautious
about memory allocation (the submitted code could both fail to allocate
enough, and compute an allocation request that would overflow an int).
        regards, tom lane



Re: Function array_agg(array)

From
Ali Akbar
Date:
2014-11-26 0:38 GMT+07:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2014-10-27 11:20 GMT+01:00 Ali Akbar <the.apaan@gmail.com>:
>> [ array_agg_anyarray-13.patch ]

> This patch is ready for commit

I've committed this after some significant modifications.

I did not like the API chosen, specifically the business about callers
needing to deal with both accumArrayResult and accumMdArray, because that
seemed pretty messy and error-prone. 

Thanks!

When in the reviewing process, we tried to implement in existing API, and it was messy, so the last patch is with two API. We didn't think what you eventually did. 3 API: existing for scalar, a new API for array, _and_ a new API for both. Great!!

Just curious, in accumArrayResultArr, while enlarging array and nullbitmaps, why it's implemented with:

astate->abytes = Max(astate->abytes * 2,
                                 astate->nbytes + ndatabytes);

and 

astate->aitems = Max(astate->aitems * 2, newnitems);

won't it be more consistent if it's implemented just like in the first allocation?:

while (astate->aitems <= newnitems)
                            astate->aitems *= 2;

Anyway, thanks for the big modifications. I learned a lot from that.

Regards,
--
Ali Akbar

Re: Function array_agg(array)

From
Tom Lane
Date:
Ali Akbar <the.apaan@gmail.com> writes:
> Just curious, in accumArrayResultArr, while enlarging array and
> nullbitmaps, why it's implemented with:

> astate->abytes = Max(astate->abytes * 2,
> astate->nbytes + ndatabytes);
> and
> astate->aitems = Max(astate->aitems * 2, newnitems);

> won't it be more consistent if it's implemented just like in the first
> allocation?:

> while (astate->aitems <= newnitems)
> astate->aitems *= 2;

The idea was to try to force the initial allocation to be a power of 2,
while not insisting on that for later enlargements.  I can't point to any
hard reasons for doing it that way, but it seemed like a good idea.
Power-of-2 allocations are good up to a certain point but after that they
tend to get wasteful ...
        regards, tom lane