Re: Function array_agg(array) - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Function array_agg(array)
Date
Msg-id CAFj8pRDvRgF-kXd6hV8x5xRH2g6i6MW5-B+Qk6Eb7mf2V7MEZA@mail.gmail.com
Whole thread Raw
In response to Re: Function array_agg(array)  (Ali Akbar <the.apaan@gmail.com>)
Responses Re: Function array_agg(array)
List pgsql-hackers


2014-10-23 4:00 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
OK, i'll add the documentation and regression test 
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

Found it. Thanks.

On a side note in postgres array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider to change it so array type of anyarray is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109

yes, it is true - this is really big change and maybe needs separate discuss - ***if we allow cycle there. I am not sure about possible side effects***.

Maybe this change is not necessary, you can fix a check only ... "if type is not array or if get_array_type is null raise a error"
 

if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}

to implement array(subselect) you pointed above, we must specially checks for array types. Quick search on get_array_type returns 10 places.

attention: probably we don't would to allow arrays everywhere.
 

I'll think more about this later. For this patch, i'll go without changes in pg_type.h.

4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger.

Ok.


Regards,
--
Ali Akbar

pgsql-hackers by date:

Previous
From: Ali Akbar
Date:
Subject: Re: Function array_agg(array)
Next
From: Fujii Masao
Date:
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)