Re: Function array_agg(array) - Mailing list pgsql-hackers
From | Ali Akbar |
---|---|
Subject | Re: Function array_agg(array) |
Date | |
Msg-id | CACQjQLqn27V32jBibu-m=6WhGSXEDBSjLny1=MSDkVVCSOwo=w@mail.gmail.com Whole thread Raw |
In response to | Re: Function array_agg(array) (Pavel Stehule <pavel.stehule@gmail.com>) |
Responses |
Re: Function array_agg(array)
|
List | pgsql-hackers |
2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-10-22 16:58 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:Thanks for the review2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:I agree with your proposal. I have a few comments to design:1. patch doesn't hold documentation and regress tests, please append it.OK, i'll add the documentation and regression test2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.aha, so isn't better to fix a performance for accumArrayResult ?
Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.
In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.3. array_agg was consistent with array(subselect), so it should be fixed too
postgres=# select array_agg(a) from test;
array_agg
-----------------------
{{1,2,3,4},{1,2,3,4}}
(1 row)
postgres=# select array(select a from test);
ERROR: could not find array type for data type integer[]I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?where you can start?
postgres=# explain select array(select a from test);
ERROR: 42704: could not find array type for data type integer[]
LOCATION: exprType, nodeFuncs.c:116look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword
Found it. Thanks.
On a side note in postgres array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider to change it so array type of anyarray is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109
if (sublink->subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find array type for data type %s",
format_type_be(exprType((Node *) tent->expr)))));
}
to implement array(subselect) you pointed above, we must specially checks for array types. Quick search on get_array_type returns 10 places.
I'll think more about this later. For this patch, i'll go without changes in pg_type.h.
4. why you use a magic constant (64) there?+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+ astate->aitems = 64 * nitems;+ astate->nullbitmap = (bits8 *)
+ repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):astate->alen = 64; /* arbitrary starting array size */it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger.
Ok.
Regards,
--
Ali Akbar
pgsql-hackers by date: