Re: Function array_agg(array) - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Function array_agg(array)
Date
Msg-id CAFj8pRCue-7X+gBZBXx6MZOQq7+Qe2+Q2PejoyfK6i-NiEFt9A@mail.gmail.com
Whole thread Raw
In response to Re: Function array_agg(array)  (Ali Akbar <the.apaan@gmail.com>)
Responses Re: Function array_agg(array)
List pgsql-hackers

2014-10-24 10:24 GMT+02:00 Ali Akbar <the.apaan@gmail.com>:
Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:
I agree with your proposal. I have a few comments to design:

1. patch doesn't hold documentation and regress tests, please append it.
 
i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for array(subselect) defined?
 
2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred.
We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is.

aha, so isn't better to fix a performance for accumArrayResult ?

Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter.

implemented it by modifying ArrayBuildState to ArrayBuildStateArray and ArrayBuildStateScalar (do you have any idea for better struct naming?)
 
In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays.
 
3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg      
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented?

where you can start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword

attention: probably we don't would to allow arrays everywhere. 

I've changed the places where i think it's appropriate.
 
4. why you use a magic constant (64) there?

+         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+         astate->aitems = 64 * nitems;

+             astate->nullbitmap = (bits8 *)
+                 repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory.

you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger. 

this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 * number of items in array.

Regards,
--
Ali Akbar
 

pgsql-hackers by date:

Previous
From: Borodin Vladimir
Date:
Subject: Re: ExclusiveLock on extension of relation with huge shared_buffers
Next
From: Ali Akbar
Date:
Subject: Re: Function array_agg(array)