Review Report: propose to include 3 new functions into intarray and intagg - Mailing list pgsql-hackers

From Markus Wanner
Subject Review Report: propose to include 3 new functions into intarray and intagg
Date
Msg-id 48C26AB2.5050009@bluegap.ch
Whole thread Raw
In response to Patch: propose to include 3 new functions into intarray and intagg  ("Dmitry Koterov" <dmitry@koterov.ru>)
Responses Re: Review Report: propose to include 3 new functions into intarray and intagg  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Review Report: propose to include 3 new functions into intarray and intagg  ("Dmitry Koterov" <dmitry@koterov.ru>)
List pgsql-hackers
Hi,

this is my first "official" review. I've tried to follow the "Review a
patch" guidelines from the wiki - thanks Simon, that was pretty helpful.

This review covers only the intagg additions.

Dmitry Koterov wrote:
> Here are these functions with detailed documentation:
> http://en.dklab.ru/lib/dklab_postgresql_patch/

Submission review: we generally prefer having patches archived on our
mailing lists, so please just send future patches or revisions of this
patch to our lists (I prefer -hackers, but probably -patches is still
the official one). The documentation should go into a README file of the
contrib module or some such. Tests are missing, but that's the case for
the intagg contrib module anyway. The patch applies and is in context
diff format, good. It adds an unrelated newline, which should generally 
be avoided.

Usability review: functional additions look good and usable, certainly
within a contrib module. I don't think it's compliant to any spec, but
that's not required for contrib, IMO.

Functional test: works as advertised on the accompanying website. I've
tested the int_array_append_aggregate somewhat, see [1].

Performance testing: I've tested with 10 mio rows of arrays of size 1
and compared against core's int_array_aggregate function, see [1] again.
In this simple test it took roughly 50% longer, which seems okay. Memory 
consumption looks sane as well.

Coding review: style seems fine for contrib, though lines longer than 80 
cols should be broken up. Comments in the code are sparse , some even 
counter-productive (marking something as "additional things" certainly 
doesn't help).

Code and architecture review: the new int_array_append_aggregate() 
functions itself seems fine to me.

Summary: My general feeling is, that this patch should be applied after 
minor code style corrections. As a longer term goal I think intagg 
should be integrated into  core, since it's very basic functionality. 
TODO entries for things like an array_accum() aggregate already exist. 
Adding this patch to contrib now might be a step into the right direction.

Dmitry, can you please apply these small corrections and re-submit the 
patch?

Regards

Markus Wanner

P.S.: I dislike the intagg's use of PGARRAY, but that's nothing to do 
with this patch. Shouldn't this better use a real composite type as the 
aggregate's state type? I'd propose to clean up the intagg contrib 
module and prepare it for inclusion into core.


[1]: functional and performance testing session:

On a database with (patched) intagg and intarr contrib modules:

markus=# CREATE TABLE test (id INT NOT NULL, arr INT[] NOT NULL);
CREATE TABLE
markus=# INSERT INTO test VALUES (1, ARRAY[1,2,3]), (2, ARRAY[4,5]), (3,
ARRAY[3,2,1]);
INSERT 0 3
markus=# SELECT * FROM test; id |   arr
----+---------  1 | {1,2,3}  2 | {4,5}  3 | {3,2,1}
(3 rows)

markus=# SELECT int_array_append_aggregate(arr) FROM test; int_array_append_aggregate
---------------------------- {1,2,3,4,5,3,2,1}
(1 row)

markus=# SELECT * FROM test; id |   arr
----+---------  1 | {1,2,3}  2 | {4,5}  3 | {3,2,1}
(3 rows)

markus=# SELECT int_array_aggregate(id) AS ids,
int_array_append_aggregate(arr) FROM test GROUP BY (id / 2);  ids  | int_array_append_aggregate
-------+---------------------------- {1}   | {1,2,3} {2,3} | {4,5,3,2,1}
(2 rows)

markus=# SELECT int_array_aggregate(id) AS ids,
int_array_append_aggregate(arr) FROM test GROUP BY (id % 2);  ids  | int_array_append_aggregate
-------+---------------------------- {2}   | {4,5} {1,3} | {1,2,3,3,2,1}
(2 rows)

markus=# INSERT INTO test VALUES (4, NULL);
INSERT 0 1

markus=# SELECT int_array_append_aggregate(arr) FROM test; int_array_append_aggregate
---------------------------- {1,2,3,4,5,3,2,1}
(1 row)

markus=# SELECT id, int_array_append_aggregate(arr) FROM test GROUP BY id; id | int_array_append_aggregate
----+----------------------------  4 | {}  2 | {4,5}  3 | {3,2,1}  1 | {1,2,3}
(4 rows)

-- switching to performance testing

markus=# \timing
Timing is on.

markus=# DELETE FROM test;
DELETE 4
Time: 9.037 ms

markus=# INSERT INTO test SELECT generate_series(1, 10000000),
array[round(random() * 100)]::int[];
INSERT 0 10000000
Time: 53321.186 ms

markus=# SELECT icount(int_array_aggregate(id)) AS count FROM test;  count
---------- 10000000
(1 row)

Time: 2493.184 ms

markus=# SELECT icount(int_array_append_aggregate(arr)) AS count FROM test;  count
---------- 10000000
(1 row)

Time: 4152.478 ms




pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: Re: Prototype: In-place upgrade v02
Next
From: Andrew Dunstan
Date:
Subject: pg_dump/pg_restore items