Thread: grouping/clustering query

grouping/clustering query

From

"David Garamond"

Date:

22 October 2008, 02:14:54

Dear all,

I have an invoices (inv) table and bank transaction (tx) table.
There's also the payment table which is a many-to-many relation
between the former two tables, because each invoice can be paid by one
or more bank transactions, and each bank transaction can pay for one
or more invoices. Example:

# (invoiceid, txid)
(A, 1)
(A, 3)
(B, 1)
(B, 2)
(C, 5)
(D, 6)
(D, 7)
(E, 8)
(F, 8)

For journalling, I need to group/cluster this together. Is there a SQL
query that can generate this output:

# (journal: invoiceids, txids)
[A,B] , [1,2,3]
[C], [5]
[D], [6,7]
[E,F], [8]

Regards,
dave

Re: grouping/clustering query

From

Steve Midgley

Date:

23 October 2008, 13:09:01

At 10:20 PM 10/22/2008, you wrote:
>Message-ID: 
><7c33d060810212214h5c85b406i49dd9e8d76bec9bd@mail.gmail.com>
>Date: Wed, 22 Oct 2008 12:14:49 +0700
>From: "David Garamond" <davidgaramond@gmail.com>
>To: pgsql-sql@postgresql.org
>Subject: grouping/clustering query
>X-Archive-Number: 200810/89
>X-Sequence-Number: 31731
>
>Dear all,
>
>I have an invoices (inv) table and bank transaction (tx) table.
>There's also the payment table which is a many-to-many relation
>between the former two tables, because each invoice can be paid by one
>or more bank transactions, and each bank transaction can pay for one
>or more invoices. Example:
>
># (invoiceid, txid)
>(A, 1)
>(A, 3)
>(B, 1)
>(B, 2)
>(C, 5)
>(D, 6)
>(D, 7)
>(E, 8)
>(F, 8)
>
>For journalling, I need to group/cluster this together. Is there a SQL
>query that can generate this output:
>
># (journal: invoiceids, txids)
>[A,B] , [1,2,3]
>[C], [5]
>[D], [6,7]
>[E,F], [8]

Hi Dave,

I'm not following the logic here. A has 1,3 and B has 1,2. So why does 
the first line print:

>[A,B] , [1,2,3]

What's the rule that tells the query to output this way? Is it that all 
of B's values are between A's values?

Also in your output, you've indicated [A,B] - does this mean you want 
two columns of output, each column being a pg array?

I may not be the best person to answer the actual SQL question, but I 
thought I'd clarify your requirements so the list members can have the 
best chance of answering.

Steve

Re: grouping/clustering query

From

Joe

Date:

23 October 2008, 19:18:59

Steve Midgley wrote:
>> # (invoiceid, txid)
>> (A, 1)
>> (A, 3)
>> (B, 1)
>> (B, 2)
>> (C, 5)
>> (D, 6)
>> (D, 7)
>> (E, 8)
>> (F, 8)
>>
>> For journalling, I need to group/cluster this together. Is there a SQL
>> query that can generate this output:
>>
>> # (journal: invoiceids, txids)
>> [A,B] , [1,2,3]
>> [C], [5]
>> [D], [6,7]
>> [E,F], [8]
>
> Hi Dave,
>
> I'm not following the logic here. A has 1,3 and B has 1,2. So why does 
> the first line print:
>
>> [A,B] , [1,2,3]
>
> What's the rule that tells the query to output this way? Is it that 
> all of B's values are between A's values?
From a purely accounting standpoint, since transaction 1 was applied to 
both invoices A and B, you need to group the invoices so that you can 
compare total invoiced against total paid.

Joe

Re: grouping/clustering query

From

Steve Midgley

Date:

24 October 2008, 12:04:57

At 11:28 AM 10/23/2008, Joe wrote:
>Steve Midgley wrote:
>>># (invoiceid, txid)
>>>(A, 1)
>>>(A, 3)
>>>(B, 1)
>>>(B, 2)
>>>(C, 5)
>>>(D, 6)
>>>(D, 7)
>>>(E, 8)
>>>(F, 8)
>>>
>>>For journalling, I need to group/cluster this together. Is there a 
>>>SQL
>>>query that can generate this output:
>>>
>>># (journal: invoiceids, txids)
>>>[A,B] , [1,2,3]
>>>[C], [5]
>>>[D], [6,7]
>>>[E,F], [8]
>>
>>Hi Dave,
>>
>>I'm not following the logic here. A has 1,3 and B has 1,2. So why 
>>does the first line print:
>>
>>>[A,B] , [1,2,3]
>>
>>What's the rule that tells the query to output this way? Is it that 
>>all of B's values are between A's values?
>
> From a purely accounting standpoint, since transaction 1 was applied 
> to both invoices A and B, you need to group the invoices so that you 
> can compare total invoiced against total paid.

I tinkered around briefly but didn't come up with a good idea, but I 
bet someone on this list can. However, I did create a CREATE script for 
your table design which, in my experience, makes it more likely that a 
real expert will take on your problem..

Hope this helps,

Steve

DROP TABLE IF EXISTS trans;

CREATE TABLE trans
(  id serial NOT NULL,  inv_id character varying,  tx_id character varying,  CONSTRAINT pk_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

insert into trans (inv_id, tx_id) values('A','1');
insert into trans (inv_id, tx_id) values('A','3');
insert into trans (inv_id, tx_id) values('B','1');
insert into trans (inv_id, tx_id) values('B','2');
insert into trans (inv_id, tx_id) values('C','5');
insert into trans (inv_id, tx_id) values('D','6');
insert into trans (inv_id, tx_id) values('D','7');
insert into trans (inv_id, tx_id) values('E','8');
insert into trans (inv_id, tx_id) values('F','8');

Re: grouping/clustering query

From

"Oliveiros Cristina"

Date:

24 October 2008, 12:20:27

Your script is handy, Steve.

Spontaneously, This seems to be an array type problem, something I just have 
vague notions about.

I'll take a look at this, 
http://www.postgresql.org/docs/8.3/static/arrays.html to see if something 
occurs...

Best,
Oliveiros


----- Original Message ----- 
From: "Steve Midgley" <science@misuse.org>
To: "Joe" <dev@freedomcircle.net>
Cc: <pgsql-sql@postgresql.org>; "David Garamond" <davidgaramond@gmail.com>
Sent: Friday, October 24, 2008 4:04 PM
Subject: Re: [SQL] grouping/clustering query


> At 11:28 AM 10/23/2008, Joe wrote:
>>Steve Midgley wrote:
>>>># (invoiceid, txid)
>>>>(A, 1)
>>>>(A, 3)
>>>>(B, 1)
>>>>(B, 2)
>>>>(C, 5)
>>>>(D, 6)
>>>>(D, 7)
>>>>(E, 8)
>>>>(F, 8)
>>>>
>>>>For journalling, I need to group/cluster this together. Is there a SQL
>>>>query that can generate this output:
>>>>
>>>># (journal: invoiceids, txids)
>>>>[A,B] , [1,2,3]
>>>>[C], [5]
>>>>[D], [6,7]
>>>>[E,F], [8]
>>>
>>>Hi Dave,
>>>
>>>I'm not following the logic here. A has 1,3 and B has 1,2. So why does 
>>>the first line print:
>>>
>>>>[A,B] , [1,2,3]
>>>
>>>What's the rule that tells the query to output this way? Is it that all 
>>>of B's values are between A's values?
>>
>> From a purely accounting standpoint, since transaction 1 was applied to 
>> both invoices A and B, you need to group the invoices so that you can 
>> compare total invoiced against total paid.
>
> I tinkered around briefly but didn't come up with a good idea, but I bet 
> someone on this list can. However, I did create a CREATE script for your 
> table design which, in my experience, makes it more likely that a real 
> expert will take on your problem..
>
> Hope this helps,
>
> Steve
>
> DROP TABLE IF EXISTS trans;
>
> CREATE TABLE trans
> (
>   id serial NOT NULL,
>   inv_id character varying,
>   tx_id character varying,
>   CONSTRAINT pk_id PRIMARY KEY (id)
> )
> WITH (OIDS=FALSE);
>
> insert into trans (inv_id, tx_id) values('A','1');
> insert into trans (inv_id, tx_id) values('A','3');
> insert into trans (inv_id, tx_id) values('B','1');
> insert into trans (inv_id, tx_id) values('B','2');
> insert into trans (inv_id, tx_id) values('C','5');
> insert into trans (inv_id, tx_id) values('D','6');
> insert into trans (inv_id, tx_id) values('D','7');
> insert into trans (inv_id, tx_id) values('E','8');
> insert into trans (inv_id, tx_id) values('F','8');
>
>
>
>
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>

Re: grouping/clustering query

From

"Tony Wasson"

Date:

24 October 2008, 14:25:02

On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote:
> At 11:28 AM 10/23/2008, Joe wrote:
>>
>> Steve Midgley wrote:
>>>>
>>>> # (invoiceid, txid)
>>>> (A, 1)
>>>> (A, 3)
>>>> (B, 1)
>>>> (B, 2)
>>>> (C, 5)
>>>> (D, 6)
>>>> (D, 7)
>>>> (E, 8)
>>>> (F, 8)
>>>>
>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>> query that can generate this output:
>>>>
>>>> # (journal: invoiceids, txids)
>>>> [A,B] , [1,2,3]
>>>> [C], [5]
>>>> [D], [6,7]
>>>> [E,F], [8]
>>>
>>> Hi Dave,
>>>
>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>> the first line print:
>>>
>>>> [A,B] , [1,2,3]
>>>
>>> What's the rule that tells the query to output this way? Is it that all
>>> of B's values are between A's values?
>>
>> From a purely accounting standpoint, since transaction 1 was applied to
>> both invoices A and B, you need to group the invoices so that you can
>> compare total invoiced against total paid.
>
> I tinkered around briefly but didn't come up with a good idea, but I bet
> someone on this list can. However, I did create a CREATE script for your
> table design which, in my experience, makes it more likely that a real
> expert will take on your problem..
>
> Hope this helps,
>
> Steve
>
> DROP TABLE IF EXISTS trans;
>
> CREATE TABLE trans
> (
>  id serial NOT NULL,
>  inv_id character varying,
>  tx_id character varying,
>  CONSTRAINT pk_id PRIMARY KEY (id)
> )
> WITH (OIDS=FALSE);
>
> insert into trans (inv_id, tx_id) values('A','1');
> insert into trans (inv_id, tx_id) values('A','3');
> insert into trans (inv_id, tx_id) values('B','1');
> insert into trans (inv_id, tx_id) values('B','2');
> insert into trans (inv_id, tx_id) values('C','5');
> insert into trans (inv_id, tx_id) values('D','6');
> insert into trans (inv_id, tx_id) values('D','7');
> insert into trans (inv_id, tx_id) values('E','8');
> insert into trans (inv_id, tx_id) values('F','8');

This is as close as I can get the data. I think I'd need a custom
array grouping aggregate to get the results to match completely.
Notice how ABC are on their own lines?

test=# SELECT inv_array, tx_array
FROM (       SELECT tx_id, array_accum(inv_id) AS inv_array       FROM trans       GROUP BY tx_id       ORDER BY tx_id
 ) AS t
 
JOIN (       SELECT inv_id, array_accum(tx_id) AS tx_array       FROM trans       GROUP BY inv_id       ORDER BY inv_id
  ) AS i ON (t.tx_id = ANY(i.tx_array) OR i.inv_id =ANY(t.inv_array))
 
GROUP BY tx_array,inv_array
;inv_array | tx_array
-----------+----------{A,B}     | {1,2}{B}       | {1,2}{A}       | {1,3}{A,B}     | {1,3}{C}       | {5}{D}       |
{6,7}{E,F}    | {8}
 
(7 rows)

Re: grouping/clustering query

From

"Tony Wasson"

Date:

24 October 2008, 17:48:56

On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <ajwasson@gmail.com> wrote:
> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote:
>> At 11:28 AM 10/23/2008, Joe wrote:
>>>
>>> Steve Midgley wrote:
>>>>>
>>>>> # (invoiceid, txid)
>>>>> (A, 1)
>>>>> (A, 3)
>>>>> (B, 1)
>>>>> (B, 2)
>>>>> (C, 5)
>>>>> (D, 6)
>>>>> (D, 7)
>>>>> (E, 8)
>>>>> (F, 8)
>>>>>
>>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>>> query that can generate this output:
>>>>>
>>>>> # (journal: invoiceids, txids)
>>>>> [A,B] , [1,2,3]
>>>>> [C], [5]
>>>>> [D], [6,7]
>>>>> [E,F], [8]
>>>>
>>>> Hi Dave,
>>>>
>>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>>> the first line print:
>>>>
>>>>> [A,B] , [1,2,3]
>>>>
>>>> What's the rule that tells the query to output this way? Is it that all
>>>> of B's values are between A's values?
>>>
>>> From a purely accounting standpoint, since transaction 1 was applied to
>>> both invoices A and B, you need to group the invoices so that you can
>>> compare total invoiced against total paid.
>>
>> I tinkered around briefly but didn't come up with a good idea, but I bet
>> someone on this list can. However, I did create a CREATE script for your
>> table design which, in my experience, makes it more likely that a real
>> expert will take on your problem..
>>
>> Hope this helps,
>>
>> Steve
>>
>> DROP TABLE IF EXISTS trans;
>>
>> CREATE TABLE trans
>> (
>>  id serial NOT NULL,
>>  inv_id character varying,
>>  tx_id character varying,
>>  CONSTRAINT pk_id PRIMARY KEY (id)
>> )
>> WITH (OIDS=FALSE);
>>
>> insert into trans (inv_id, tx_id) values('A','1');
>> insert into trans (inv_id, tx_id) values('A','3');
>> insert into trans (inv_id, tx_id) values('B','1');
>> insert into trans (inv_id, tx_id) values('B','2');
>> insert into trans (inv_id, tx_id) values('C','5');
>> insert into trans (inv_id, tx_id) values('D','6');
>> insert into trans (inv_id, tx_id) values('D','7');
>> insert into trans (inv_id, tx_id) values('E','8');
>> insert into trans (inv_id, tx_id) values('F','8');


Here's a stab at a custom aggregate attempting to explain what's going
on inside.

CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
VARCHAR) RETURNS VARCHAR[] AS $$
DECLAREres VARCHAR[];
BEGINRAISE NOTICE 'input state is %',x;RAISE NOTICE 'input variable is %',y;IF x = '{}' THEN       RAISE NOTICE 'x is
empty,returning input variable %',y;       res[1] := y;ELSE       RAISE NOTICE 'input array is not empty, checking if
input
variable is a member %',y;       res := x;       IF y = ANY(res) THEN               RAISE NOTICE 'y is already in array
%,skipping',res;       ELSE               res := array_append(res, y);               RAISE NOTICE 'appending input
variable%',y;       END IF;END IF;RETURN res;
 
END
$$ LANGUAGE plpgsql STRICT;

DROP AGGREGATE array_accum_unique(VARCHAR);
CREATE AGGREGATE array_accum_unique
(   basetype = VARCHAR   , sfunc = varchar_array_accum_unique   , stype = VARCHAR[]   , initcond = '{}'
);

SELECT array_accum_unique(inv_id) AS invoiceids, array_accum_unique(tx_id) AS transactionids
FROM (       SELECT tx_id, inv_id       FROM trans       WHERE inv_id IN (                               SELECT inv_id
                            FROM trans                               WHERE id IN (
        SELECT id FROM trans
 
WHERE tx_id=1                                           )                       )       ORDER BY tx_id, inv_id     ) AS
ss
;


Returns this result for transaction id 1.
invoiceids | transactionids
------------+----------------{A,B}      | {1,2,3}
(1 row)

Hope this helps!
Tony

Re: grouping/clustering query

From

"David Garamond"

Date:

24 October 2008, 22:50:16

Tony, Joe, Steve,

Thanks for the follow-ups. Yes, the problem is related to double-entry
accounting, where one needs to balance total debit and credit
(payments and invoices) in each journal/transaction.

Due to time constraint, I ended up doing this in the client-side
programming language, since I am nowhere near fluent in PLs. The
algorithm should be simple (at least the "brute force" version), it's
basically checking if each element of the pair (txid, invoiceid) is
already mentioned in some journal and if it is, add the pair to the
journal, otherwise create a new journal with that pair as the first
entry. I believe this can easily be implemented in a PL. But still I
wonder if there is some SQL incantation that can do the same without
any PL.

Regards,
dave

On Sat, Oct 25, 2008 at 3:48 AM, Tony Wasson <ajwasson@gmail.com> wrote:
> On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <ajwasson@gmail.com> wrote:
>> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote:
>>> At 11:28 AM 10/23/2008, Joe wrote:
>>>>
>>>> Steve Midgley wrote:
>>>>>>
>>>>>> # (invoiceid, txid)
>>>>>> (A, 1)
>>>>>> (A, 3)
>>>>>> (B, 1)
>>>>>> (B, 2)
>>>>>> (C, 5)
>>>>>> (D, 6)
>>>>>> (D, 7)
>>>>>> (E, 8)
>>>>>> (F, 8)
>>>>>>
>>>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>>>> query that can generate this output:
>>>>>>
>>>>>> # (journal: invoiceids, txids)
>>>>>> [A,B] , [1,2,3]
>>>>>> [C], [5]
>>>>>> [D], [6,7]
>>>>>> [E,F], [8]
>>>>>
>>>>> Hi Dave,
>>>>>
>>>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>>>> the first line print:
>>>>>
>>>>>> [A,B] , [1,2,3]
>>>>>
>>>>> What's the rule that tells the query to output this way? Is it that all
>>>>> of B's values are between A's values?
>>>>
>>>> From a purely accounting standpoint, since transaction 1 was applied to
>>>> both invoices A and B, you need to group the invoices so that you can
>>>> compare total invoiced against total paid.
>>>
>>> I tinkered around briefly but didn't come up with a good idea, but I bet
>>> someone on this list can. However, I did create a CREATE script for your
>>> table design which, in my experience, makes it more likely that a real
>>> expert will take on your problem..
>>>
>>> Hope this helps,
>>>
>>> Steve
>>>
>>> DROP TABLE IF EXISTS trans;
>>>
>>> CREATE TABLE trans
>>> (
>>>  id serial NOT NULL,
>>>  inv_id character varying,
>>>  tx_id character varying,
>>>  CONSTRAINT pk_id PRIMARY KEY (id)
>>> )
>>> WITH (OIDS=FALSE);
>>>
>>> insert into trans (inv_id, tx_id) values('A','1');
>>> insert into trans (inv_id, tx_id) values('A','3');
>>> insert into trans (inv_id, tx_id) values('B','1');
>>> insert into trans (inv_id, tx_id) values('B','2');
>>> insert into trans (inv_id, tx_id) values('C','5');
>>> insert into trans (inv_id, tx_id) values('D','6');
>>> insert into trans (inv_id, tx_id) values('D','7');
>>> insert into trans (inv_id, tx_id) values('E','8');
>>> insert into trans (inv_id, tx_id) values('F','8');
>
>
> Here's a stab at a custom aggregate attempting to explain what's going
> on inside.
>
> CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
> VARCHAR) RETURNS VARCHAR[] AS $$
> DECLARE
>  res VARCHAR[];
> BEGIN
>  RAISE NOTICE 'input state is %',x;
>  RAISE NOTICE 'input variable is %',y;
>  IF x = '{}' THEN
>        RAISE NOTICE 'x is empty, returning input variable %',y;
>        res[1] := y;
>  ELSE
>        RAISE NOTICE 'input array is not empty, checking if input
> variable is a member %',y;
>        res := x;
>        IF y = ANY(res) THEN
>                RAISE NOTICE 'y is already in array %, skipping',res;
>        ELSE
>                res := array_append(res, y);
>                RAISE NOTICE 'appending input variable %',y;
>        END IF;
>  END IF;
>  RETURN res;
> END
> $$ LANGUAGE plpgsql STRICT;
>
> DROP AGGREGATE array_accum_unique(VARCHAR);
> CREATE AGGREGATE array_accum_unique
> (
>    basetype = VARCHAR
>    , sfunc = varchar_array_accum_unique
>    , stype = VARCHAR[]
>    , initcond = '{}'
> );
>
> SELECT array_accum_unique(inv_id) AS invoiceids
>  , array_accum_unique(tx_id) AS transactionids
> FROM (
>        SELECT tx_id, inv_id
>        FROM trans
>        WHERE inv_id IN (
>                                SELECT inv_id
>                                FROM trans
>                                WHERE id IN (
>                                                SELECT id FROM trans
> WHERE tx_id=1
>                                            )
>                        )
>        ORDER BY tx_id, inv_id
>      ) AS ss
> ;
>
>
> Returns this result for transaction id 1.
>
>  invoiceids | transactionids
> ------------+----------------
>  {A,B}      | {1,2,3}
> (1 row)
>
> Hope this helps!
> Tony
>

Fwd: grouping/clustering query

From

"Osvaldo Kussama"

Date:

27 October 2008, 16:28:14

I forgot the list.

---------- Forwarded message ----------
From: Osvaldo Kussama <osvaldo.kussama@gmail.com>
Date: Mon, 27 Oct 2008 12:28:57 -0200
Subject: Re: [SQL] grouping/clustering query
To: David Garamond <davidgaramond@gmail.com>

2008/10/24, David Garamond <davidgaramond@gmail.com>:
> Tony, Joe, Steve,
>
> Thanks for the follow-ups. Yes, the problem is related to double-entry
> accounting, where one needs to balance total debit and credit
> (payments and invoices) in each journal/transaction.
>
> Due to time constraint, I ended up doing this in the client-side
> programming language, since I am nowhere near fluent in PLs. The
> algorithm should be simple (at least the "brute force" version), it's
> basically checking if each element of the pair (txid, invoiceid) is
> already mentioned in some journal and if it is, add the pair to the
> journal, otherwise create a new journal with that pair as the first
> entry. I believe this can easily be implemented in a PL. But still I
> wonder if there is some SQL incantation that can do the same without
> any PL.
>


Interesting problem.

I think there are no SQL-only solution.

Using arrays and PL/pgSQL function:
bdteste=# SELECT * FROM bar;aid | bid
-----+-----A   |   1A   |   3B   |   1B   |   2C   |   5D   |   6D   |   7E   |   8F   |   8
(9 registros)

bdteste=# CREATE OR REPLACE FUNCTION combina() RETURNS setof record AS $$
bdteste$# DECLARE
bdteste$#    res   record;
bdteste$#    res1  record;
bdteste$# BEGIN
bdteste$#    CREATE TEMP TABLE foobar(
bdteste$#       fbaid   text[],
bdteste$#       fbbid   int[])
bdteste$#    ON COMMIT DROP;
bdteste$#
bdteste$#    FOR res IN SELECT agr1, bid FROM (SELECT bid,
array_accum(aid) AS agr1 FROM bar
bdteste$#                  GROUP BY bid) b1 ORDER BY array_upper(agr1,
1) DESC, agr1 LOOP
bdteste$#       SELECT * INTO res1 FROM foobar WHERE fbaid @> res.agr1;
bdteste$#       IF NOT FOUND THEN
bdteste$#          INSERT INTO foobar VALUES (res.agr1, array[res.bid]);
bdteste$#       ELSE
bdteste$#          UPDATE foobar SET fbbid = array_append(fbbid,
res.bid) WHERE fbaid @> res.agr1;
bdteste$#       END IF;
bdteste$#    END LOOP;
bdteste$#
bdteste$#    RETURN QUERY SELECT * FROM foobar;
bdteste$# END;
bdteste$# $$ LANGUAGE PLPGSQL;
CREATE FUNCTION
bdteste=#
bdteste=# SELECT * FROM combina() AS(a text[], b int[]);  a   |    b
-------+---------{E,F} | {8}{A,B} | {1,3,2}{C}   | {5}{D}   | {7,6}
(4 registros)

Osvaldo

PS. - Aggregate array_accum defined at:
http://www.postgresql.org/docs/current/interactive/xaggr.html
- If you need sorted arrays use Andreas Kretschmer's function array_sort:
http://archives.postgresql.org/pgsql-general/2007-02/msg01534.php