Fwd: grouping/clustering query - Mailing list pgsql-sql

From Osvaldo Kussama
Subject Fwd: grouping/clustering query
Date
Msg-id 690707f60810271228t316cc37dofff3f1831a2e8c2f@mail.gmail.com
Whole thread Raw
In response to Re: grouping/clustering query  (Steve Midgley <science@misuse.org>)
List pgsql-sql
I forgot the list.

---------- Forwarded message ----------
From: Osvaldo Kussama <osvaldo.kussama@gmail.com>
Date: Mon, 27 Oct 2008 12:28:57 -0200
Subject: Re: [SQL] grouping/clustering query
To: David Garamond <davidgaramond@gmail.com>

2008/10/24, David Garamond <davidgaramond@gmail.com>:
> Tony, Joe, Steve,
>
> Thanks for the follow-ups. Yes, the problem is related to double-entry
> accounting, where one needs to balance total debit and credit
> (payments and invoices) in each journal/transaction.
>
> Due to time constraint, I ended up doing this in the client-side
> programming language, since I am nowhere near fluent in PLs. The
> algorithm should be simple (at least the "brute force" version), it's
> basically checking if each element of the pair (txid, invoiceid) is
> already mentioned in some journal and if it is, add the pair to the
> journal, otherwise create a new journal with that pair as the first
> entry. I believe this can easily be implemented in a PL. But still I
> wonder if there is some SQL incantation that can do the same without
> any PL.
>


Interesting problem.

I think there are no SQL-only solution.

Using arrays and PL/pgSQL function:
bdteste=# SELECT * FROM bar;aid | bid
-----+-----A   |   1A   |   3B   |   1B   |   2C   |   5D   |   6D   |   7E   |   8F   |   8
(9 registros)

bdteste=# CREATE OR REPLACE FUNCTION combina() RETURNS setof record AS $$
bdteste$# DECLARE
bdteste$#    res   record;
bdteste$#    res1  record;
bdteste$# BEGIN
bdteste$#    CREATE TEMP TABLE foobar(
bdteste$#       fbaid   text[],
bdteste$#       fbbid   int[])
bdteste$#    ON COMMIT DROP;
bdteste$#
bdteste$#    FOR res IN SELECT agr1, bid FROM (SELECT bid,
array_accum(aid) AS agr1 FROM bar
bdteste$#                  GROUP BY bid) b1 ORDER BY array_upper(agr1,
1) DESC, agr1 LOOP
bdteste$#       SELECT * INTO res1 FROM foobar WHERE fbaid @> res.agr1;
bdteste$#       IF NOT FOUND THEN
bdteste$#          INSERT INTO foobar VALUES (res.agr1, array[res.bid]);
bdteste$#       ELSE
bdteste$#          UPDATE foobar SET fbbid = array_append(fbbid,
res.bid) WHERE fbaid @> res.agr1;
bdteste$#       END IF;
bdteste$#    END LOOP;
bdteste$#
bdteste$#    RETURN QUERY SELECT * FROM foobar;
bdteste$# END;
bdteste$# $$ LANGUAGE PLPGSQL;
CREATE FUNCTION
bdteste=#
bdteste=# SELECT * FROM combina() AS(a text[], b int[]);  a   |    b
-------+---------{E,F} | {8}{A,B} | {1,3,2}{C}   | {5}{D}   | {7,6}
(4 registros)

Osvaldo

PS. - Aggregate array_accum defined at:
http://www.postgresql.org/docs/current/interactive/xaggr.html
- If you need sorted arrays use Andreas Kretschmer's function array_sort:
http://archives.postgresql.org/pgsql-general/2007-02/msg01534.php


pgsql-sql by date:

Previous
From: "Jan Peters"
Date:
Subject: How to hand over array as variable in plpgsql function?
Next
From: Tom Lane
Date:
Subject: Re: How to hand over array as variable in plpgsql function?