Re: grouping/clustering query - Mailing list pgsql-sql
From | Tony Wasson |
---|---|
Subject | Re: grouping/clustering query |
Date | |
Msg-id | 6d8daee30810241348x4a5d4f22le5622be56a1ade27@mail.gmail.com Whole thread Raw |
In response to | Re: grouping/clustering query ("Tony Wasson" <ajwasson@gmail.com>) |
Responses |
Re: grouping/clustering query
|
List | pgsql-sql |
On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <ajwasson@gmail.com> wrote: > On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote: >> At 11:28 AM 10/23/2008, Joe wrote: >>> >>> Steve Midgley wrote: >>>>> >>>>> # (invoiceid, txid) >>>>> (A, 1) >>>>> (A, 3) >>>>> (B, 1) >>>>> (B, 2) >>>>> (C, 5) >>>>> (D, 6) >>>>> (D, 7) >>>>> (E, 8) >>>>> (F, 8) >>>>> >>>>> For journalling, I need to group/cluster this together. Is there a SQL >>>>> query that can generate this output: >>>>> >>>>> # (journal: invoiceids, txids) >>>>> [A,B] , [1,2,3] >>>>> [C], [5] >>>>> [D], [6,7] >>>>> [E,F], [8] >>>> >>>> Hi Dave, >>>> >>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does >>>> the first line print: >>>> >>>>> [A,B] , [1,2,3] >>>> >>>> What's the rule that tells the query to output this way? Is it that all >>>> of B's values are between A's values? >>> >>> From a purely accounting standpoint, since transaction 1 was applied to >>> both invoices A and B, you need to group the invoices so that you can >>> compare total invoiced against total paid. >> >> I tinkered around briefly but didn't come up with a good idea, but I bet >> someone on this list can. However, I did create a CREATE script for your >> table design which, in my experience, makes it more likely that a real >> expert will take on your problem.. >> >> Hope this helps, >> >> Steve >> >> DROP TABLE IF EXISTS trans; >> >> CREATE TABLE trans >> ( >> id serial NOT NULL, >> inv_id character varying, >> tx_id character varying, >> CONSTRAINT pk_id PRIMARY KEY (id) >> ) >> WITH (OIDS=FALSE); >> >> insert into trans (inv_id, tx_id) values('A','1'); >> insert into trans (inv_id, tx_id) values('A','3'); >> insert into trans (inv_id, tx_id) values('B','1'); >> insert into trans (inv_id, tx_id) values('B','2'); >> insert into trans (inv_id, tx_id) values('C','5'); >> insert into trans (inv_id, tx_id) values('D','6'); >> insert into trans (inv_id, tx_id) values('D','7'); >> insert into trans (inv_id, tx_id) values('E','8'); >> insert into trans (inv_id, tx_id) values('F','8'); Here's a stab at a custom aggregate attempting to explain what's going on inside. CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y VARCHAR) RETURNS VARCHAR[] AS $$ DECLAREres VARCHAR[]; BEGINRAISE NOTICE 'input state is %',x;RAISE NOTICE 'input variable is %',y;IF x = '{}' THEN RAISE NOTICE 'x is empty,returning input variable %',y; res[1] := y;ELSE RAISE NOTICE 'input array is not empty, checking if input variable is a member %',y; res := x; IF y = ANY(res) THEN RAISE NOTICE 'y is already in array %,skipping',res; ELSE res := array_append(res, y); RAISE NOTICE 'appending input variable%',y; END IF;END IF;RETURN res; END $$ LANGUAGE plpgsql STRICT; DROP AGGREGATE array_accum_unique(VARCHAR); CREATE AGGREGATE array_accum_unique ( basetype = VARCHAR , sfunc = varchar_array_accum_unique , stype = VARCHAR[] , initcond = '{}' ); SELECT array_accum_unique(inv_id) AS invoiceids, array_accum_unique(tx_id) AS transactionids FROM ( SELECT tx_id, inv_id FROM trans WHERE inv_id IN ( SELECT inv_id FROM trans WHERE id IN ( SELECT id FROM trans WHERE tx_id=1 ) ) ORDER BY tx_id, inv_id ) AS ss ; Returns this result for transaction id 1. invoiceids | transactionids ------------+----------------{A,B} | {1,2,3} (1 row) Hope this helps! Tony