Re: grouping/clustering query - Mailing list pgsql-sql
From | Tony Wasson |
---|---|
Subject | Re: grouping/clustering query |
Date | |
Msg-id | 6d8daee30810241024i661c73eag6edc20ab051da09b@mail.gmail.com Whole thread Raw |
In response to | Re: grouping/clustering query (Steve Midgley <science@misuse.org>) |
Responses |
Re: grouping/clustering query
|
List | pgsql-sql |
On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote: > At 11:28 AM 10/23/2008, Joe wrote: >> >> Steve Midgley wrote: >>>> >>>> # (invoiceid, txid) >>>> (A, 1) >>>> (A, 3) >>>> (B, 1) >>>> (B, 2) >>>> (C, 5) >>>> (D, 6) >>>> (D, 7) >>>> (E, 8) >>>> (F, 8) >>>> >>>> For journalling, I need to group/cluster this together. Is there a SQL >>>> query that can generate this output: >>>> >>>> # (journal: invoiceids, txids) >>>> [A,B] , [1,2,3] >>>> [C], [5] >>>> [D], [6,7] >>>> [E,F], [8] >>> >>> Hi Dave, >>> >>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does >>> the first line print: >>> >>>> [A,B] , [1,2,3] >>> >>> What's the rule that tells the query to output this way? Is it that all >>> of B's values are between A's values? >> >> From a purely accounting standpoint, since transaction 1 was applied to >> both invoices A and B, you need to group the invoices so that you can >> compare total invoiced against total paid. > > I tinkered around briefly but didn't come up with a good idea, but I bet > someone on this list can. However, I did create a CREATE script for your > table design which, in my experience, makes it more likely that a real > expert will take on your problem.. > > Hope this helps, > > Steve > > DROP TABLE IF EXISTS trans; > > CREATE TABLE trans > ( > id serial NOT NULL, > inv_id character varying, > tx_id character varying, > CONSTRAINT pk_id PRIMARY KEY (id) > ) > WITH (OIDS=FALSE); > > insert into trans (inv_id, tx_id) values('A','1'); > insert into trans (inv_id, tx_id) values('A','3'); > insert into trans (inv_id, tx_id) values('B','1'); > insert into trans (inv_id, tx_id) values('B','2'); > insert into trans (inv_id, tx_id) values('C','5'); > insert into trans (inv_id, tx_id) values('D','6'); > insert into trans (inv_id, tx_id) values('D','7'); > insert into trans (inv_id, tx_id) values('E','8'); > insert into trans (inv_id, tx_id) values('F','8'); This is as close as I can get the data. I think I'd need a custom array grouping aggregate to get the results to match completely. Notice how ABC are on their own lines? test=# SELECT inv_array, tx_array FROM ( SELECT tx_id, array_accum(inv_id) AS inv_array FROM trans GROUP BY tx_id ORDER BY tx_id ) AS t JOIN ( SELECT inv_id, array_accum(tx_id) AS tx_array FROM trans GROUP BY inv_id ORDER BY inv_id ) AS i ON (t.tx_id = ANY(i.tx_array) OR i.inv_id =ANY(t.inv_array)) GROUP BY tx_array,inv_array ;inv_array | tx_array -----------+----------{A,B} | {1,2}{B} | {1,2}{A} | {1,3}{A,B} | {1,3}{C} | {5}{D} | {6,7}{E,F} | {8} (7 rows)