Re: grouping/clustering query - Mailing list pgsql-sql

From Tony Wasson
Subject Re: grouping/clustering query
Date
Msg-id 6d8daee30810241024i661c73eag6edc20ab051da09b@mail.gmail.com
Whole thread Raw
In response to Re: grouping/clustering query  (Steve Midgley <science@misuse.org>)
Responses Re: grouping/clustering query  ("Tony Wasson" <ajwasson@gmail.com>)
List pgsql-sql
On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote:
> At 11:28 AM 10/23/2008, Joe wrote:
>>
>> Steve Midgley wrote:
>>>>
>>>> # (invoiceid, txid)
>>>> (A, 1)
>>>> (A, 3)
>>>> (B, 1)
>>>> (B, 2)
>>>> (C, 5)
>>>> (D, 6)
>>>> (D, 7)
>>>> (E, 8)
>>>> (F, 8)
>>>>
>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>> query that can generate this output:
>>>>
>>>> # (journal: invoiceids, txids)
>>>> [A,B] , [1,2,3]
>>>> [C], [5]
>>>> [D], [6,7]
>>>> [E,F], [8]
>>>
>>> Hi Dave,
>>>
>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>> the first line print:
>>>
>>>> [A,B] , [1,2,3]
>>>
>>> What's the rule that tells the query to output this way? Is it that all
>>> of B's values are between A's values?
>>
>> From a purely accounting standpoint, since transaction 1 was applied to
>> both invoices A and B, you need to group the invoices so that you can
>> compare total invoiced against total paid.
>
> I tinkered around briefly but didn't come up with a good idea, but I bet
> someone on this list can. However, I did create a CREATE script for your
> table design which, in my experience, makes it more likely that a real
> expert will take on your problem..
>
> Hope this helps,
>
> Steve
>
> DROP TABLE IF EXISTS trans;
>
> CREATE TABLE trans
> (
>  id serial NOT NULL,
>  inv_id character varying,
>  tx_id character varying,
>  CONSTRAINT pk_id PRIMARY KEY (id)
> )
> WITH (OIDS=FALSE);
>
> insert into trans (inv_id, tx_id) values('A','1');
> insert into trans (inv_id, tx_id) values('A','3');
> insert into trans (inv_id, tx_id) values('B','1');
> insert into trans (inv_id, tx_id) values('B','2');
> insert into trans (inv_id, tx_id) values('C','5');
> insert into trans (inv_id, tx_id) values('D','6');
> insert into trans (inv_id, tx_id) values('D','7');
> insert into trans (inv_id, tx_id) values('E','8');
> insert into trans (inv_id, tx_id) values('F','8');

This is as close as I can get the data. I think I'd need a custom
array grouping aggregate to get the results to match completely.
Notice how ABC are on their own lines?

test=# SELECT inv_array, tx_array
FROM (       SELECT tx_id, array_accum(inv_id) AS inv_array       FROM trans       GROUP BY tx_id       ORDER BY tx_id
 ) AS t
 
JOIN (       SELECT inv_id, array_accum(tx_id) AS tx_array       FROM trans       GROUP BY inv_id       ORDER BY inv_id
  ) AS i ON (t.tx_id = ANY(i.tx_array) OR i.inv_id =ANY(t.inv_array))
 
GROUP BY tx_array,inv_array
;inv_array | tx_array
-----------+----------{A,B}     | {1,2}{B}       | {1,2}{A}       | {1,3}{A,B}     | {1,3}{C}       | {5}{D}       |
{6,7}{E,F}    | {8}
 
(7 rows)


pgsql-sql by date:

Previous
From: "Oliveiros Cristina"
Date:
Subject: Re: grouping/clustering query
Next
From: "Tony Wasson"
Date:
Subject: Re: grouping/clustering query