Re: grouping/clustering query - Mailing list pgsql-sql
| From | Tony Wasson |
|---|---|
| Subject | Re: grouping/clustering query |
| Date | |
| Msg-id | 6d8daee30810241348x4a5d4f22le5622be56a1ade27@mail.gmail.com Whole thread Raw |
| In response to | Re: grouping/clustering query ("Tony Wasson" <ajwasson@gmail.com>) |
| Responses |
Re: grouping/clustering query
|
| List | pgsql-sql |
On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <ajwasson@gmail.com> wrote:
> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote:
>> At 11:28 AM 10/23/2008, Joe wrote:
>>>
>>> Steve Midgley wrote:
>>>>>
>>>>> # (invoiceid, txid)
>>>>> (A, 1)
>>>>> (A, 3)
>>>>> (B, 1)
>>>>> (B, 2)
>>>>> (C, 5)
>>>>> (D, 6)
>>>>> (D, 7)
>>>>> (E, 8)
>>>>> (F, 8)
>>>>>
>>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>>> query that can generate this output:
>>>>>
>>>>> # (journal: invoiceids, txids)
>>>>> [A,B] , [1,2,3]
>>>>> [C], [5]
>>>>> [D], [6,7]
>>>>> [E,F], [8]
>>>>
>>>> Hi Dave,
>>>>
>>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>>> the first line print:
>>>>
>>>>> [A,B] , [1,2,3]
>>>>
>>>> What's the rule that tells the query to output this way? Is it that all
>>>> of B's values are between A's values?
>>>
>>> From a purely accounting standpoint, since transaction 1 was applied to
>>> both invoices A and B, you need to group the invoices so that you can
>>> compare total invoiced against total paid.
>>
>> I tinkered around briefly but didn't come up with a good idea, but I bet
>> someone on this list can. However, I did create a CREATE script for your
>> table design which, in my experience, makes it more likely that a real
>> expert will take on your problem..
>>
>> Hope this helps,
>>
>> Steve
>>
>> DROP TABLE IF EXISTS trans;
>>
>> CREATE TABLE trans
>> (
>> id serial NOT NULL,
>> inv_id character varying,
>> tx_id character varying,
>> CONSTRAINT pk_id PRIMARY KEY (id)
>> )
>> WITH (OIDS=FALSE);
>>
>> insert into trans (inv_id, tx_id) values('A','1');
>> insert into trans (inv_id, tx_id) values('A','3');
>> insert into trans (inv_id, tx_id) values('B','1');
>> insert into trans (inv_id, tx_id) values('B','2');
>> insert into trans (inv_id, tx_id) values('C','5');
>> insert into trans (inv_id, tx_id) values('D','6');
>> insert into trans (inv_id, tx_id) values('D','7');
>> insert into trans (inv_id, tx_id) values('E','8');
>> insert into trans (inv_id, tx_id) values('F','8');
Here's a stab at a custom aggregate attempting to explain what's going
on inside.
CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
VARCHAR) RETURNS VARCHAR[] AS $$
DECLAREres VARCHAR[];
BEGINRAISE NOTICE 'input state is %',x;RAISE NOTICE 'input variable is %',y;IF x = '{}' THEN RAISE NOTICE 'x is
empty,returning input variable %',y; res[1] := y;ELSE RAISE NOTICE 'input array is not empty, checking if
input
variable is a member %',y; res := x; IF y = ANY(res) THEN RAISE NOTICE 'y is already in array
%,skipping',res; ELSE res := array_append(res, y); RAISE NOTICE 'appending input
variable%',y; END IF;END IF;RETURN res;
END
$$ LANGUAGE plpgsql STRICT;
DROP AGGREGATE array_accum_unique(VARCHAR);
CREATE AGGREGATE array_accum_unique
( basetype = VARCHAR , sfunc = varchar_array_accum_unique , stype = VARCHAR[] , initcond = '{}'
);
SELECT array_accum_unique(inv_id) AS invoiceids, array_accum_unique(tx_id) AS transactionids
FROM ( SELECT tx_id, inv_id FROM trans WHERE inv_id IN ( SELECT inv_id
FROM trans WHERE id IN (
SELECT id FROM trans
WHERE tx_id=1 ) ) ORDER BY tx_id, inv_id ) AS
ss
;
Returns this result for transaction id 1.
invoiceids | transactionids
------------+----------------{A,B} | {1,2,3}
(1 row)
Hope this helps!
Tony