Re: grouping/clustering query - Mailing list pgsql-sql

From Tony Wasson
Subject Re: grouping/clustering query
Date
Msg-id 6d8daee30810241348x4a5d4f22le5622be56a1ade27@mail.gmail.com
Whole thread Raw
In response to Re: grouping/clustering query  ("Tony Wasson" <ajwasson@gmail.com>)
Responses Re: grouping/clustering query
List pgsql-sql
On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <ajwasson@gmail.com> wrote:
> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science@misuse.org> wrote:
>> At 11:28 AM 10/23/2008, Joe wrote:
>>>
>>> Steve Midgley wrote:
>>>>>
>>>>> # (invoiceid, txid)
>>>>> (A, 1)
>>>>> (A, 3)
>>>>> (B, 1)
>>>>> (B, 2)
>>>>> (C, 5)
>>>>> (D, 6)
>>>>> (D, 7)
>>>>> (E, 8)
>>>>> (F, 8)
>>>>>
>>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>>> query that can generate this output:
>>>>>
>>>>> # (journal: invoiceids, txids)
>>>>> [A,B] , [1,2,3]
>>>>> [C], [5]
>>>>> [D], [6,7]
>>>>> [E,F], [8]
>>>>
>>>> Hi Dave,
>>>>
>>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>>> the first line print:
>>>>
>>>>> [A,B] , [1,2,3]
>>>>
>>>> What's the rule that tells the query to output this way? Is it that all
>>>> of B's values are between A's values?
>>>
>>> From a purely accounting standpoint, since transaction 1 was applied to
>>> both invoices A and B, you need to group the invoices so that you can
>>> compare total invoiced against total paid.
>>
>> I tinkered around briefly but didn't come up with a good idea, but I bet
>> someone on this list can. However, I did create a CREATE script for your
>> table design which, in my experience, makes it more likely that a real
>> expert will take on your problem..
>>
>> Hope this helps,
>>
>> Steve
>>
>> DROP TABLE IF EXISTS trans;
>>
>> CREATE TABLE trans
>> (
>>  id serial NOT NULL,
>>  inv_id character varying,
>>  tx_id character varying,
>>  CONSTRAINT pk_id PRIMARY KEY (id)
>> )
>> WITH (OIDS=FALSE);
>>
>> insert into trans (inv_id, tx_id) values('A','1');
>> insert into trans (inv_id, tx_id) values('A','3');
>> insert into trans (inv_id, tx_id) values('B','1');
>> insert into trans (inv_id, tx_id) values('B','2');
>> insert into trans (inv_id, tx_id) values('C','5');
>> insert into trans (inv_id, tx_id) values('D','6');
>> insert into trans (inv_id, tx_id) values('D','7');
>> insert into trans (inv_id, tx_id) values('E','8');
>> insert into trans (inv_id, tx_id) values('F','8');


Here's a stab at a custom aggregate attempting to explain what's going
on inside.

CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
VARCHAR) RETURNS VARCHAR[] AS $$
DECLAREres VARCHAR[];
BEGINRAISE NOTICE 'input state is %',x;RAISE NOTICE 'input variable is %',y;IF x = '{}' THEN       RAISE NOTICE 'x is
empty,returning input variable %',y;       res[1] := y;ELSE       RAISE NOTICE 'input array is not empty, checking if
input
variable is a member %',y;       res := x;       IF y = ANY(res) THEN               RAISE NOTICE 'y is already in array
%,skipping',res;       ELSE               res := array_append(res, y);               RAISE NOTICE 'appending input
variable%',y;       END IF;END IF;RETURN res;
 
END
$$ LANGUAGE plpgsql STRICT;

DROP AGGREGATE array_accum_unique(VARCHAR);
CREATE AGGREGATE array_accum_unique
(   basetype = VARCHAR   , sfunc = varchar_array_accum_unique   , stype = VARCHAR[]   , initcond = '{}'
);

SELECT array_accum_unique(inv_id) AS invoiceids, array_accum_unique(tx_id) AS transactionids
FROM (       SELECT tx_id, inv_id       FROM trans       WHERE inv_id IN (                               SELECT inv_id
                            FROM trans                               WHERE id IN (
        SELECT id FROM trans
 
WHERE tx_id=1                                           )                       )       ORDER BY tx_id, inv_id     ) AS
ss
;


Returns this result for transaction id 1.
invoiceids | transactionids
------------+----------------{A,B}      | {1,2,3}
(1 row)

Hope this helps!
Tony


pgsql-sql by date:

Previous
From: "Tony Wasson"
Date:
Subject: Re: grouping/clustering query
Next
From: "David Garamond"
Date:
Subject: Re: grouping/clustering query