Re: Removing redundant itemsets - Mailing list pgsql-sql

From Allan Kamau
Subject Re: Removing redundant itemsets
Date
Msg-id 47F0B5DA.7020101@sanbi.ac.za
Whole thread Raw
In response to Re: Removing redundant itemsets  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-sql
Craig,
Thank you so much for the solution. I have spent many hours since 
Thursday last week including the weekend (and it took you just a few 
minutes) trying to figure out a solution not involving procedural 
programming and looping (as the size of the items and even the number of 
"purchases" in the datasets I may be working with may be large), I was 
looking for a solution that may take (almost) polynomial time (and 
resources) and also make use of Postgresql refined and efficient engine. 
Your solution satisfies these requirements. Thanks.

Allan.

Craig Ringer wrote:
> Allan Kamau wrote:
>   
>> Hi all,
>> I have a list of purchases (market basket) and I would like to select
>> non redundant longest possible patterns by eliminating
>> (creating/populating other table to contain only non redandant itemsets)
>> purchases having item lists which are fully included in at least one
>> other purchase.
>>     
>
> Here's a possibly slow and surely ugly solution (I think it's right,
> though I haven't done more than passing testing):
>
>
>
> CREATE VIEW togo_as_arr AS
>   SELECT a.tid,
>     ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
>     AS items
>   FROM togo a GROUP BY tid;
>
> SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
> FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
> WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;
>
>
>
> (the view isn't necessary, but does improve the readability of the query).
>
> It groups the purchases up with item lists as arrays, then finds any
> purchases with items arrays wholly contained by other item arrays from
> other purchases.
>
> I'm *sure* there's a smarter way to do this that avoids the use of
> arrays, but I don't seem to be able to come up with one right now. It's
> interesting, though, so I might keep fiddling.
>
> --
> Craig Ringer
>   



pgsql-sql by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Removing redundant itemsets
Next
From: Emi Lu
Date:
Subject: drop table where tableName like 'backup_2007%' ?