Home > mailing lists

Re: Ugly group by problem - Mailing list pgsql-sql

From	Achilleus Mantzios
Subject	Re: Ugly group by problem
Date	March 29, 2006 10:35:32
Msg-id	Pine.LNX.4.44.0603291838520.5673-100000@matrix.gatewaynet.com Whole thread Raw
In response to	Ugly group by problem (Markus Schaber <schabi@logix-tt.com>)
Responses	Re: Ugly group by problem Re: Ugly group by problem
List	pgsql-sql

Tree view

O Markus Schaber έγραψε στις Mar 29, 2006 :

> Hello,
> 
> I have a table filled from third party that basically has the following
> structure:
> 
> link_id | feat_id | other | columns...
> --------+---------+-------+-----------
> 1       | 2       | ...
> 2       | 5       | ...
> 2       | 23      | ...
> 3       | 5       | ...
> 3       | 23      | some  | data
> 3       | 23      | other | data
> 5       | 23      | ...
> 9       | 23      | ...
> 
> This structure is fixed, and we can't change it, but we can create
> 
> We have about 37 million different link_ids, and 35 million feat_ids.
> There are feat_ids that appear at several thousand link_ids, but a
> link_id does not have more than a douzen feat_ids.
> 
> Now I need to group together all link_ids that have the same set of
> feat_ids. In the example above, the sets would be (1), (2,3) and (5,9),
> and the other columns would be run through some aggregate functions.
> 
> Currently, this is done via an external JAVA application, but I'm
> looking for a way to express this via sql / plpgsql to ease deployment.
> 
> I could imagine some ugly code using ARRAY (not tried yet), but how
> would you pack this problem? It seems that I'm just stuck in my thoughts
> and miss the beauty way to solve it.

Well i did it with arrays, i dont know the performance 
implications tho:

foodb=# SELECT * from markustest ;link_id | feat_id | other
---------+---------+-------      1 |       2 | 1      2 |       5 | 2      2 |      23 | 2      3 |       5 | 3      3
|     23 | 3      3 |      23 | 3      5 |      23 | 5      9 |      23 | 9
 
(8 rows)

foodb=# SELECT qoo.foo2,sum(qoo.foo3) from (SELECT mt.link_id as 
foo,_int_union(array(select mt2.feat_id from markustest mt2 where 
mt2.link_id=mt.link_id order by mt2.feat_id),'{}') as foo2,other::int4 as 
foo3 from markustest mt) as qoo GROUP BY qoo.foo2; foo2  | sum
--------+-----{2}    |   1{5,23} |  13{23}   |  14
(3 rows)

foodb=#           

The _int_union trick is to force the arrays to have unique values.
The order by has the meaning that '{5,23}' and '{23,5}' should be treated
the same way.

> 
> Thanks,
> Markus
> 

-- 
-Achilleus

pgsql-sql by date:

From: Markus Schaber
Date: 29 March 2006, 09:40:20
Subject: Ugly group by problem

From: Scott Marlowe
Date: 29 March 2006, 12:50:13
Subject: Re: Flight numbers data

Re: Ugly group by problem - Mailing list pgsql-sql

Previous

Next