Thread: Need help in grouping records

Need help in grouping records

From

Andreas

Date:

19 May 2012, 16:04:54

Hi,

I'm trying to fight against double entries in tables.
I got as far as I can find similar records with trigram string matching.
If I do this with a table compared to itself I get something like this:

id_a, id_b
3,   5
3,   7
5,   3
5,   7
7,   3
7,   5
11,   13
13,   11

so the records with the IDs 3, 5, 7 and 11, 13 are similar enough to 
form a group.

How would I get a list of record-IDs with a group-ID like this

record_id, group_id
3,   1
5,   1
7,   1
11,   2
13,   2


Is there a way to get this by SQL ?

Re: Need help in grouping records

From

Jasen Betts

Date:

20 May 2012, 00:32:06

On 2012-05-19, Andreas <maps.on@gmx.net> wrote:
> Hi,
>
> I'm trying to fight against double entries in tables.
> I got as far as I can find similar records with trigram string matching.
> If I do this with a table compared to itself I get something like this:
>
> id_a, id_b
> 3,   5
> 3,   7
> 5,   3
> 5,   7
> 7,   3
> 7,   5
> 11,   13
> 13,   11
>
> so the records with the IDs 3, 5, 7 and 11, 13 are similar enough to 
> form a group.
>
> How would I get a list of record-IDs with a group-ID like this
>
> record_id, group_id
> 3,   1
> 5,   1
> 7,   1
> 11,   2
> 13,   2
>
> Is there a way to get this by SQL ?

 select id_a,min(least(id_a,id_b)) from SOMETHING group by id_a

close enough?

or this: ?
 select id_a, rank() over order by g  from  ( select id_a,min(least(id_a,id_b)) as g from SOMETHING group by id_a ) as
foo



-- 
⚂⚃ 100% natural

Re: Need help in grouping records

From

Andreas

Date:

20 May 2012, 06:08:42

Am 20.05.2012 05:04, schrieb Jasen Betts:
> On 2012-05-19, Andreas<maps.on@gmx.net>  wrote:
>> Hi,
>>
>> I'm trying to fight against double entries in tables.
>> I got as far as I can find similar records with trigram string matching.
>> If I do this with a table compared to itself I get something like this:
>>
>> id_a, id_b
>> 3,   5
>> 3,   7
>> 5,   3
>> 5,   7
>> 7,   3
>> 7,   5
>> 11,   13
>> 13,   11
>>
>> so the records with the IDs 3, 5, 7 and 11, 13 are similar enough to
>> form a group.
>>
>> How would I get a list of record-IDs with a group-ID like this
>>
>> record_id, group_id
>> 3,   1
>> 5,   1
>> 7,   1
>> 11,   2
>> 13,   2
>>
>> Is there a way to get this by SQL ?
>
>    select id_a,min(least(id_a,id_b)) from SOMETHING group by id_a
>
> close enough?
>
> or this: ?
>
>    select id_a, rank() over order by g  from
>    ( select id_a,min(least(id_a,id_b)) as g from SOMETHING group by id_a ) as foo
>
>
>
Thanks   :)
Thats by far more elegant as my approach with arrays I figured out in 
the meantime.

I changed rank() to dense_rank() in your solution.

Functionally the 1st line does allready all the magic, though.

Great   :)

master/detail

From

Jan Bakuwel

Date:

20 May 2012, 20:17:35

Hi,

I'm trying to get my head around the following question. As an example
take a table with products:

productid (pk)
name

and productprice

productpriceid (pk)
productid (fk)
pricedate
price

There are multiple records in productprice for each product as prices
(often) go up and (less often) go down.

I'm looking for a query that returns the following:

productid, name, pricedate, current_price, difference

current_price is the latest (ie. most recent date) price of the product
and difference is the difference in price between the latest price and
the price before the latest.

Any suggestions how to do this with SQL only? I can make it work with a
function (probably less efficient) but think this should be possible
with SQL too...

cheers,
Jan

Re: master/detail

From

"Raj Mathur (राज माथुर)"

Date:

21 May 2012, 00:11:16

On Monday 21 May 2012, Jan Bakuwel wrote:
> Hi,
> 
> I'm trying to get my head around the following question. As an
> example take a table with products:
> 
> productid (pk)
> name
> 
> and productprice
> 
> productpriceid (pk)
> productid (fk)
> pricedate
> price
> 
> There are multiple records in productprice for each product as prices
> (often) go up and (less often) go down.
> 
> I'm looking for a query that returns the following:
> 
> productid, name, pricedate, current_price, difference
> 
> current_price is the latest (ie. most recent date) price of the
> product and difference is the difference in price between the latest
> price and the price before the latest.
> 
> Any suggestions how to do this with SQL only? I can make it work with
> a function (probably less efficient) but think this should be
> possible with SQL too...

Something like this ought to do it (not tested):

select latest.price, latest.price - next.price
from (select price from productprice where productid = 1   order by pricedate desc limit 1) latest, (select price from
productpricewhere productid  = 1   order by pricedate desc limit 2 offset 1) next;
 

Regards,

-- Raj
-- 
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F

Re: master/detail

From

Mario Dankoor

Date:

21 May 2012, 08:57:45

On 2012-05-21 1:17 AM, Jan Bakuwel wrote:
> Hi,
>
> I'm trying to get my head around the following question. As an example
> take a table with products:
>
> productid (pk)
> name
>
> and productprice
>
> productpriceid (pk)
> productid (fk)
> pricedate
> price
>
> There are multiple records in productprice for each product as prices
> (often) go up and (less often) go down.
>
> I'm looking for a query that returns the following:
>
> productid, name, pricedate, current_price, difference
>
> current_price is the latest (ie. most recent date) price of the product
> and difference is the difference in price between the latest price and
> the price before the latest.
>
> Any suggestions how to do this with SQL only? I can make it work with a
> function (probably less efficient) but think this should be possible
> with SQL too...
>
> cheers,
> Jan
>
>
With windowing functions:

select  FRS.name       ,FRS.productprice  as current_price       ,FRS.productprice- NXT.productprice
from
(
SELECT PRC.productid       ,PRC.productprice       ,ROW_NUMBER() OVER(PARTITION BY PRC.productid ORDER BY 
PRC.pricedate desc) rank_nr
FROM  productprice PRC     ,products     PRD
WHERE 1 = 1
AND PRD.productid = PRC.productid
) FRS
LEFT JOIN
(
SELECT  productid       ,productprice       ,ROW_NUMBER() OVER(PARTITION BY productid ORDER BY pricedate 
desc) rank_nr
FROM productprice
) NXT
ON FRS.productid = NXT.productid
AND NXT.rank_nr < 3
AND FRS.rank_nr <> NXT.rank_nr
WHERE 1 = 1
AND FRS.rank_nr = 1

Without windowing functions (slightly complicated, it's merely a max-mix 
issue and assuming that the date includes the time)

SELECT  CUR.name       ,CUR.productid       ,CUR.current_price       ,CUR.current_price - PRV.previous_price
price_difference
FROM ( --LATEST PRICE
SELECT PRD.name              ,PRC.productid
,PRC.productprice current_price       FROM  productprice PRC     ,products     PRD     ,(--LATEST PRICEDATE
SELECT productid       ,MAX(pricedate) pricedate
 
FROM productprice
GROUP BY productid
) MMX
WHERE 1 = 1
AND PRC.productid = PRD.productid
AND PRC.productid = MMX.productid
AND PRC.pricedate = MMX.pricedate     )CUR
-- IN CASE THERE'S NO PREVIOUS PRICE
LEFT JOIN
(  -- PREVIOUS PRICE
SELECT  PRC.productid       ,PRC.productprice previous_price
FROM  productprice PRC     ,(--PREVIOUS PRICEDATE       SELECT PRC.productid       ,MAX(PRC.pricedate) pricedate
FROM  productprice PRC     ,(
SELECT  productid       ,MAX(pricedate) pricedate
FROM productprice
GROUP BY productid
) MMX
WHERE 1 = 1
AND PRC.productid = MMX.productid
AND PRC.pricedate < MMX.pricedate      ) PRV
WHERE 1 = 1
AND PRC.productid = PRV.productid
AND PRC.pricedate = PRV.pricedate
) LST
ON CUR.productid = LST.productid

cheers

Mario

Re: master/detail

From

Jan Bakuwel

Date:

22 May 2012, 04:29:03

Hi Andreas and Mario,

Many thanks for your suggestion to use PostgreSQL's window function,
exactly what I needed.

kind regards,
Jan

Re: master/detail

From

Andreas Kretschmer

Date:

23 May 2012, 17:25:49



Jan Bakuwel <jan.bakuwel@greenpeace.org> hat am 21. Mai 2012 um 01:17
geschrieben:

> Hi,
>
> I'm trying to get my head around the following question. As an example
> take a table with products:
>
> productid (pk)
> name
>
> and productprice
>
> productpriceid (pk)
> productid (fk)
> pricedate
> price
>
> There are multiple records in productprice for each product as prices
> (often) go up and (less often) go down.
>
> I'm looking for a query that returns the following:
>
> productid, name, pricedate, current_price, difference
>
> current_price is the latest (ie. most recent date) price of the product
> and difference is the difference in price between the latest price and
> the price before the latest.
>
> Any suggestions how to do this with SQL only? I can make it work with a
> function (probably less efficient) but think this should be possible
> with SQL too...


You can use window-function, in your case something like:

test=# select * from productprice ;id | product | pricedate  | price
----+---------+------------+------- 1 |       1 | 2012-05-01 |    10 2 |       1 | 2012-05-05 |    15 3 |       1 |
2012-05-10|    12 4 |       1 | 2012-05-15 |    22
 
(4 rows)

test=*# select id, product, pricedate, price, lead(price) over (partition by
product order by pricedate desc), price - (lead(price) over (partition by
product order by pricedate desc)) from productprice;id | product | pricedate  | price | lead | ?column?
----+---------+------------+-------+------+---------- 4 |       1 | 2012-05-15 |    22 |   12 |       10 3 |       1 |
2012-05-10|    12 |   15 |       -3 2 |       1 | 2012-05-05 |    15 |   10 |        5 1 |       1 | 2012-05-01 |    10
|     |
 
(4 rows)

Regards, Andreas