Thread: Need help in grouping records
Hi, I'm trying to fight against double entries in tables. I got as far as I can find similar records with trigram string matching. If I do this with a table compared to itself I get something like this: id_a, id_b 3, 5 3, 7 5, 3 5, 7 7, 3 7, 5 11, 13 13, 11 so the records with the IDs 3, 5, 7 and 11, 13 are similar enough to form a group. How would I get a list of record-IDs with a group-ID like this record_id, group_id 3, 1 5, 1 7, 1 11, 2 13, 2 Is there a way to get this by SQL ?
On 2012-05-19, Andreas <maps.on@gmx.net> wrote: > Hi, > > I'm trying to fight against double entries in tables. > I got as far as I can find similar records with trigram string matching. > If I do this with a table compared to itself I get something like this: > > id_a, id_b > 3, 5 > 3, 7 > 5, 3 > 5, 7 > 7, 3 > 7, 5 > 11, 13 > 13, 11 > > so the records with the IDs 3, 5, 7 and 11, 13 are similar enough to > form a group. > > How would I get a list of record-IDs with a group-ID like this > > record_id, group_id > 3, 1 > 5, 1 > 7, 1 > 11, 2 > 13, 2 > > Is there a way to get this by SQL ? select id_a,min(least(id_a,id_b)) from SOMETHING group by id_a close enough? or this: ? select id_a, rank() over order by g from ( select id_a,min(least(id_a,id_b)) as g from SOMETHING group by id_a ) as foo -- ⚂⚃ 100% natural
Am 20.05.2012 05:04, schrieb Jasen Betts: > On 2012-05-19, Andreas<maps.on@gmx.net> wrote: >> Hi, >> >> I'm trying to fight against double entries in tables. >> I got as far as I can find similar records with trigram string matching. >> If I do this with a table compared to itself I get something like this: >> >> id_a, id_b >> 3, 5 >> 3, 7 >> 5, 3 >> 5, 7 >> 7, 3 >> 7, 5 >> 11, 13 >> 13, 11 >> >> so the records with the IDs 3, 5, 7 and 11, 13 are similar enough to >> form a group. >> >> How would I get a list of record-IDs with a group-ID like this >> >> record_id, group_id >> 3, 1 >> 5, 1 >> 7, 1 >> 11, 2 >> 13, 2 >> >> Is there a way to get this by SQL ? > > select id_a,min(least(id_a,id_b)) from SOMETHING group by id_a > > close enough? > > or this: ? > > select id_a, rank() over order by g from > ( select id_a,min(least(id_a,id_b)) as g from SOMETHING group by id_a ) as foo > > > Thanks :) Thats by far more elegant as my approach with arrays I figured out in the meantime. I changed rank() to dense_rank() in your solution. Functionally the 1st line does allready all the magic, though. Great :)
Hi, I'm trying to get my head around the following question. As an example take a table with products: productid (pk) name and productprice productpriceid (pk) productid (fk) pricedate price There are multiple records in productprice for each product as prices (often) go up and (less often) go down. I'm looking for a query that returns the following: productid, name, pricedate, current_price, difference current_price is the latest (ie. most recent date) price of the product and difference is the difference in price between the latest price and the price before the latest. Any suggestions how to do this with SQL only? I can make it work with a function (probably less efficient) but think this should be possible with SQL too... cheers, Jan
On Monday 21 May 2012, Jan Bakuwel wrote: > Hi, > > I'm trying to get my head around the following question. As an > example take a table with products: > > productid (pk) > name > > and productprice > > productpriceid (pk) > productid (fk) > pricedate > price > > There are multiple records in productprice for each product as prices > (often) go up and (less often) go down. > > I'm looking for a query that returns the following: > > productid, name, pricedate, current_price, difference > > current_price is the latest (ie. most recent date) price of the > product and difference is the difference in price between the latest > price and the price before the latest. > > Any suggestions how to do this with SQL only? I can make it work with > a function (probably less efficient) but think this should be > possible with SQL too... Something like this ought to do it (not tested): select latest.price, latest.price - next.price from (select price from productprice where productid = 1 order by pricedate desc limit 1) latest, (select price from productpricewhere productid = 1 order by pricedate desc limit 2 offset 1) next; Regards, -- Raj -- Raj Mathur || raju@kandalaya.org || GPG: http://otheronepercent.blogspot.com || http://kandalaya.org || CC68 It is the mind that moves || http://schizoid.in || D17F
On 2012-05-21 1:17 AM, Jan Bakuwel wrote: > Hi, > > I'm trying to get my head around the following question. As an example > take a table with products: > > productid (pk) > name > > and productprice > > productpriceid (pk) > productid (fk) > pricedate > price > > There are multiple records in productprice for each product as prices > (often) go up and (less often) go down. > > I'm looking for a query that returns the following: > > productid, name, pricedate, current_price, difference > > current_price is the latest (ie. most recent date) price of the product > and difference is the difference in price between the latest price and > the price before the latest. > > Any suggestions how to do this with SQL only? I can make it work with a > function (probably less efficient) but think this should be possible > with SQL too... > > cheers, > Jan > > With windowing functions: select FRS.name ,FRS.productprice as current_price ,FRS.productprice- NXT.productprice from ( SELECT PRC.productid ,PRC.productprice ,ROW_NUMBER() OVER(PARTITION BY PRC.productid ORDER BY PRC.pricedate desc) rank_nr FROM productprice PRC ,products PRD WHERE 1 = 1 AND PRD.productid = PRC.productid ) FRS LEFT JOIN ( SELECT productid ,productprice ,ROW_NUMBER() OVER(PARTITION BY productid ORDER BY pricedate desc) rank_nr FROM productprice ) NXT ON FRS.productid = NXT.productid AND NXT.rank_nr < 3 AND FRS.rank_nr <> NXT.rank_nr WHERE 1 = 1 AND FRS.rank_nr = 1 Without windowing functions (slightly complicated, it's merely a max-mix issue and assuming that the date includes the time) SELECT CUR.name ,CUR.productid ,CUR.current_price ,CUR.current_price - PRV.previous_price price_difference FROM ( --LATEST PRICE SELECT PRD.name ,PRC.productid ,PRC.productprice current_price FROM productprice PRC ,products PRD ,(--LATEST PRICEDATE SELECT productid ,MAX(pricedate) pricedate FROM productprice GROUP BY productid ) MMX WHERE 1 = 1 AND PRC.productid = PRD.productid AND PRC.productid = MMX.productid AND PRC.pricedate = MMX.pricedate )CUR -- IN CASE THERE'S NO PREVIOUS PRICE LEFT JOIN ( -- PREVIOUS PRICE SELECT PRC.productid ,PRC.productprice previous_price FROM productprice PRC ,(--PREVIOUS PRICEDATE SELECT PRC.productid ,MAX(PRC.pricedate) pricedate FROM productprice PRC ,( SELECT productid ,MAX(pricedate) pricedate FROM productprice GROUP BY productid ) MMX WHERE 1 = 1 AND PRC.productid = MMX.productid AND PRC.pricedate < MMX.pricedate ) PRV WHERE 1 = 1 AND PRC.productid = PRV.productid AND PRC.pricedate = PRV.pricedate ) LST ON CUR.productid = LST.productid cheers Mario
Hi Andreas and Mario, Many thanks for your suggestion to use PostgreSQL's window function, exactly what I needed. kind regards, Jan
Jan Bakuwel <jan.bakuwel@greenpeace.org> hat am 21. Mai 2012 um 01:17 geschrieben: > Hi, > > I'm trying to get my head around the following question. As an example > take a table with products: > > productid (pk) > name > > and productprice > > productpriceid (pk) > productid (fk) > pricedate > price > > There are multiple records in productprice for each product as prices > (often) go up and (less often) go down. > > I'm looking for a query that returns the following: > > productid, name, pricedate, current_price, difference > > current_price is the latest (ie. most recent date) price of the product > and difference is the difference in price between the latest price and > the price before the latest. > > Any suggestions how to do this with SQL only? I can make it work with a > function (probably less efficient) but think this should be possible > with SQL too... You can use window-function, in your case something like: test=# select * from productprice ;id | product | pricedate | price ----+---------+------------+------- 1 | 1 | 2012-05-01 | 10 2 | 1 | 2012-05-05 | 15 3 | 1 | 2012-05-10| 12 4 | 1 | 2012-05-15 | 22 (4 rows) test=*# select id, product, pricedate, price, lead(price) over (partition by product order by pricedate desc), price - (lead(price) over (partition by product order by pricedate desc)) from productprice;id | product | pricedate | price | lead | ?column? ----+---------+------------+-------+------+---------- 4 | 1 | 2012-05-15 | 22 | 12 | 10 3 | 1 | 2012-05-10| 12 | 15 | -3 2 | 1 | 2012-05-05 | 15 | 10 | 5 1 | 1 | 2012-05-01 | 10 | | (4 rows) Regards, Andreas