Thread: percentile rank query
Hi all I'm trying to calculate the percentile rank for a record based on a 'score' column, e.g. a column of integers such as: 23,77,88,23,23,23,12,12,12,13,13,13 without using a stored procedure. So, select count(*) as frequency, score from scoretable group by score order by score Yields: frequency score 3 12 3 13 4 23 1 77 1 88 However I'd like this result set: frequency score runningtotal 3 12 3 3 13 6 4 23 10 1 77 11 1 88 12 Where the running total is the previous frequency added to the current frequency. Score order is significant. So I can then do ((runningtotal-frequency)+(frequency/2))/(select count(*) from scoretable) to give me the percentile rank for each score. Is this possible in one query? I just can't figure out how to get the running total in a result set. Thanks very much, Will Temperley
William Temperley escreveu: > Hi all > > I'm trying to calculate the percentile rank for a record based on a > 'score' column, e.g. a column of integers such as: > 23,77,88,23,23,23,12,12,12,13,13,13 > without using a stored procedure. > > So, > select count(*) as frequency, score > from scoretable > group by score > order by score > > Yields: > > frequency score > 3 12 > 3 13 > 4 23 > 1 77 > 1 88 > > > However I'd like this result set: > > frequency score runningtotal > 3 12 3 > 3 13 6 > 4 23 10 > 1 77 11 > 1 88 12 > > Where the running total is the previous frequency added to the current > frequency. Score order is significant. > > So I can then do ((runningtotal-frequency)+(frequency/2))/(select > count(*) from scoretable) to give me the percentile rank for each > score. > > Is this possible in one query? I just can't figure out how to get the > running total in a result set. > Try: SELECT count(*) AS frequency, score, count((SELECT * FROM scoretable st2 WHERE st2.score <= st1.score)) AS runningtotal FROM scoretable st1 GROUP BY score ORDER BY score Osvaldo
On Thu, Apr 10, 2008 at 4:36 PM, Osvaldo Rosario Kussama <osvaldo.kussama@gmail.com> wrote: > > Try: > > SELECT count(*) AS frequency, score, > count((SELECT * FROM scoretable st2 WHERE st2.score <= st1.score)) AS > runningtotal > FROM scoretable st1 > GROUP BY score > ORDER BY score > > Osvaldo > Thankyou Osvaldo- that worked! Final version: SELECT count(*) AS frequency, score, ((((SELECT count(uid) FROM scoretable st2 WHERE st2.score <= st1.score) - count(*)) + (count(*)/2))::float/(select count(*) from scoretable)) FROM scoretable st1 GROUP BY score ORDER BY score I think that's a percentile rank now. Cheers Will
On Thu, Apr 10, 2008 at 05:20:21PM +0100, William Temperley wrote: > SELECT count(*) AS frequency, score, > ((((SELECT count(uid) FROM scoretable st2 WHERE st2.score <= > st1.score) - count(*)) + (count(*)/2))::float/(select > count(*) from scoretable)) > > FROM scoretable st1 > GROUP BY score > ORDER BY score > > I think that's a percentile rank now. I'm not quite sure how this is calculated but I think you may want to be converting to a non-integral type earlier (i.e. as you're dividing by two, not after). I also find all the subselects a bit difficult to follow so have moved them around: SELECT x.frequency, x.score, (x.rank + x.frequency / 2.0) / y.total AS pr FROM ( SELECT count(*) AS frequency, score, (SELECT count(*) FROM scoretable t WHERE t.score < s.score) AS rank FROM scoretable s GROUP BY score) x, (SELECT count(*) AS total FROM scoretable) y ORDER BY score; Sam