Thread: percentile rank query

percentile rank query

From
"William Temperley"
Date:
Hi all

I'm trying to calculate the percentile rank for a record based on a
'score' column, e.g. a column of integers such as:
23,77,88,23,23,23,12,12,12,13,13,13
without using a stored procedure.

So,
select count(*) as frequency, score
from scoretable
group by score
order by score

Yields:

frequency score
3             12
3             13
4             23
1             77
1              88


However I'd like this result set:

frequency score   runningtotal
3             12        3
3             13        6
4             23        10
1             77        11
1              88       12

Where the running total is the previous frequency added to the current
frequency. Score order is significant.

So I can then do ((runningtotal-frequency)+(frequency/2))/(select
count(*) from scoretable) to give me the percentile rank for each
score.

Is this possible in one query? I just can't figure out how to get the
running total in a result set.

Thanks very much,

Will Temperley

Re: percentile rank query

From
Osvaldo Rosario Kussama
Date:
William Temperley escreveu:
> Hi all
>
> I'm trying to calculate the percentile rank for a record based on a
> 'score' column, e.g. a column of integers such as:
> 23,77,88,23,23,23,12,12,12,13,13,13
> without using a stored procedure.
>
> So,
> select count(*) as frequency, score
> from scoretable
> group by score
> order by score
>
> Yields:
>
> frequency score
> 3             12
> 3             13
> 4             23
> 1             77
> 1              88
>
>
> However I'd like this result set:
>
> frequency score   runningtotal
> 3             12        3
> 3             13        6
> 4             23        10
> 1             77        11
> 1              88       12
>
> Where the running total is the previous frequency added to the current
> frequency. Score order is significant.
>
> So I can then do ((runningtotal-frequency)+(frequency/2))/(select
> count(*) from scoretable) to give me the percentile rank for each
> score.
>
> Is this possible in one query? I just can't figure out how to get the
> running total in a result set.
>


Try:

SELECT count(*) AS frequency, score,
count((SELECT * FROM scoretable st2 WHERE st2.score <= st1.score)) AS
runningtotal
FROM scoretable st1
GROUP BY score
ORDER BY score

Osvaldo

Re: percentile rank query

From
"William Temperley"
Date:
On Thu, Apr 10, 2008 at 4:36 PM, Osvaldo Rosario Kussama
<osvaldo.kussama@gmail.com> wrote:
>
>  Try:
>
>  SELECT count(*) AS frequency, score,
>  count((SELECT * FROM scoretable st2 WHERE st2.score <= st1.score)) AS
> runningtotal
>  FROM scoretable st1
>  GROUP BY score
>  ORDER BY score
>
>  Osvaldo
>

Thankyou Osvaldo- that worked!

Final version:

SELECT count(*) AS frequency, score,
((((SELECT count(uid) FROM scoretable st2 WHERE st2.score <=
st1.score) - count(*)) + (count(*)/2))::float/(select
count(*) from scoretable))

FROM scoretable st1
GROUP BY score
ORDER BY score

I think that's a percentile rank now.

Cheers

Will

Re: percentile rank query

From
Sam Mason
Date:
On Thu, Apr 10, 2008 at 05:20:21PM +0100, William Temperley wrote:
> SELECT count(*) AS frequency, score,
> ((((SELECT count(uid) FROM scoretable st2 WHERE st2.score <=
> st1.score) - count(*)) + (count(*)/2))::float/(select
> count(*) from scoretable))
>
> FROM scoretable st1
> GROUP BY score
> ORDER BY score
>
> I think that's a percentile rank now.

I'm not quite sure how this is calculated but I think you may want to
be converting to a non-integral type earlier (i.e. as you're dividing
by two, not after).  I also find all the subselects a bit difficult to
follow so have moved them around:

  SELECT x.frequency, x.score, (x.rank + x.frequency / 2.0) / y.total AS pr
  FROM (
    SELECT count(*) AS frequency, score,
      (SELECT count(*) FROM scoretable t WHERE t.score < s.score) AS rank
    FROM scoretable s
    GROUP BY score) x, (SELECT count(*) AS total FROM scoretable) y
  ORDER BY score;


  Sam