Thread: Selecting top N percent of records.
Is there a way to select the top 10% of the values from a column? For example the top 10% best selling items where number of sales is a column. Thanks.
Tim Uckun wrote: > Is there a way to select the top 10% of the values from a column? > > For example the top 10% best selling items where number of sales is a column. The top 10% would be a variable number of records. Is that what you want, or would you rather, say, just see the top N items? The latter is easy, just something like this: SELECT * FROM mytbl ORDER BY num_sales DESC LIMIT $1 ... where $1 is the number of records you want. To actually get 10% of the records, you can replace the $1 with a subquery, something like this probably: SELECT * FROM mytbl ORDER BY num_sales DESC LIMIT (SELECT (count(*) / 10) AS selnum FROM mytbl) There are probably other ways to do it also. Maybe you want all items whose sales are in the top 90 percentile or something, or maybe you want what generated the most profit, etc. -- Darren Duncan
On 18 October 2010 00:33, Tim Uckun <timuckun@gmail.com> wrote: > Is there a way to select the top 10% of the values from a column? > > For example the top 10% best selling items where number of sales is a column. > That is a bit problematic because it necessitates knowing the number of rows total, and slow counting is an idiosyncrasy of postgres. http://wiki.postgresql.org/wiki/Slow_Counting To get the top 10%: SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table) -- Regards, Peter Geoghegan
On Mon, Oct 18, 2010 at 12:53 PM, Darren Duncan <darren@darrenduncan.net> wrote: > Tim Uckun wrote: >> >> Is there a way to select the top 10% of the values from a column? >> >> For example the top 10% best selling items where number of sales is a >> column. > > The top 10% would be a variable number of records. Is that what you want, > or would you rather, say, just see the top N items? Yes this is what I want. I don't care how many records I pull up (well I might want to put a LIMIT on them). I want the top 10%
> That is a bit problematic because it necessitates knowing the number > of rows total, and slow counting is an idiosyncrasy of postgres. > > http://wiki.postgresql.org/wiki/Slow_Counting > > To get the top 10%: > > SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table) I think I wasn't making myself clear. I don't want the top 10% of the rows. I want the rows with the top 10% of the values in a column. In my case there is a very non linear set of values. The lowest value is 1 and the vast majority of records have a 1 in the column. The highest value might be in the tens of thousands. I want to pull up the records that have the top ten percent values.
On 10/18/2010 08:06 AM, Tim Uckun wrote: >> That is a bit problematic because it necessitates knowing the number >> of rows total, and slow counting is an idiosyncrasy of postgres. >> >> http://wiki.postgresql.org/wiki/Slow_Counting >> >> To get the top 10%: >> >> SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table) > > > I think I wasn't making myself clear. I don't want the top 10% of the > rows. I want the rows with the top 10% of the values in a column. OK, so you want a median-style "sort them in descending order and count down until you've selected the first 10% of rows" approach? In other words, values in the 90th percentile of the distribution? Try this. Given table "x" with single integer column "y", obtain rows of x in the 90th percentile of y: select ranked.y FROM (select percent_rank() over (order by y desc) as pc, y from x) AS ranked WHERE pc <= 0.1; or: select ranked.y from (select ntile(10) over (order by y desc) as pc, y from x) AS ranked WHERE pc = 1; See: http://www.postgresql.org/docs/current/static/functions-window.html Both of these seem to produce odd results with small input row counts. Test carefully before trusting these expressions, as I'm quite new to the use of window functions. -- Craig Ringer
> > OK, so you want a median-style "sort them in descending order and count down > until you've selected the first 10% of rows" approach? In other words, > values in the 90th percentile of the distribution? > > Try this. Given table "x" with single integer column "y", obtain rows of x > in the 90th percentile of y: > > select ranked.y FROM (select percent_rank() over (order by y desc) as pc, y > from x) AS ranked WHERE pc <= 0.1; > > or: > > select ranked.y from (select ntile(10) over (order by y desc) as pc, y from > x) AS ranked WHERE pc = 1; > Thanks I will look into the window functions. I haven't used them before so thanks for the tip.
Something like this should work - (but is untested), and does require the extra subquery, so there may be a more efficient way?
However, off the top of my head:
select a,b,c
from table
where <where clause>
order by c desc
limit (select count(*)/10 from table where <same where clause>);
where c is the no of sales column
>>> Tim Uckun <timuckun@gmail.com> 10/18/10 3:40 PM >>>
Is there a way to select the top 10% of the values from a column?
For example the top 10% best selling items where number of sales is a column.
Thanks.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
However, off the top of my head:
select a,b,c
from table
where <where clause>
order by c desc
limit (select count(*)/10 from table where <same where clause>);
where c is the no of sales column
Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
DBA/GIS consultant
NIWA, Wellington
New Zealand
>>> Tim Uckun <timuckun@gmail.com> 10/18/10 3:40 PM >>>
Is there a way to select the top 10% of the values from a column?
For example the top 10% best selling items where number of sales is a column.
Thanks.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.