Thread: histogram
I have a column of 2 million float values from 0 to 1. I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc. What is the best way to do this? Thanks, Joel -------------------------------------------------------------------------- - for hire: mac osx device driver ninja, kernel extensions and usb drivers ---------------------+------------+--------------------------------------- http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont ---------------------+------------+---------------------------------------
Hi, try something like this: select trunc(random() * 10.)/10. , count(*) from generate_series(1,2000000) group by 1 order by 2 regards Thomas Am 30.04.2011 18:37, schrieb Joel Reymont: > I have a column of 2 million float values from 0 to 1. > > I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc. > > What is the best way to do this? > > Thanks, Joel > > -------------------------------------------------------------------------- > - for hire: mac osx device driver ninja, kernel extensions and usb drivers > ---------------------+------------+--------------------------------------- > http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont > ---------------------+------------+--------------------------------------- > > > >
Thank you Thomas! Is there a way for the code below to determine the number of rows in the table and use it? Thanks, Joel On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote: > Hi, > > try something like this: > > select > trunc(random() * 10.)/10. > , count(*) > from > generate_series(1,2000000) > group by 1 order by 2 -------------------------------------------------------------------------- - for hire: mac osx device driver ninja, kernel extensions and usb drivers ---------------------+------------+--------------------------------------- http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont ---------------------+------------+---------------------------------------
What is the meaning of group by 1 order by 2 e.g. what to the numbers 1 and 2 stand for? What would change if I do the following? group by 1 order by 1 On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote: > Hi, > > try something like this: > > select > trunc(random() * 10.)/10. > , count(*) > from > generate_series(1,2000000) > group by 1 order by 2 -------------------------------------------------------------------------- - for hire: mac osx device driver ninja, kernel extensions and usb drivers ---------------------+------------+--------------------------------------- http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont ---------------------+------------+---------------------------------------
I think this should do what I want select trunc(distance * 10.)/10., count(*) from doc_ads group by 1 order by 1 Thanks, Joel -------------------------------------------------------------------------- - for hire: mac osx device driver ninja, kernel extensions and usb drivers ---------------------+------------+--------------------------------------- http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont ---------------------+------------+---------------------------------------
re: 1 and 2. They're horrible (imho) reference to the attributes of the returned tuple. Or at best an exposure of the implementation. :) Order by "2" if you want the most frequent (highest counts) of your distances at the bottom of the output (or ordery by 2 desc) if you want them at the top of your output. Joel Reymont wrote: > I think this should do what I want > > select trunc(distance * 10.)/10., count(*) > from doc_ads > group by 1 order by 1 > > Thanks, Joel > > > -------------------------------------------------------------------------- > - for hire: mac osx device driver ninja, kernel extensions and usb drivers > ---------------------+------------+--------------------------------------- > http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont > ---------------------+------------+--------------------------------------- > > > > >
Given that you are actively implementing the code that uses the 1 and 2 I don't see how it is that egregious. When generating calculated fields it is cleaner than the alternative: Select trunc(distance * 10.)/10., count(*) From doc_ads Group by (trunc(distance * 10.)) Order by (trunc(distance * 10.)) It would be nice if you could do: Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency From doc_ads Group by bin Order by bin But I do not believe that is allowed (though I may have my syntax wrong...) David J. >> re: 1 and 2. They're horrible (imho) reference to the attributes of the returned tuple. Or at best an exposure of the implementation. :) >>Joel Reymont wrote: >>> I think this should do what I want >>> >>> select trunc(distance * 10.)/10., count(*) >>> from doc_ads >>> group by 1 order by 1 >>> >>> Thanks, Joel
David Johnston wrote: > Given that you are actively implementing the code that uses the 1 and 2 I > don't see how it is that egregious. When generating calculated fields it is > cleaner than the alternative: > > Select trunc(distance * 10.)/10., count(*) > From doc_ads > Group by (trunc(distance * 10.)) > Order by (trunc(distance * 10.)) > > It would be nice if you could do: > > Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency > From doc_ads > Group by bin > Order by bin > > But I do not believe that is allowed (though I may have my syntax wrong...) > > David J. > > >>> re: 1 and 2. They're horrible (imho) reference to the attributes of the >>> > returned tuple. Or at best an exposure of the implementation. :) > > >>> Joel Reymont wrote: >>> >>>> I think this should do what I want >>>> >>>> select trunc(distance * 10.)/10., count(*) >>>> from doc_ads >>>> group by 1 order by 1 >>>> >>>> Thanks, Joel >>>> > > I think we're supposed to bottom-post here. I agree in the case of generated columns and old servers but you see the practice more commonly than really necessary. But in 8.4 at least select trunc(distance * 10.0 )/10.0 as histo, count(*) as tally from d group by histo order by tally; works just fine for me
hi, group by 1 means group by first output column order by 2 means order by second output column ascending Am 30.04.2011 19:00, schrieb Joel Reymont: > What is the meaning of > > group by 1 order by 2 > > e.g. what to the numbers 1 and 2 stand for? > > What would change if I do the following? > > group by 1 order by 1 > > On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote: > >> Hi, >> >> try something like this: >> >> select >> trunc(random() * 10.)/10. >> , count(*) >> from >> generate_series(1,2000000) >> group by 1 order by 2 > -------------------------------------------------------------------------- > - for hire: mac osx device driver ninja, kernel extensions and usb drivers > ---------------------+------------+--------------------------------------- > http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont > ---------------------+------------+--------------------------------------- > > >