Thread: histogram

histogram

From

Joel Reymont

Date:

30 April 2011, 13:37:31

I have a column of 2 million float values from 0 to 1.

I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20,
etc.

What is the best way to do this?

    Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

Re: histogram

From

Thomas Markus

Date:

30 April 2011, 13:49:11

Hi,

try something like this:

select
     trunc(random() * 10.)/10.
     , count(*)
from
     generate_series(1,2000000)
group by 1 order by 2

regards
Thomas

Am 30.04.2011 18:37, schrieb Joel Reymont:
> I have a column of 2 million float values from 0 to 1.
>
> I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20,
etc.
>
> What is the best way to do this?
>
>     Thanks, Joel
>
> --------------------------------------------------------------------------
> - for hire: mac osx device driver ninja, kernel extensions and usb drivers
> ---------------------+------------+---------------------------------------
> http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
> ---------------------+------------+---------------------------------------
>
>
>
>

Re: histogram

From

Joel Reymont

Date:

30 April 2011, 13:53:35

Thank you Thomas!

Is there a way for the code below to determine the number of rows in the table and use it?

    Thanks, Joel

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

> Hi,
>
> try something like this:
>
> select
>    trunc(random() * 10.)/10.
>    , count(*)
> from
>    generate_series(1,2000000)
> group by 1 order by 2

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

Re: histogram

From

Joel Reymont

Date:

30 April 2011, 14:00:47

What is the meaning of

    group by 1 order by 2

e.g. what to the numbers 1 and 2 stand for?

What would change if I do the following?

    group by 1 order by 1

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

> Hi,
>
> try something like this:
>
> select
>    trunc(random() * 10.)/10.
>    , count(*)
> from
>    generate_series(1,2000000)
> group by 1 order by 2

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

Re: histogram

From

Joel Reymont

Date:

30 April 2011, 14:03:57

I think this should do what I want

    select trunc(distance * 10.)/10., count(*)
    from doc_ads
    group by 1 order by 1

  Thanks, Joel


--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

Re: histogram

From

Rob Sargent

Date:

30 April 2011, 15:17:29

re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)

Order by "2" if you want the most frequent (highest counts) of your
distances at the bottom of the output (or ordery by 2 desc) if you want
them at the top of your output.

Joel Reymont wrote:
> I think this should do what I want
>
>     select trunc(distance * 10.)/10., count(*)
>     from doc_ads
>     group by 1 order by 1
>
>   Thanks, Joel
>
>
> --------------------------------------------------------------------------
> - for hire: mac osx device driver ninja, kernel extensions and usb drivers
> ---------------------+------------+---------------------------------------
> http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
> ---------------------+------------+---------------------------------------
>
>
>
>
>

Re: histogram

From

"David Johnston"

Date:

30 April 2011, 18:42:11

Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious.  When generating calculated fields it is
cleaner than the alternative:

Select trunc(distance * 10.)/10., count(*)
From doc_ads
Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))

It would be nice if you could do:

Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
From doc_ads
Group by bin
Order by bin

But I do not believe that is allowed (though I may have my syntax wrong...)

David J.

>> re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)

>>Joel Reymont wrote:
>>> I think this should do what I want
>>>
>>>     select trunc(distance * 10.)/10., count(*)
>>>     from doc_ads
>>>     group by 1 order by 1
>>>
>>>   Thanks, Joel

Re: histogram

From

Rob Sargent

Date:

30 April 2011, 20:36:32


David Johnston wrote:
> Given that you are actively implementing the code that uses the 1 and 2 I
> don't see how it is that egregious.  When generating calculated fields it is
> cleaner than the alternative:
>
> Select trunc(distance * 10.)/10., count(*)
> From doc_ads
> Group by (trunc(distance * 10.))
> Order by (trunc(distance * 10.))
>
> It would be nice if you could do:
>
> Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
> From doc_ads
> Group by bin
> Order by bin
>
> But I do not believe that is allowed (though I may have my syntax wrong...)
>
> David J.
>
>
>>> re: 1 and 2. They're horrible (imho) reference to the attributes of the
>>>
> returned tuple. Or at best an exposure of the implementation. :)
>
>
>>> Joel Reymont wrote:
>>>
>>>> I think this should do what I want
>>>>
>>>>     select trunc(distance * 10.)/10., count(*)
>>>>     from doc_ads
>>>>     group by 1 order by 1
>>>>
>>>>   Thanks, Joel
>>>>
>
>
I think we're supposed to bottom-post here.

I agree in the case of generated columns and old servers but you see the
practice more commonly than really necessary. But in 8.4 at least

select trunc(distance * 10.0 )/10.0 as histo, count(*) as tally
from d group by histo order by tally;


works just fine for me

Re: histogram

From

Thomas Markus

Date:

01 May 2011, 04:41:56

hi,

group by 1 means group by first output column
order by 2 means order by second output column ascending



Am 30.04.2011 19:00, schrieb Joel Reymont:
> What is the meaning of
>
>     group by 1 order by 2
>
> e.g. what to the numbers 1 and 2 stand for?
>
> What would change if I do the following?
>
>     group by 1 order by 1
>
> On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:
>
>> Hi,
>>
>> try something like this:
>>
>> select
>>     trunc(random() * 10.)/10.
>>     , count(*)
>> from
>>     generate_series(1,2000000)
>> group by 1 order by 2
> --------------------------------------------------------------------------
> - for hire: mac osx device driver ninja, kernel extensions and usb drivers
> ---------------------+------------+---------------------------------------
> http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
> ---------------------+------------+---------------------------------------
>
>
>