Thread: Distribution of results

Distribution of results

From
"Raghuraman K"
Date:
Hi,
 
   I have a table like this: create table(xyz char(10), answer number(4)).  There are a lot of rows in this table. I am looking at a query that will help me represent the distribution of data records based on the column answer. For example, we may take that the highest entry for answer column is 90 and the lowest is 2 and there are 1000 records. I am looking at a query that will tell how the 1000 records are distributed between the highest and lowest answer (in this case between 90 and 2).  Can anyone please help?
 
   Regards,
 
Raghu

The information contained in, or attached to, this e-mail, contains confidential information and is intended solely for the use of the individual or entity to whom they are addressed and is subject to legal privilege. If you have received this e-mail in error you should notify the sender immediately by reply e-mail, delete the message from your system and notify your system manager. Please do not copy it for any purpose, or disclose its contents to any other person. The views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of the company. The recipient should check this e-mail and any attachments for the presence of viruses. The company accepts no liability for any damage caused, directly or indirectly, by any virus transmitted in this email.

www.aztecsoft.com

Re: Distribution of results

From
Jorge Godoy
Date:
"Raghuraman K" <raghuramank@aztecsoft.com> writes:

> Hi,
> 
>    I have a table like this: create table(xyz char(10), answer number(4)).  There are a lot of rows in this table. I
> am looking at a query that will help me represent the distribution of data records based on the column answer. For
> example, we may take that the highest entry for answer column is 90 and the lowest is 2 and there are 1000 records. I am
> looking at a query that will tell how the 1000 records are distributed between the highest and lowest answer (in this
> case between 90 and 2).  Can anyone please help?
>

I believe this isn't hard if you use a statistical function.  You can have one
fairly quickly with PL/R. 

-- 
Jorge Godoy      <jgodoy@gmail.com>


Re: Distribution of results

From
imad
Date:
What else do you want to know about the distribution? Going by the example
you gave, this looks like just a matter of min and max.

--Imad


On 11/1/06, Raghuraman K <raghuramank@aztecsoft.com> wrote:
Hi,
 
   I have a table like this: create table(xyz char(10), answer number(4)).  There are a lot of rows in this table. I am looking at a query that will help me represent the distribution of data records based on the column answer. For example, we may take that the highest entry for answer column is 90 and the lowest is 2 and there are 1000 records. I am looking at a query that will tell how the 1000 records are distributed between the highest and lowest answer (in this case between 90 and 2).  Can anyone please help?

Re: Distribution of results

From
"Aaron Bono"
Date:
On 11/1/06, Raghuraman K <raghuramank@aztecsoft.com> wrote:
Hi,
 
   I have a table like this: create table(xyz char(10), answer number(4)).  There are a lot of rows in this table. I am looking at a query that will help me represent the distribution of data records based on the column answer. For example, we may take that the highest entry for answer column is 90 and the lowest is 2 and there are 1000 records. I am looking at a query that will tell how the 1000 records are distributed between the highest and lowest answer (in this case between 90 and 2).  Can anyone please help?


It helps to know what kind of distribution information you are after.

Mean (assuming the table is named xyz and the column is answer):
select avg(answer) from xyz;

Median:
Check out this URL
http://72.14.203.104/search?q=cache:kvZMBQuoAbkJ:people.planetpostgresql.org/greg/index.php%3F/categories/13-Math+postgresql+median+mean+functions&hl=en&gl=us&ct=clnk&cd=1&client=firefox-a

Range:
select max(answer) - min(answer) from xyz;

Population Variance:
-- square each deviation from the mean first, then sum
select sum(power(answer - mean, 2)) / count(*)
from xyz
cross join (
select avg(answer) as mean from xyz
) as xyz_mean
;

Sample Variance:
-- same sum of squared deviations, divided by n - 1 instead of n
select sum(power(answer - mean, 2)) / (count(*) - 1)
from xyz
cross join (
select avg(answer) as mean from xyz
) as xyz_mean
;

Note that I did not check the syntax for typos.
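
A quick way to sanity-check a variance query is to run it against a tiny table and compare the result with a known-good implementation. The sketch below does that with Python's built-in sqlite3 and statistics modules. The table name xyz, the column name answer, and the five sample values are assumptions made for illustration only. The SQL multiplies the deviation by itself instead of calling power() so that it also runs on SQLite builds without math functions.

```python
import sqlite3
import statistics

# Hypothetical sample data; xyz/answer are assumed names, not a real schema.
rows = [2, 10, 10, 45, 90]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (answer INTEGER)")
conn.executemany("INSERT INTO xyz (answer) VALUES (?)", [(r,) for r in rows])

# Variance is the sum of squared deviations from the mean, divided by
# n (population) or n - 1 (sample). The squaring happens inside SUM().
query = """
SELECT
    SUM((answer - mean) * (answer - mean)) / COUNT(*)       AS var_pop,
    SUM((answer - mean) * (answer - mean)) / (COUNT(*) - 1) AS var_samp
FROM xyz
CROSS JOIN (SELECT AVG(answer) AS mean FROM xyz) AS xyz_mean
"""
var_pop, var_samp = conn.execute(query).fetchone()

# Compare against the standard library's reference implementations.
print(var_pop, statistics.pvariance(rows))
print(var_samp, statistics.variance(rows))
```

If the two numbers on each printed line agree (up to floating-point rounding), the query matches the textbook definition. Current PostgreSQL releases also provide var_pop() and var_samp() aggregates, which may make the hand-written form unnecessary.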

Anything more than this will require you to whip out a statistics book.
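
If what the original question is after is simply how many rows fall into each range of answer values (a histogram), a grouped count may be enough. The following is a minimal sketch, again using sqlite3 only to keep it self-contained. The table name xyz, the bucket width of 10, and the sample values are all assumptions. It relies on answer being an integer column, so that dividing by 10 truncates, and on the values being non-negative.

```python
import sqlite3

# Hypothetical sample data standing in for the real rows.
answers = [2, 5, 9, 10, 14, 37, 38, 41, 90, 90]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (answer INTEGER)")
conn.executemany("INSERT INTO xyz (answer) VALUES (?)", [(a,) for a in answers])

# Integer division maps each answer to the start of its bucket of
# width 10 (0-9, 10-19, ...). Buckets with no rows are simply absent.
query = """
SELECT (answer / 10) * 10 AS bucket_start,
       COUNT(*)           AS row_count
FROM xyz
GROUP BY (answer / 10) * 10
ORDER BY bucket_start
"""
histogram = conn.execute(query).fetchall()

# Print a crude text histogram, one star per row in the bucket.
for bucket_start, row_count in histogram:
    print(f"{bucket_start:3d}-{bucket_start + 9:<3d} {'*' * row_count}")
```

The bucket width is a free choice; a smaller width gives a finer picture at the cost of more output rows. PostgreSQL also offers a width_bucket() function that splits a given range into a fixed number of equal-width buckets, which may be a better fit when the minimum and maximum are known up front.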
 
==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================