Thread: surprising results with random()

surprising results with random()

From
Jessi Berkelhammer
Date:
Hi,

I have a view in which I want to randomly assign values if certain
conditions hold. I was getting surprising results. Here is a (very)
simplified version of the view, which seems to indicate the problem:

CREATE OR REPLACE VIEW test_view AS
SELECT
    CASE
        WHEN random() < .3333 THEN '1'
        WHEN random() < .3333 THEN '2'
        ELSE '3'
    END AS test_value

FROM client ;

It seems this should generate a random number between 0 and 1, and set
test_value to '1' if this first generated number is less than .3333.
Otherwise, it should generate another random number, and set test_value
to '2' if this is less than .3333. And if neither of the random numbers
are less than .3333, it should set test_value to '3'. It seems to me
that there should be a relative even distribution of the 3 values.

However when I run this, the values are always similar to what is below:

X_test=>  select test_value, count(*) from test_view group by 1 order by 1;
  test_value | count
------------+-------
  1          | 23947
  2          | 16061
  3          | 32443

Why are there significantly fewer 2s? I understand that random() is not
truly random, and that the seed affects this value. But it still
confuses me that, no matter how many times I run this, there are always
so few 2s. If it is generating an independent random number in the
second call to random(), then I don't know why there are more so many
more 1s than 2s.

Thanks!
-jessi

--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist



Re: surprising results with random()

From
Steve Atkins
Date:
On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote:

> Hi,
>
> I have a view in which I want to randomly assign values if certain
> conditions hold. I was getting surprising results. Here is a (very)
> simplified version of the view, which seems to indicate the problem:
>
> CREATE OR REPLACE VIEW test_view AS
> SELECT
>     CASE
>         WHEN random() < .3333 THEN '1'
>         WHEN random() < .3333 THEN '2'
>         ELSE '3'
>     END AS test_value
>
> FROM client ;
>
> It seems this should generate a random number between 0 and 1, and set
> test_value to '1' if this first generated number is less than .3333.
> Otherwise, it should generate another random number, and set
> test_value
> to '2' if this is less than .3333. And if neither of the random
> numbers
> are less than .3333, it should set test_value to '3'. It seems to me
> that there should be a relative even distribution of the 3 values.
>
>
> However when I run this, the values are always similar to what is
> below:
>
> X_test=>  select test_value, count(*) from test_view group by 1
> order by 1;
> test_value | count
> ------------+-------
> 1          | 23947
> 2          | 16061
> 3          | 32443
>
> Why are there significantly fewer 2s? I understand that random() is
> not
> truly random, and that the seed affects this value. But it still
> confuses me that, no matter how many times I run this, there are
> always
> so few 2s. If it is generating an independent random number in the
> second call to random(), then I don't know why there are more so many
> more 1s than 2s.

Nope, it's nothing to do with random(), it's that your maths is wrong.

There are 9 possible cases. In 3 of them you return 1. In 2 of them you
return 2. In the remaining 4 cases you return 3.

If you were to run this 72451 times I'd expect to see
1:  24150 = 72451 * 3/9
2: 16100 = 72451 * 2/9
3: 32200 = 72451 * 4/9

Which, unsurprisingly, is fairly close to what you get.

Cheers,
   Steve



Re: surprising results with random()

From
ries van Twisk
Date:
Jessi,

should the function not look like this???

CREATE OR REPLACE VIEW test_view AS
SELECT
    CASE
        WHEN random() < .333333333 THEN '1'
        WHEN random() < .5 THEN '2'
        ELSE '3'
    END AS test_value

FROM client;

On Feb 23, 2009, at 5:09 PM, Jessi Berkelhammer wrote:

> Hi,
>
> I have a view in which I want to randomly assign values if certain
> conditions hold. I was getting surprising results. Here is a (very)
> simplified version of the view, which seems to indicate the problem:
>
> CREATE OR REPLACE VIEW test_view AS
> SELECT
>     CASE
>         WHEN random() < .3333 THEN '1'
>         WHEN random() < .3333 THEN '2'
>         ELSE '3'
>     END AS test_value
>
> FROM client ;
>
> It seems this should generate a random number between 0 and 1, and set
> test_value to '1' if this first generated number is less than .3333.
> Otherwise, it should generate another random number, and set
> test_value
> to '2' if this is less than .3333. And if neither of the random
> numbers
> are less than .3333, it should set test_value to '3'. It seems to me
> that there should be a relative even distribution of the 3 values.
>
> However when I run this, the values are always similar to what is
> below:
>
> X_test=>  select test_value, count(*) from test_view group by 1
> order by 1;
> test_value | count
> ------------+-------
> 1          | 23947
> 2          | 16061
> 3          | 32443
>
> Why are there significantly fewer 2s? I understand that random() is
> not
> truly random, and that the seed affects this value. But it still
> confuses me that, no matter how many times I run this, there are
> always
> so few 2s. If it is generating an independent random number in the
> second call to random(), then I don't know why there are more so many
> more 1s than 2s.
>
> Thanks!
> -jessi
>
> --
> Jessi Berkelhammer
> Downtown Emergency Service Center
> Computer Programming Specialist







Re: surprising results with random()

From
"Brent Wood"
Date:
Or perhaps:

CREATE OR REPLACE VIEW test_view AS
SELECT (random()*3)::int as test_value;

At least in this case, that should give the same result.

in this case 1/3 should be 1,  1/3 = 2 & 1/3=3

in your case 1/3 = 1, 1/2 the remainder (1/2 * 2/3 = 1/3) = 2, remaining 1/3 = 3

Although I'm guessing the original intent is to NOT generate an equal distribution, but I'm not sure what distribution
isrequired. 


Cheers,

   Brent Wood


Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
>>> ries van Twisk <pg@rvt.dds.nl> 02/24/09 12:13 PM >>>

Jessi,

should the function not look like this???

CREATE OR REPLACE VIEW test_view AS
SELECT
    CASE
        WHEN random() < .333333333 THEN '1'
        WHEN random() < .5 THEN '2'
        ELSE '3'
    END AS test_value

FROM client;

On Feb 23, 2009, at 5:09 PM, Jessi Berkelhammer wrote:

> Hi,
>
> I have a view in which I want to randomly assign values if certain
> conditions hold. I was getting surprising results. Here is a (very)
> simplified version of the view, which seems to indicate the problem:
>
> CREATE OR REPLACE VIEW test_view AS
> SELECT
>     CASE
>         WHEN random() < .3333 THEN '1'
>         WHEN random() < .3333 THEN '2'
>         ELSE '3'
>     END AS test_value
>
> FROM client ;
>
> It seems this should generate a random number between 0 and 1, and set
> test_value to '1' if this first generated number is less than .3333.
> Otherwise, it should generate another random number, and set
> test_value
> to '2' if this is less than .3333. And if neither of the random
> numbers
> are less than .3333, it should set test_value to '3'. It seems to me
> that there should be a relative even distribution of the 3 values.
>
> However when I run this, the values are always similar to what is
> below:
>
> X_test=>  select test_value, count(*) from test_view group by 1
> order by 1;
> test_value | count
> ------------+-------
> 1          | 23947
> 2          | 16061
> 3          | 32443
>
> Why are there significantly fewer 2s? I understand that random() is
> not
> truly random, and that the seed affects this value. But it still
> confuses me that, no matter how many times I run this, there are
> always
> so few 2s. If it is generating an independent random number in the
> second call to random(), then I don't know why there are more so many
> more 1s than 2s.
>
> Thanks!
> -jessi
>
> --
> Jessi Berkelhammer
> Downtown Emergency Service Center
> Computer Programming Specialist







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

Re: surprising results with random()

From
raf
Date:
Steve Atkins wrote:

>
> On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote:
>
> >Hi,
> >
> >I have a view in which I want to randomly assign values if certain
> >conditions hold. I was getting surprising results. Here is a (very)
> >simplified version of the view, which seems to indicate the problem:
> >
> >CREATE OR REPLACE VIEW test_view AS
> >SELECT
> >    CASE
> >        WHEN random() < .3333 THEN '1'
> >        WHEN random() < .3333 THEN '2'
> >        ELSE '3'
> >    END AS test_value
> >
> >FROM client ;
> >
> >It seems this should generate a random number between 0 and 1, and set
> >test_value to '1' if this first generated number is less than .3333.
> >Otherwise, it should generate another random number, and set
> >test_value
> >to '2' if this is less than .3333. And if neither of the random
> >numbers
> >are less than .3333, it should set test_value to '3'. It seems to me
> >that there should be a relative even distribution of the 3 values.
> >
> >
> >However when I run this, the values are always similar to what is
> >below:
> >
> >X_test=>  select test_value, count(*) from test_view group by 1
> >order by 1;
> >test_value | count
> >------------+-------
> >1          | 23947
> >2          | 16061
> >3          | 32443
> >
> >Why are there significantly fewer 2s? I understand that random() is
> >not
> >truly random, and that the seed affects this value. But it still
> >confuses me that, no matter how many times I run this, there are
> >always
> >so few 2s. If it is generating an independent random number in the
> >second call to random(), then I don't know why there are more so many
> >more 1s than 2s.
>
> Nope, it's nothing to do with random(), it's that your maths is wrong.
>
> There are 9 possible cases. In 3 of them you return 1. In 2 of them you
> return 2. In the remaining 4 cases you return 3.
>
> If you were to run this 72451 times I'd expect to see
> 1:  24150 = 72451 * 3/9
> 2: 16100 = 72451 * 2/9
> 3: 32200 = 72451 * 4/9
>
> Which, unsurprisingly, is fairly close to what you get.
>
> Cheers,
>   Steve

this looks like an attempt to understand the monty hall problem.
http://en.wikipedia.org/wiki/Monty_Hall_problem
except that there's no goat :(

cheers,
raf


Re: surprising results with random()

From
Scott Marlowe
Date:
On Mon, Feb 23, 2009 at 5:16 PM, raf <raf@raf.org> wrote:

> this looks like an attempt to understand the monty hall problem.
> http://en.wikipedia.org/wiki/Monty_Hall_problem
> except that there's no goat :(

And what database management system can be complete without a goat? :)