Thread: surprising results with random()
Hi, I have a view in which I want to randomly assign values if certain conditions hold. I was getting surprising results. Here is a (very) simplified version of the view, which seems to indicate the problem: CREATE OR REPLACE VIEW test_view AS SELECT CASE WHEN random() < .3333 THEN '1' WHEN random() < .3333 THEN '2' ELSE '3' END AS test_value FROM client ; It seems this should generate a random number between 0 and 1, and set test_value to '1' if this first generated number is less than .3333. Otherwise, it should generate another random number, and set test_value to '2' if this is less than .3333. And if neither of the random numbers are less than .3333, it should set test_value to '3'. It seems to me that there should be a relative even distribution of the 3 values. However when I run this, the values are always similar to what is below: X_test=> select test_value, count(*) from test_view group by 1 order by 1; test_value | count ------------+------- 1 | 23947 2 | 16061 3 | 32443 Why are there significantly fewer 2s? I understand that random() is not truly random, and that the seed affects this value. But it still confuses me that, no matter how many times I run this, there are always so few 2s. If it is generating an independent random number in the second call to random(), then I don't know why there are more so many more 1s than 2s. Thanks! -jessi -- Jessi Berkelhammer Downtown Emergency Service Center Computer Programming Specialist
On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote: > Hi, > > I have a view in which I want to randomly assign values if certain > conditions hold. I was getting surprising results. Here is a (very) > simplified version of the view, which seems to indicate the problem: > > CREATE OR REPLACE VIEW test_view AS > SELECT > CASE > WHEN random() < .3333 THEN '1' > WHEN random() < .3333 THEN '2' > ELSE '3' > END AS test_value > > FROM client ; > > It seems this should generate a random number between 0 and 1, and set > test_value to '1' if this first generated number is less than .3333. > Otherwise, it should generate another random number, and set > test_value > to '2' if this is less than .3333. And if neither of the random > numbers > are less than .3333, it should set test_value to '3'. It seems to me > that there should be a relative even distribution of the 3 values. > > > However when I run this, the values are always similar to what is > below: > > X_test=> select test_value, count(*) from test_view group by 1 > order by 1; > test_value | count > ------------+------- > 1 | 23947 > 2 | 16061 > 3 | 32443 > > Why are there significantly fewer 2s? I understand that random() is > not > truly random, and that the seed affects this value. But it still > confuses me that, no matter how many times I run this, there are > always > so few 2s. If it is generating an independent random number in the > second call to random(), then I don't know why there are more so many > more 1s than 2s. Nope, it's nothing to do with random(), it's that your maths is wrong. There are 9 possible cases. In 3 of them you return 1. In 2 of them you return 2. In the remaining 4 cases you return 3. If you were to run this 72451 times I'd expect to see 1: 24150 = 72451 * 3/9 2: 16100 = 72451 * 2/9 3: 32200 = 72451 * 4/9 Which, unsurprisingly, is fairly close to what you get. Cheers, Steve
Jessi, should the function not look like this??? CREATE OR REPLACE VIEW test_view AS SELECT CASE WHEN random() < .333333333 THEN '1' WHEN random() < .5 THEN '2' ELSE '3' END AS test_value FROM client; On Feb 23, 2009, at 5:09 PM, Jessi Berkelhammer wrote: > Hi, > > I have a view in which I want to randomly assign values if certain > conditions hold. I was getting surprising results. Here is a (very) > simplified version of the view, which seems to indicate the problem: > > CREATE OR REPLACE VIEW test_view AS > SELECT > CASE > WHEN random() < .3333 THEN '1' > WHEN random() < .3333 THEN '2' > ELSE '3' > END AS test_value > > FROM client ; > > It seems this should generate a random number between 0 and 1, and set > test_value to '1' if this first generated number is less than .3333. > Otherwise, it should generate another random number, and set > test_value > to '2' if this is less than .3333. And if neither of the random > numbers > are less than .3333, it should set test_value to '3'. It seems to me > that there should be a relative even distribution of the 3 values. > > However when I run this, the values are always similar to what is > below: > > X_test=> select test_value, count(*) from test_view group by 1 > order by 1; > test_value | count > ------------+------- > 1 | 23947 > 2 | 16061 > 3 | 32443 > > Why are there significantly fewer 2s? I understand that random() is > not > truly random, and that the seed affects this value. But it still > confuses me that, no matter how many times I run this, there are > always > so few 2s. If it is generating an independent random number in the > second call to random(), then I don't know why there are more so many > more 1s than 2s. > > Thanks! > -jessi > > -- > Jessi Berkelhammer > Downtown Emergency Service Center > Computer Programming Specialist
Or perhaps: CREATE OR REPLACE VIEW test_view AS SELECT (random()*3)::int as test_value; At least in this case, that should give the same result. in this case 1/3 should be 1, 1/3 = 2 & 1/3=3 in your case 1/3 = 1, 1/2 the remainder (1/2 * 2/3 = 1/3) = 2, remaining 1/3 = 3 Although I'm guessing the original intent is to NOT generate an equal distribution, but I'm not sure what distribution isrequired. Cheers, Brent Wood Brent Wood DBA/GIS consultant NIWA, Wellington New Zealand >>> ries van Twisk <pg@rvt.dds.nl> 02/24/09 12:13 PM >>> Jessi, should the function not look like this??? CREATE OR REPLACE VIEW test_view AS SELECT CASE WHEN random() < .333333333 THEN '1' WHEN random() < .5 THEN '2' ELSE '3' END AS test_value FROM client; On Feb 23, 2009, at 5:09 PM, Jessi Berkelhammer wrote: > Hi, > > I have a view in which I want to randomly assign values if certain > conditions hold. I was getting surprising results. Here is a (very) > simplified version of the view, which seems to indicate the problem: > > CREATE OR REPLACE VIEW test_view AS > SELECT > CASE > WHEN random() < .3333 THEN '1' > WHEN random() < .3333 THEN '2' > ELSE '3' > END AS test_value > > FROM client ; > > It seems this should generate a random number between 0 and 1, and set > test_value to '1' if this first generated number is less than .3333. > Otherwise, it should generate another random number, and set > test_value > to '2' if this is less than .3333. And if neither of the random > numbers > are less than .3333, it should set test_value to '3'. It seems to me > that there should be a relative even distribution of the 3 values. > > However when I run this, the values are always similar to what is > below: > > X_test=> select test_value, count(*) from test_view group by 1 > order by 1; > test_value | count > ------------+------- > 1 | 23947 > 2 | 16061 > 3 | 32443 > > Why are there significantly fewer 2s? I understand that random() is > not > truly random, and that the seed affects this value. But it still > confuses me that, no matter how many times I run this, there are > always > so few 2s. If it is generating an independent random number in the > second call to random(), then I don't know why there are more so many > more 1s than 2s. > > Thanks! > -jessi > > -- > Jessi Berkelhammer > Downtown Emergency Service Center > Computer Programming Specialist -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.
Steve Atkins wrote: > > On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote: > > >Hi, > > > >I have a view in which I want to randomly assign values if certain > >conditions hold. I was getting surprising results. Here is a (very) > >simplified version of the view, which seems to indicate the problem: > > > >CREATE OR REPLACE VIEW test_view AS > >SELECT > > CASE > > WHEN random() < .3333 THEN '1' > > WHEN random() < .3333 THEN '2' > > ELSE '3' > > END AS test_value > > > >FROM client ; > > > >It seems this should generate a random number between 0 and 1, and set > >test_value to '1' if this first generated number is less than .3333. > >Otherwise, it should generate another random number, and set > >test_value > >to '2' if this is less than .3333. And if neither of the random > >numbers > >are less than .3333, it should set test_value to '3'. It seems to me > >that there should be a relative even distribution of the 3 values. > > > > > >However when I run this, the values are always similar to what is > >below: > > > >X_test=> select test_value, count(*) from test_view group by 1 > >order by 1; > >test_value | count > >------------+------- > >1 | 23947 > >2 | 16061 > >3 | 32443 > > > >Why are there significantly fewer 2s? I understand that random() is > >not > >truly random, and that the seed affects this value. But it still > >confuses me that, no matter how many times I run this, there are > >always > >so few 2s. If it is generating an independent random number in the > >second call to random(), then I don't know why there are more so many > >more 1s than 2s. > > Nope, it's nothing to do with random(), it's that your maths is wrong. > > There are 9 possible cases. In 3 of them you return 1. In 2 of them you > return 2. In the remaining 4 cases you return 3. > > If you were to run this 72451 times I'd expect to see > 1: 24150 = 72451 * 3/9 > 2: 16100 = 72451 * 2/9 > 3: 32200 = 72451 * 4/9 > > Which, unsurprisingly, is fairly close to what you get. > > Cheers, > Steve this looks like an attempt to understand the monty hall problem. http://en.wikipedia.org/wiki/Monty_Hall_problem except that there's no goat :( cheers, raf
On Mon, Feb 23, 2009 at 5:16 PM, raf <raf@raf.org> wrote: > this looks like an attempt to understand the monty hall problem. > http://en.wikipedia.org/wiki/Monty_Hall_problem > except that there's no goat :( And what database management system can be complete without a goat? :)