Thread: random() function produces wrong range

random() function produces wrong range

From
Tom Lane
Date:
The comment in the random() function indicates that its author thought
it'd produce output in the range 0..1, which seems like a pretty
reasonable definition:

    /* result 0.0-1.0 */
    result = ((double) random()) / RAND_MAX;

Unfortunately, at least on my box, it produces no such thing.  random()
actually yields values in the range 0..2^31-1 --- while RAND_MAX is
only 32767, because it applies to the rand() function not random().
So what I actually get is floating-point output in the range 0..65535.

regression=# select random();
      random
------------------
 35771.3981139561
(1 row)

regression=# select random();
      random
------------------
 58647.5821405683
(1 row)

This is, to say the least, a bizarre definition.

I would like to propose changing the code to

    /* result 0.0-1.0 */
    result = ((double) random()) / INT_MAX;

(and making the corresponding change in setseed()).  But I wonder if
anyone out there has applications that depend on the current behavior.

As far as I can find, random() isn't mentioned in the documentation
currently, so there probably aren't many people using it...

            regards, tom lane

Re: random() function produces wrong range

From
Stephan Szabo
Date:
On Tue, 1 Aug 2000, Tom Lane wrote:

> The comment in the random() function indicates that its author thought
> it'd produce output in the range 0..1, which seems like a pretty
> reasonable definition:
>
>     /* result 0.0-1.0 */
>     result = ((double) random()) / RAND_MAX;
>
> Unfortunately, at least on my box, it produces no such thing.  random()
> actually yields values in the range 0..2^31-1 --- while RAND_MAX is
> only 32767, because it applies to the rand() function not random().

> I would like to propose changing the code to
>
>     /* result 0.0-1.0 */
>     result = ((double) random()) / INT_MAX;
>
> (and making the corresponding change in setseed()).  But I wonder if
> anyone out there has applications that depend on the current behavior.

Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit machines.


Re: random() function produces wrong range

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> Actually, on my machines, both man pages for rand() and random() say
> they return values between 0 and RAND_MAX (whether that's true or not
> is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> be a problem, but it might be problematic on some of the 64 bit machines.

Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is.  But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().

            regards, tom lane

Re: random() function produces wrong range

From
"Mike Sears"
Date:
What build of postgres is this running on. I've tried this on 6x and it
doesn't seem to work.


----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Stephan Szabo" <sszabo@megazone23.bigpanda.com>
Cc: <pgsql-hackers@postgreSQL.org>; <pgsql-general@postgreSQL.org>
Sent: Tuesday, August 01, 2000 11:23 AM
Subject: Re: [GENERAL] random() function produces wrong range


> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > Actually, on my machines, both man pages for rand() and random() say
> > they return values between 0 and RAND_MAX (whether that's true or not
> > is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> > be a problem, but it might be problematic on some of the 64 bit
machines.
>
> Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
> to random() on some machines that'd probably explain why the code is
> written like it is.  But on my box (HPUX) the rand() function is old
> and crufty and considerably different from random().
>
> regards, tom lane
>


Re: random() function produces wrong range

From
Tom Lane
Date:
"Mike Sears" <msears@vianet.ca> writes:
> What build of postgres is this running on.

I think random() is new in 7.0.

            regards, tom lane

Re: random() function produces wrong range

From
Stephan Szabo
Date:
On Tue, 1 Aug 2000, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > Actually, on my machines, both man pages for rand() and random() say
> > they return values between 0 and RAND_MAX (whether that's true or not
> > is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> > be a problem, but it might be problematic on some of the 64 bit machines.
>
> Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
> to random() on some machines that'd probably explain why the code is
> written like it is.  But on my box (HPUX) the rand() function is old
> and crufty and considerably different from random().

That's from a pair of linux boxes, although checking on a FreeBSD box a
friend has, his boxes man pages show the range as explicitly 0 to 2^31-1
as your box does.



Re: [HACKERS] Re: random() function produces wrong range

From
Dave Smith
Date:
Stephan Szabo wrote:
>
> On Tue, 1 Aug 2000, Tom Lane wrote:
>
> > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > > Actually, on my machines, both man pages for rand() and random() say
> > > they return values between 0 and RAND_MAX (whether that's true or not
> > > is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> > > be a problem, but it might be problematic on some of the 64 bit machines.
> >
> > Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
> > to random() on some machines that'd probably explain why the code is
> > written like it is.  But on my box (HPUX) the rand() function is old
> > and crufty and considerably different from random().
>
> That's from a pair of linux boxes, although checking on a FreeBSD box a
> friend has, his boxes man pages show the range as explicitly 0 to 2^31-1
> as your box does.

On my SCO 5.0.4 box, rand() from the man page...

The rand function uses a multiplicative congruential random-number
generator
  with period 2^32 that returns successive pseudo-random numbers in the
range
  from 0 to (2^15)-1.


The following functions define the semantics of the functions rand and
srand.

     static unsigned long int next = 1;
     int rand()
     {
          next = next * 1103515245 + 12345;
          return ((unsigned int)(next/65536) % 32768);
     }
     void srand(seed)
     unsigned int seed;
     {
          next = seed;
     }


--
Dave Smith
Candata Systems Ltd.
(416) 493-9020
dave@candata.com

Re: [HACKERS] random() function produces wrong range

From
Thomas Lockhart
Date:
> The comment in the random() function indicates that its author thought
> it'd produce output in the range 0..1, which seems like a pretty
> reasonable definition:
> Unfortunately, at least on my box, it produces no such thing.  random()
> actually yields values in the range 0..2^31-1 --- while RAND_MAX is
> only 32767, because it applies to the rand() function not random().
> So what I actually get is floating-point output in the range 0..65535.
> This is, to say the least, a bizarre definition.

Or, a bizarre machine. Linux (where I did the testing) produces the
expected result.

> I would like to propose changing the code to
>         /* result 0.0-1.0 */
>         result = ((double) random()) / INT_MAX;

Erk...

Actually, I depend on the behavior being as advertised, which is what
you would have expected. This is true on Linux boxes from RedHat-5.2 to
Mandrake-7.1. Not sure why your machine is different, but if there is a
more portable way to define this let's find it. Otherwise, get used to
#ifdef HPUX ;)

The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one). Would it be better to try using rand() instead?

                     - Thomas

Re: [HACKERS] random() function produces wrong range

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> The Linux man pages indicate that the behavior and underlying
> implementation of random() and rand() are the same (so I just picked
> one).

Ah, well, there's your problem.  Whoever did this part of the library
on Linux took shortcuts.  On older-line systems, rand() is a
considerably older and crummier generator than random().  It would
definitely not be a wise decision to use rand() instead.

It appears that on SysV-heritage machines, rand() delivers 15-bit
results (which is what I'm getting) whereas on BSD-heritage platforms
it produces 31-bit results.  But even the BSD machines say

     The spectral properties of rand() leave a great deal  to  be
     desired.   drand48(3)  and  random(3)  provide  much better,
     though more elaborate, random-number generators.

(quote from SunOS 4.1 man page for rand()).

I believe using random() is the right thing.  The portability bug here
is the assumption that RAND_MAX applies to random() (or is even defined;
none of the man pages I've looked at so far mention it).  But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.

            regards, tom lane

Re: [HACKERS] random() function produces wrong range

From
Tom Lane
Date:
Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes:
> Tom Lane writes:
>> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>>>> The Linux man pages indicate that the behavior and underlying
>>>> implementation of random() and rand() are the same (so I just picked
>>>> one).
>>
>> Ah, well, there's your problem.  Whoever did this part of the library
>> on Linux took shortcuts.

> Why? Linux is compliant with the Single Unix Standard v2 from my brief
> reading of it.

Sorry, bad choice of words on my part.  Linux is within its rights to
use the same underlying implementation for rand() and random(), but it
is a poor guide to the behavior of other systems, which are equally
within their rights not to.

>> none of the man pages I've looked at so far mention it).  But all the
>> machines say that the output of random() is 31 bits, so INT_MAX should
>> work.

> SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
> be non-portable to platforms with 64-bit ints, for example.

Maybe.  You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1?  The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.

I suspect we have a good chance at getting burned no matter what we use
:-(.  But RAND_MAX is definitely the wrong thing.

            regards, tom lane

Re: [HACKERS] random() function produces wrong range

From
Thomas Swan
Date:
>> none of the man pages I've looked at so far mention it).  But all the
>> machines say that the output of random() is 31 bits, so INT_MAX should
>> work.

> SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
> be non-portable to platforms with 64-bit ints, for example.

Maybe.  You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1?  The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.

I suspect we have a good chance at getting burned no matter what we use
:-(.  But RAND_MAX is definitely the wrong thing.

Is it possible to test (during configure phase) and then go from there... or does it need to be the same for all platforms?

-
- Thomas Swan                                   
- Graduate Student  - Computer Science
- The University of Mississippi
-
- "People can be categorized into two fundamental
- groups, those that divide people into two groups
- and those that don't."

Re: [HACKERS] random() function produces wrong range

From
Tom Lane
Date:
Thomas Swan <tswan@olemiss.edu> writes:
>> I suspect we have a good chance at getting burned no matter what we use
>> :-(.  But RAND_MAX is definitely the wrong thing.

> Is it possible to test (during configure phase) and then go from there...
> or does it need to be the same for all platforms?

I thought about that last night.  We could do a configure test.  Since
it'd be probing random() results there'd be a small probability of
failure, but if we wire in an assumption that the max value must be
2^(15 + n*16)-1 for some n, ten or so probes would give us a failure
probability on the order of 2^-160, which ought to satisfy anyone.

However, in the absence of any hard evidence that there are platforms
where the value is different from 2^31-1, it's probably just a waste of
configuration cycles at the moment.

I suggest we add a config.h constant like

/* The local random() function yields values 0 .. MAX_RANDOM_VALUE */
#define MAX_RANDOM_VALUE  <2^31-1>

and use that in the code.  Then, if we ever find a platform where
random() does actually produce 64-bit results, it'll be time enough
to crank up a real configure test to set the value.

Comments?

            regards, tom lane

Re: [HACKERS] random() function produces wrong range

From
Roland Roberts
Date:
-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Thomas" == Thomas Swan <tswan@olemiss.edu> writes:

    Thomas> Is it possible to test (during configure phase) and then
    Thomas> go from there...  or does it need to be the same for all
    Thomas> platforms?

You should be able to test, but it is, of course, a statistical test.
Call random() several times and test the maximum value against your
thresholds of 2^15 and 2^31.  If random() is generating values in the
range 1:2^31-1, you would expect half of your values to be greater
than 2^15-1; more importantly, if you generate, say, 10 values, you
expect only a 1:1024 chance that they are all below 2^15.  If those
odds aren't good enough, generate more.

roland
- --
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                    Unix Software Solutions
roberts@panix.com                      76-15 113th Street, Apt 3B
rbroberts@acm.org                          Forest Hills, NY 11375

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface

iQCVAwUBOYmNSeoW38lmvDvNAQHFegQAlexEvaG0t+1H0IkPWikbdMUIck1fE0DI
rfcGi1M/SQ6K9Hmvi1HB/SVEU4DKGaHDoqlU7ei78OgOzchbsLL5cqAJNIsKv5QJ
F4u/A0Fg7yuyRZ3/CNCo0+nTwhyDANktMw78AM5ssHCs75UxC+vHWibHHBmXJzrF
8WD2LyjPSNI=
=dR25
-----END PGP SIGNATURE-----

Re: Re: [HACKERS] random() function produces wrong range

From
Tom Lane
Date:
Roland Roberts <roberts@panix.com> writes:
> Call random() several times and test the maximum value against your
> thresholds of 2^15 and 2^31.  If random() is generating values in the
> range 1:2^31-1, you would expect half of your values to be greater
> than 2^15-1; more importantly, if you generate, say, 10 values, you
> expect only a 1:1024 chance that they are all below 2^15.

Actually the odds are far better than that.  If the range is 2^31-1
then only about 2^-16th of the outputs should be less than 2^15.
So ten probes gives you a failure probability of about 2^-160 not
2^-10.

Generalizing, you could tell the difference between widths of 31,
47, or 63 bits with the same level of reliability.

            regards, tom lane

Re: Re: [HACKERS] random() function produces wrong range

From
Roland Roberts
Date:
-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

    Tom> Actually the odds are far better than that.  If the range is
    Tom> 2^31-1 then only about 2^-16th of the outputs should be less
    Tom> than 2^15.  So ten probes gives you a failure probability of
    Tom> about 2^-160 not 2^-10.

Oops, 2^16 != 2^32 / 2.

So a dynamic test is not only possible but wouldn't cost to much at
configure time.

roland
- --
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                    Unix Software Solutions
roberts@panix.com                      76-15 113th Street, Apt 3B
rbroberts@acm.org                          Forest Hills, NY 11375

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface

iQCVAwUBOYmtf+oW38lmvDvNAQGWYwP/eXRtrDPu/xN+W9pCd9y34d4jbrPH7jku
nBAuSYtCRyoMgTkjdCtqThzq3vzPLDwfmOZcmWP8W5AmQPJjvcdFwI7y1XgGlaxd
aAIlqqf+TTkZwIUh2vnWTuu5JKkiAZI6UuzNSzy79O/frxKE2y97zCuMw02I0kMK
iGNSybN3L5w=
=36yP
-----END PGP SIGNATURE-----

Re: [HACKERS] random() function produces wrong range

From
Chris Jones
Date:
Tom Lane <tgl@sss.phg.pa.us> writes:

> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > The Linux man pages indicate that the behavior and underlying
> > implementation of random() and rand() are the same (so I just picked
> > one).
>
> Ah, well, there's your problem.  Whoever did this part of the library
> on Linux took shortcuts.  On older-line systems, rand() is a
> considerably older and crummier generator than random().  It would
> definitely not be a wise decision to use rand() instead.
>
> I believe using random() is the right thing.  The portability bug here
> is the assumption that RAND_MAX applies to random() (or is even defined;
> none of the man pages I've looked at so far mention it).  But all the
> machines say that the output of random() is 31 bits, so INT_MAX should
> work.

On an i386 machine, certainly; but not on an Alpha or a Sparc.
Probably safer to use (2**31)-1, which is what my (NetBSD) man page
says.

FWIW, I believe random(3) running on NetBSD/alpha, for example, will
return a 31-bit result.

Chris

--
---------------------------------------------------- cjones@rightnowtech.com
Chris Jones
           System Administrator, RightNow Technologies
"Is this going to be a stand-up programming session, sir, or another bug hunt?"

Re: random() function produces wrong range

From
Christopher Masto
Date:
On Thu, Aug 03, 2000 at 11:45:39AM -0400, Tom Lane wrote:
> Actually the odds are far better than that.  If the range is 2^31-1
> then only about 2^-16th of the outputs should be less than 2^15.
> So ten probes gives you a failure probability of about 2^-160 not
> 2^-10.

It occurs to me that Perl has to provide a portable rand function, so
I looked at how its Configure script works.  It's pretty much what
you've been discussing.  First it checks for a couple of possible
random functions (preferring drand48(), then random(), then bitching
and using rand()).

int main()
{
        register int i;
        register unsigned long tmp;
        register unsigned long max = 0L;

        for (i = 1000; i; i--) {
                tmp = (unsigned long) $randfunc();
                if (tmp > max) max = tmp;
        }
        for (i = 0; max; i++)
                max /= 2;
        printf("%d\n",i);
}

Oh well.
--
Christopher Masto         Senior Network Monkey      NetMonger Communications
chris@netmonger.net        info@netmonger.net        http://www.netmonger.net

Free yourself, free your machine, free the daemon -- http://www.freebsd.org/