Thread: random() function produces wrong range

random() function produces wrong range

From
Tom Lane
Date:
The comment in the random() function indicates that its author thought
it'd produce output in the range 0..1, which seems like a pretty
reasonable definition:

    /* result 0.0-1.0 */
    result = ((double) random()) / RAND_MAX;

Unfortunately, at least on my box, it produces no such thing.  random()
actually yields values in the range 0..2^31-1 --- while RAND_MAX is
only 32767, because it applies to the rand() function not random().
So what I actually get is floating-point output in the range 0..65535.

regression=# select random();
      random
------------------
 35771.3981139561
(1 row)

regression=# select random();
      random
------------------
 58647.5821405683
(1 row)

This is, to say the least, a bizarre definition.

I would like to propose changing the code to

    /* result 0.0-1.0 */
    result = ((double) random()) / INT_MAX;

(and making the corresponding change in setseed()).  But I wonder if
anyone out there has applications that depend on the current behavior.

As far as I can find, random() isn't mentioned in the documentation
currently, so there probably aren't many people using it...

            regards, tom lane

Re: [GENERAL] random() function produces wrong range

From
Stephan Szabo
Date:
On Tue, 1 Aug 2000, Tom Lane wrote:

> The comment in the random() function indicates that its author thought
> it'd produce output in the range 0..1, which seems like a pretty
> reasonable definition:
>
>     /* result 0.0-1.0 */
>     result = ((double) random()) / RAND_MAX;
>
> Unfortunately, at least on my box, it produces no such thing.  random()
> actually yields values in the range 0..2^31-1 --- while RAND_MAX is
> only 32767, because it applies to the rand() function not random().

> I would like to propose changing the code to
>
>     /* result 0.0-1.0 */
>     result = ((double) random()) / INT_MAX;
>
> (and making the corresponding change in setseed()).  But I wonder if
> anyone out there has applications that depend on the current behavior.

Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit machines.


Re: [GENERAL] random() function produces wrong range

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> Actually, on my machines, both man pages for rand() and random() say
> they return values between 0 and RAND_MAX (whether that's true or not
> is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> be a problem, but it might be problematic on some of the 64 bit machines.

Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is.  But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().

            regards, tom lane

Re: [GENERAL] random() function produces wrong range

From
Tom Lane
Date:
"Mike Sears" <msears@vianet.ca> writes:
> What build of postgres is this running on.

I think random() is new in 7.0.

            regards, tom lane

Re: [GENERAL] random() function produces wrong range

From
Stephan Szabo
Date:
On Tue, 1 Aug 2000, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > Actually, on my machines, both man pages for rand() and random() say
> > they return values between 0 and RAND_MAX (whether that's true or not
> > is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> > be a problem, but it might be problematic on some of the 64 bit machines.
>
> Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
> to random() on some machines that'd probably explain why the code is
> written like it is.  But on my box (HPUX) the rand() function is old
> and crufty and considerably different from random().

That's from a pair of linux boxes, although checking on a FreeBSD box a
friend has, his boxes man pages show the range as explicitly 0 to 2^31-1
as your box does.



Re: Re: [GENERAL] random() function produces wrong range

From
Dave Smith
Date:
Stephan Szabo wrote:
>
> On Tue, 1 Aug 2000, Tom Lane wrote:
>
> > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > > Actually, on my machines, both man pages for rand() and random() say
> > > they return values between 0 and RAND_MAX (whether that's true or not
> > > is another matter).  In my case RAND_MAX==INT_MAX so the change wouldn't
> > > be a problem, but it might be problematic on some of the 64 bit machines.
> >
> > Oh, that's interesting.  What platform do you use?  If RAND_MAX applies
> > to random() on some machines that'd probably explain why the code is
> > written like it is.  But on my box (HPUX) the rand() function is old
> > and crufty and considerably different from random().
>
> That's from a pair of linux boxes, although checking on a FreeBSD box a
> friend has, his boxes man pages show the range as explicitly 0 to 2^31-1
> as your box does.

On my SCO 5.0.4 box, rand() from the man page...

The rand function uses a multiplicative congruential random-number
generator
  with period 2^32 that returns successive pseudo-random numbers in the
range
  from 0 to (2^15)-1.


The following functions define the semantics of the functions rand and
srand.

     static unsigned long int next = 1;
     int rand()
     {
          next = next * 1103515245 + 12345;
          return ((unsigned int)(next/65536) % 32768);
     }
     void srand(seed)
     unsigned int seed;
     {
          next = seed;
     }


--
Dave Smith
Candata Systems Ltd.
(416) 493-9020
dave@candata.com

Re: random() function produces wrong range

From
Thomas Lockhart
Date:
> The comment in the random() function indicates that its author thought
> it'd produce output in the range 0..1, which seems like a pretty
> reasonable definition:
> Unfortunately, at least on my box, it produces no such thing.  random()
> actually yields values in the range 0..2^31-1 --- while RAND_MAX is
> only 32767, because it applies to the rand() function not random().
> So what I actually get is floating-point output in the range 0..65535.
> This is, to say the least, a bizarre definition.

Or, a bizarre machine. Linux (where I did the testing) produces the
expected result.

> I would like to propose changing the code to
>         /* result 0.0-1.0 */
>         result = ((double) random()) / INT_MAX;

Erk...

Actually, I depend on the behavior being as advertised, which is what
you would have expected. This is true on Linux boxes from RedHat-5.2 to
Mandrake-7.1. Not sure why your machine is different, but if there is a
more portable way to define this let's find it. Otherwise, get used to
#ifdef HPUX ;)

The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one). Would it be better to try using rand() instead?

                     - Thomas

Re: random() function produces wrong range

From
Guo Bin
Date:
Hope this function won't be removed. I'm using it
to select a random row like "select xxx from yyy
order by random() limit 1". It's painful to do it
before 7.0.

Regards,
-- 
Guo Bin

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> As far as I can find, random() isn't mentioned in the
> documentation
> currently, so there probably aren't many people using
> it...
> 
>             regards, tom lane


__________________________________________________
Do You Yahoo!?
Kick off your party with Yahoo! Invites.
http://invites.yahoo.com/


Re: random() function produces wrong range

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> The Linux man pages indicate that the behavior and underlying
> implementation of random() and rand() are the same (so I just picked
> one).

Ah, well, there's your problem.  Whoever did this part of the library
on Linux took shortcuts.  On older-line systems, rand() is a
considerably older and crummier generator than random().  It would
definitely not be a wise decision to use rand() instead.

It appears that on SysV-heritage machines, rand() delivers 15-bit
results (which is what I'm getting) whereas on BSD-heritage platforms
it produces 31-bit results.  But even the BSD machines say

     The spectral properties of rand() leave a great deal  to  be
     desired.   drand48(3)  and  random(3)  provide  much better,
     though more elaborate, random-number generators.

(quote from SunOS 4.1 man page for rand()).

I believe using random() is the right thing.  The portability bug here
is the assumption that RAND_MAX applies to random() (or is even defined;
none of the man pages I've looked at so far mention it).  But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.

            regards, tom lane

Re: random() function produces wrong range

From
Malcolm Beattie
Date:
Tom Lane writes:
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > The Linux man pages indicate that the behavior and underlying
> > implementation of random() and rand() are the same (so I just picked
> > one).
> 
> Ah, well, there's your problem.  Whoever did this part of the library
> on Linux took shortcuts.

Why? Linux is compliant with the Single Unix Standard v2 from my brief
reading of it. The only fragile part is the man page for random(3)
which on Linux says "range from 0 to RAND_MAX" whereas SuSv2 says
"range from 0 to 2^31-1". Since RAND_MAX on Linux is actually 2^31-1
anyway, it is still correct albeit misleading. Documentation bug, say.

> I believe using random() is the right thing.  The portability bug here
> is the assumption that RAND_MAX applies to random() (or is even defined;
> none of the man pages I've looked at so far mention it).  But all the
> machines say that the output of random() is 31 bits, so INT_MAX should
> work.

SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
be non-portable to platforms with 64-bit ints, for example.

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services


Re: random() function produces wrong range

From
Tom Lane
Date:
Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes:
> Tom Lane writes:
>> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>>>> The Linux man pages indicate that the behavior and underlying
>>>> implementation of random() and rand() are the same (so I just picked
>>>> one).
>>
>> Ah, well, there's your problem.  Whoever did this part of the library
>> on Linux took shortcuts.

> Why? Linux is compliant with the Single Unix Standard v2 from my brief
> reading of it.

Sorry, bad choice of words on my part.  Linux is within its rights to
use the same underlying implementation for rand() and random(), but it
is a poor guide to the behavior of other systems, which are equally
within their rights not to.

>> none of the man pages I've looked at so far mention it).  But all the
>> machines say that the output of random() is 31 bits, so INT_MAX should
>> work.

> SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
> be non-portable to platforms with 64-bit ints, for example.

Maybe.  You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1?  The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.

I suspect we have a good chance at getting burned no matter what we use
:-(.  But RAND_MAX is definitely the wrong thing.

            regards, tom lane

Re: random() function produces wrong range

From
Thomas Swan
Date:
>> none of the man pages I've looked at so far mention it).  But all the
>> machines say that the output of random() is 31 bits, so INT_MAX should
>> work.

> SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
> be non-portable to platforms with 64-bit ints, for example.

Maybe.  You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1?  The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.

I suspect we have a good chance at getting burned no matter what we use
:-(.  But RAND_MAX is definitely the wrong thing.

Is it possible to test (during configure phase) and then go from there... or does it need to be the same for all platforms?

-
- Thomas Swan                                   
- Graduate Student  - Computer Science
- The University of Mississippi
-
- "People can be categorized into two fundamental
- groups, those that divide people into two groups
- and those that don't."

Re: random() function produces wrong range

From
Tom Lane
Date:
Thomas Swan <tswan@olemiss.edu> writes:
>> I suspect we have a good chance at getting burned no matter what we use
>> :-(.  But RAND_MAX is definitely the wrong thing.

> Is it possible to test (during configure phase) and then go from there...
> or does it need to be the same for all platforms?

I thought about that last night.  We could do a configure test.  Since
it'd be probing random() results there'd be a small probability of
failure, but if we wire in an assumption that the max value must be
2^(15 + n*16)-1 for some n, ten or so probes would give us a failure
probability on the order of 2^-160, which ought to satisfy anyone.

However, in the absence of any hard evidence that there are platforms
where the value is different from 2^31-1, it's probably just a waste of
configuration cycles at the moment.

I suggest we add a config.h constant like

/* The local random() function yields values 0 .. MAX_RANDOM_VALUE */
#define MAX_RANDOM_VALUE  <2^31-1>

and use that in the code.  Then, if we ever find a platform where
random() does actually produce 64-bit results, it'll be time enough
to crank up a real configure test to set the value.

Comments?

            regards, tom lane

Re: random() function produces wrong range

From
Roland Roberts
Date:
-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Thomas" == Thomas Swan <tswan@olemiss.edu> writes:

    Thomas> Is it possible to test (during configure phase) and then
    Thomas> go from there...  or does it need to be the same for all
    Thomas> platforms?

You should be able to test, but it is, of course, a statistical test.
Call random() several times and test the maximum value against your
thresholds of 2^15 and 2^31.  If random() is generating values in the
range 1:2^31-1, you would expect half of your values to be greater
than 2^15-1; more importantly, if you generate, say, 10 values, you
expect only a 1:1024 chance that they are all below 2^15.  If those
odds aren't good enough, generate more.

roland
- --
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                    Unix Software Solutions
roberts@panix.com                      76-15 113th Street, Apt 3B
rbroberts@acm.org                          Forest Hills, NY 11375

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface

iQCVAwUBOYmNSeoW38lmvDvNAQHFegQAlexEvaG0t+1H0IkPWikbdMUIck1fE0DI
rfcGi1M/SQ6K9Hmvi1HB/SVEU4DKGaHDoqlU7ei78OgOzchbsLL5cqAJNIsKv5QJ
F4u/A0Fg7yuyRZ3/CNCo0+nTwhyDANktMw78AM5ssHCs75UxC+vHWibHHBmXJzrF
8WD2LyjPSNI=
=dR25
-----END PGP SIGNATURE-----

Re: random() function produces wrong range

From
Malcolm Beattie
Date:
Tom Lane writes:
> I thought about that last night.  We could do a configure test.  Since
> it'd be probing random() results there'd be a small probability of
> failure, but if we wire in an assumption that the max value must be
> 2^(15 + n*16)-1 for some n, ten or so probes would give us a failure
> probability on the order of 2^-160, which ought to satisfy anyone.
> 
> However, in the absence of any hard evidence that there are platforms
> where the value is different from 2^31-1, it's probably just a waste of
> configuration cycles at the moment.
> 
> I suggest we add a config.h constant like
> 
> /* The local random() function yields values 0 .. MAX_RANDOM_VALUE */
> #define MAX_RANDOM_VALUE  <2^31-1>
> 
> and use that in the code.  Then, if we ever find a platform where
> random() does actually produce 64-bit results, it'll be time enough
> to crank up a real configure test to set the value.
> 
> Comments?

If any platform *does* produced 64-bit results, it won't be compliant
with SUSv2 which states explicitly that the resulting range is up to
2^31-1. Since most portability problems are with older platforms which
haven't caught up, I'd be hopeful that any new 64-bit-int platforms
would get it right from the outset. Maybe I'm being over-optimistic :-)

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>  I am looking for a Linux (and
Unix Systems Programmer                  maybe Apache/mod_perl) job/contract
Oxford University Computing Services   http://users.ox.ac.uk/~mbeattie/cv.html


Re: [GENERAL] Re: random() function produces wrong range

From
Tom Lane
Date:
Roland Roberts <roberts@panix.com> writes:
> Call random() several times and test the maximum value against your
> thresholds of 2^15 and 2^31.  If random() is generating values in the
> range 1:2^31-1, you would expect half of your values to be greater
> than 2^15-1; more importantly, if you generate, say, 10 values, you
> expect only a 1:1024 chance that they are all below 2^15.

Actually the odds are far better than that.  If the range is 2^31-1
then only about 2^-16th of the outputs should be less than 2^15.
So ten probes gives you a failure probability of about 2^-160 not
2^-10.

Generalizing, you could tell the difference between widths of 31,
47, or 63 bits with the same level of reliability.

            regards, tom lane

Re: [GENERAL] Re: random() function produces wrong range

From
Roland Roberts
Date:
-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

    Tom> Actually the odds are far better than that.  If the range is
    Tom> 2^31-1 then only about 2^-16th of the outputs should be less
    Tom> than 2^15.  So ten probes gives you a failure probability of
    Tom> about 2^-160 not 2^-10.

Oops, 2^16 != 2^32 / 2.

So a dynamic test is not only possible but wouldn't cost to much at
configure time.

roland
- --
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                    Unix Software Solutions
roberts@panix.com                      76-15 113th Street, Apt 3B
rbroberts@acm.org                          Forest Hills, NY 11375

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface

iQCVAwUBOYmtf+oW38lmvDvNAQGWYwP/eXRtrDPu/xN+W9pCd9y34d4jbrPH7jku
nBAuSYtCRyoMgTkjdCtqThzq3vzPLDwfmOZcmWP8W5AmQPJjvcdFwI7y1XgGlaxd
aAIlqqf+TTkZwIUh2vnWTuu5JKkiAZI6UuzNSzy79O/frxKE2y97zCuMw02I0kMK
iGNSybN3L5w=
=36yP
-----END PGP SIGNATURE-----

Re: random() function produces wrong range

From
Christopher Masto
Date:
On Thu, Aug 03, 2000 at 11:45:39AM -0400, Tom Lane wrote:
> Actually the odds are far better than that.  If the range is 2^31-1
> then only about 2^-16th of the outputs should be less than 2^15.
> So ten probes gives you a failure probability of about 2^-160 not
> 2^-10.

It occurs to me that Perl has to provide a portable rand function, so
I looked at how its Configure script works.  It's pretty much what
you've been discussing.  First it checks for a couple of possible
random functions (preferring drand48(), then random(), then bitching
and using rand()).

int main()
{
        register int i;
        register unsigned long tmp;
        register unsigned long max = 0L;

        for (i = 1000; i; i--) {
                tmp = (unsigned long) $randfunc();
                if (tmp > max) max = tmp;
        }
        for (i = 0; max; i++)
                max /= 2;
        printf("%d\n",i);
}

Oh well.
--
Christopher Masto         Senior Network Monkey      NetMonger Communications
chris@netmonger.net        info@netmonger.net        http://www.netmonger.net

Free yourself, free your machine, free the daemon -- http://www.freebsd.org/

Differences between int8 and int4 as pkeys and fkeys

From
Thomas Swan
Date:
<div>Perhaps a little off topic.</div><br /><div>I had been using int8 as primary / foreign keys pairs.   After joining
severallarge tables on the keys, I noticed about a 30/40% speed improvement just in changing the keys from int8 to int4
datatypes.</div><br /> Should it affect it that much? <br /> - <br /> - <b><u>Thomas Swan</u></b>
                                  <br/> - Graduate Student  - Computer Science<br /> - The University of Mississippi<br
/>- <br /> - "People can be categorized into two fundamental <br /> - groups, those that divide people into two groups
<br/> - and those that don't." 

Re: Differences between int8 and int4 as pkeys and fkeys

From
Thomas Lockhart
Date:
> I had been using int8 as primary / foreign keys pairs.   After joining
> several large tables on the keys, I noticed about a 30/40% speed
> improvement just in changing the keys from int8 to int4 data types.

Apparently it should be that much (though this is the first report
either way). Here are some reasons why there is the difference:

1) int4 is "pass by value", int8 is "pass by reference". So int8 hits
palloc() every time you generate a new one, whereas int4 may just copy
the data itself.

2) int8 is implemented in software libraries, at least on 32-bit
machines. int4 is implemented in hardware. Libraries are slower than
~1cycle hardware operations.
                         - Thomas