Re: [PATCH] random_normal function - Mailing list pgsql-hackers

From Paul Ramsey
Subject Re: [PATCH] random_normal function
Date
Msg-id D1722BB1-78A1-48ED-BE61-EF49648BBD6B@cleverelephant.ca
Whole thread Raw
In response to Re: [PATCH] random_normal function  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: [PATCH] random_normal function  (Joe Conway <mail@joeconway.com>)
List pgsql-hackers

> On Dec 9, 2022, at 10:39 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>> On Dec 8, 2022, at 1:53 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>>
>> Just a utility function to generate random numbers from a normal
>> distribution. I find myself doing this several times a year, and I am
>> sure I must not be the only one.
>
> Thanks for the patch.  What do you think about these results?

Angels on pins time! :)

> +-- The semantics of a negative stddev are not well defined
> +SELECT random_normal(mean := 0, stddev := -1);
> +    random_normal
> +---------------------
> + -1.0285744583010896
> +(1 row)

Question is does a negative stddev make enough sense? It is functionally using fabs(stddev),

SELECT avg(random_normal(mean := 0, stddev := -1)) from generate_series(1,1000);
         avg
---------------------
 0.03156106778729526

So could toss an invalid parameter on negative? Not sure if that's more helpful than just being mellow about this
input.


> +
> +SELECT random_normal(mean := 0, stddev := '-Inf');
> + random_normal
> +---------------
> +      Infinity
> +(1 row)

The existing logic around means and stddevs and Inf is hard to tease out:

SELECT avg(v),stddev(v) from (VALUES ('Inf'::float8, '-Inf'::float8)) a(v);
   avg    | stddev
----------+--------
 Infinity |

The return of NULL of stddev would seem to argue that null in this case means "does not compute" at some level. So
returnNULL on Inf stddev? 

> +
> +-- This result may be defensible...
> +SELECT random_normal(mean := '-Inf', stddev := 'Inf');
> + random_normal
> +---------------
> +     -Infinity
> +(1 row)
> +
> +-- but if so, why is this NaN?
> +SELECT random_normal(mean := 'Inf', stddev := 'Inf');
> + random_normal
> +---------------
> +           NaN
> +(1 row)

An Inf mean only implies that one value in the distribution is Inf, but running the function in reverse (generating
values)and only generating one value from the distribution implies we have to always return Inf (except in this case
stddevis also Inf, so I'd go with NULL, assuming we accept the NULL premise above. 

How do you read the tea leaves?

P.





pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: add \dpS to psql
Next
From: Joe Conway
Date:
Subject: Re: [PATCH] random_normal function