Re: New function normal_rand_array function to contrib/tablefunc. - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: New function normal_rand_array function to contrib/tablefunc. |
Date | |
Msg-id | CAEZATCX_Rd24mS4ScEYV=cLKMoQEJX=+oq40kYbok8rw211W8g@mail.gmail.com Whole thread Raw |
In response to | Re: New function normal_rand_array function to contrib/tablefunc. (Jim Jones <jim.jones@uni-muenster.de>) |
List | pgsql-hackers |
On Tue, 2 Jul 2024 at 11:18, Jim Jones <jim.jones@uni-muenster.de> wrote: > > When either minval or maxval exceeds int4 the function cannot be > executed/found > > SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); > > ERROR: function normal_rand_array(integer, integer, integer, bigint) > does not exist > LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); > ^ > HINT: No function matches the given name and argument types. You might > need to add explicit type casts. > This could be solved by defining separate functions for each supported type, rather than one function with type anyelement. Internally, they could continue to share common code, of course. > In some cases the function returns an empty array. Is it also expected? > Perhaps it would be useful to have separate minimum and maximum array length arguments, rather than a mean array length argument. Actually, I find the current behaviour somewhat counterintuitive. Only after reading the source code did I realise what it's actually doing, which is this: Row 1: array of random length in range [0, meanarraylen] Row 2: array of length 2*meanarraylen - length of previous array Row 3: array of random length in range [0, meanarraylen] Row 4: array of length 2*meanarraylen - length of previous array ... That's far from obvious (it's certainly not documented) and I don't think it's a particularly good way of achieving a specified mean array length, because only half the lengths are random. One thing that's definitely needed is more documentation. It should have its own new subsection, like the other tablefunc functions. Something else that confused me is why this function is called normal_rand_array(). The underlying random functions that it's calling return random values from a uniform distribution, not a normal distribution. Arrays of normally distributed random values might also be useful, but that's not what this patch is doing. Also, the function accepts float8 minval and maxval arguments, and then simply ignores them and returns random float8 values in the range [0,1), which is highly counterintuitive. My suggestion would be to mirror the signatures of the core random() functions more closely, and have this: 1). rand_array(numvals int, minlen int, maxlen int) returns setof float8[] 2). rand_array(numvals int, minlen int, maxlen int, minval int, maxval int) returns setof int[] 3). rand_array(numvals int, minlen int, maxlen int, minval bigint, maxval bigint) returns setof bigint[] 4). rand_array(numvals int, minlen int, maxlen int, minval numeric, maxval numeric) returns setof numeric[] Also, I'd recommend giving the function arguments names in SQL, like this, since that makes them easier to use. Something else that's not obvious is that this patch is relying on the core random functions, which means that it's using the same PRNG state as those functions. That's probably OK, but it should be documented, because it's different from tablefunc's normal_rand() function, which uses pg_global_prng_state. In particular, what this means is that calling setseed() will affect the output of these new functions, but not normal_rand(). Incidentally, that makes writing tests much simpler -- just call setseed() first and the output will be repeatable. Regards, Dean
pgsql-hackers by date: