Am 24.07.22 um 10:15 schrieb Fabien COELHO:
>
> My 0,02€ about this patch:
Thank you for your feedback. I attached a patch, that addresses most of
your points.
> Use (void) when declaring no parameters in headers or in functions.
Done.
> Should the exchange be skipped when i == k?
The additional branch is actually slower (on my machine, tested with an
10M element int array) for 1D-arrays, which i assume is the most common
case. Even with a 2D-array with a subarray size of 1M there is no
difference in execution time. i == k should be relatively rare for large
arrays, given a good prng.
> I do not see the point of having *only* inline functions in a c file.
> The external functions should not be inlined?
Done.
> The random and array shuffling functions share a common state. I'm
> wondering whether it should ot should not be so. It seems ok, but then
> ISTM that the documentation suggests implicitely that setseed applies to
> random() only, which is not the case anymore after the patch.
I really like the idea of a prng state owned by the user, that is used
by all user facing random functions, but see that their might be
concerns about this change. I would update the setseed() documentation,
if this proposal is accepted. Do you think my patch should already
include the documentation update?
> If more samples are required than the number of elements, it does not
> error out. I'm wondering whether it should.
The second argument to array_sample() is treated like a LIMIT, rather
than the actual number of elements. I updated the documentation. My
use-case is: take max random items from an array of unknown size.
> Also, the sampling should not return its result in order when the number
> of elements required is the full array, ISTM that it should be shuffled
> there as well.
You are the second person asking for this. It's done. I am thinking
about ditching array_sample() and replace it with a second signature for
array_shuffle():
array_shuffle(array anyarray) -> anyarray
array_shuffle(array anyarray, limit int) -> anyarray
What do you think?
> I must say that without significant knowledge of the array internal
> implementation, the swap code looks pretty strange. ISTM that the code
> would be clearer if pointers and array references style were not
> intermixed.
Done. Went with pointers.
> Maybe you could add a test with a 3D array? Some sample with NULLs?
Done.
> Unrelated: I notice again that (postgre)SQL does not provide a way to
> generate random integers. I do not see why not. Should we provide one?
Maybe. I think it is outside the scope of this patch.
Thank you, Martin