Re: [PATCH] Introduce array_shuffle() and array_sample() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [PATCH] Introduce array_shuffle() and array_sample()
Date
Msg-id CA+TgmoaPKwPqhmrk7i1jRjqqt5=JDLeGze5wYPav5ietV4OFbw@mail.gmail.com
Whole thread Raw
In response to [PATCH] Introduce array_shuffle() and array_sample()  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Responses Re: [PATCH] Introduce array_shuffle() and array_sample()
List pgsql-hackers
On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher
<martin.kalcher@aboutsource.net> wrote:
> Thanks for all your feedback and help. I got a patch that i consider
> ready for review. It introduces two new functions:
>
>    array_shuffle(anyarray) -> anyarray
>    array_sample(anyarray, integer) -> anyarray
>
> array_shuffle() shuffles an array (obviously). array_sample() picks n
> random elements from an array.

I like this idea.

I think it's questionable whether the behavior of array_shuffle() is
correct for a multi-dimensional array. The implemented behavior is to
keep the dimensions as they were, but permute the elements across all
levels at random. But there are at least two other behaviors that seem
potentially defensible: (1) always return a 1-dimensional array, (2)
shuffle the sub-arrays at the top-level without the possibility of
moving elements within or between sub-arrays. What behavior we decide
is best here should be documented.

array_sample() will return elements in random order when sample_size <
array_size, but in the original order when sample_size >= array_size.
Similarly, it will always return a 1-dimensional array in the former
case, but will keep the original dimensions in the latter case. That
seems pretty hard to defend. I think it should always return a
1-dimensional array with elements in random order, and I think this
should be documented.

I also think you should add test cases involving multi-dimensional
arrays, as well as arrays with non-default bounds. e.g. trying
shuffling or sampling some values like
'[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [RFC] building postgres with meson - v10
Next
From: Robert Haas
Date:
Subject: Re: pg15b2: large objects lost on upgrade