Re: Proposal to introduce a shuffle function to intarray extension - Mailing list pgsql-general

From Martin Kalcher
Subject Re: Proposal to introduce a shuffle function to intarray extension
Date
Msg-id 7933c4a5-0e72-889e-1be9-e3574ffe333b@aboutsource.net
Whole thread Raw
In response to Re: Proposal to introduce a shuffle function to intarray extension  (Mladen Gogala <gogala.mladen@gmail.com>)
Responses Re: Proposal to introduce a shuffle function to intarray extension  (Mladen Gogala <gogala.mladen@gmail.com>)
List pgsql-general
Am 16.07.22 um 18:53 schrieb Mladen Gogala:
> On 7/15/22 04:36, Martin Kalcher wrote:
>> Dear list,
>>
>> i am dealing with an application that processes fairly large arrays of
>> integers. It makes heavy use of the intarray extension, which works
>> great in most cases. However, there are two requirements that cannot
>> be addressed by the extension and are rather slow with plain SQL. Both
>> can be met with shuffling:
>>
>> - Taking n random members from an integer array
>> - Splitting an array into n chunks, where each member is assigned to a
>> random chunk
>>
>> Shuffling is currently implemented by unnesting the array, ordering
>> the members by random() and aggregating them again.
> 
> 
> Martin, have you considered PL/Python and NumPy module?

Hey Mladen,

thank you for your advice. Unfortunately the performance of shuffling 
with NumPy is about the same as with SQL.

   create function numpy_shuffle(arr int[])
   returns int[]
   as $$
     import numpy
     numpy.random.shuffle(arr)
     return arr
   $$ language 'plpython3u';

   select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
   from (
     select numpy_shuffle(arr) arr from numbers
   ) shuffled;

   -------------------------------------------------------
    {674026,3306457,1727170} ... {343875,3825484,1235246}

   Time: 2315.431 ms (00:02.315)

Am i doing something wrong?

Martin



pgsql-general by date:

Previous
From: Lucie Šimečková
Date:
Subject: Proposed Translations of Updated Code of Conduct Policy
Next
From: Mladen Gogala
Date:
Subject: Re: Proposal to introduce a shuffle function to intarray extension