Re: Proposal to introduce a shuffle function to intarray extension - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Proposal to introduce a shuffle function to intarray extension |
Date | |
Msg-id | 4073858.1658027909@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Proposal to introduce a shuffle function to intarray extension (Martin Kalcher <martin.kalcher@aboutsource.net>) |
Responses |
Re: Proposal to introduce a shuffle function to intarray extension
Re: Proposal to introduce a shuffle function to intarray extension |
List | pgsql-hackers |
Martin Kalcher <martin.kalcher@aboutsource.net> writes: > Am 16.07.22 um 23:56 schrieb Thomas Munro: >> I understand that this is a specialised int[] shuffle, but I wonder if >> someone would ever want to have a general array shuffle, and how that >> would work, in terms of naming convention etc. > - I am not shure about the naming. intarray has generic names like > sort() and uniq() and specialised names like icount(). I guess in case > someone wants to have a general array shuffle it could be accomplished > with function overloading. Or am i wrong here? I suppose this is exactly the point Thomas was wondering about: if we use a generic function name for this, will it cause problems for someone trying to add a generic function later? We can investigate that question with a couple of toy functions: regression=# create function foo(int[]) returns text as 'select ''int[] version''' language sql; CREATE FUNCTION regression=# create function foo(anyarray) returns text as 'select ''anyarray version''' language sql; CREATE FUNCTION regression=# select foo('{1,2,3}'); ERROR: function foo(unknown) is not unique LINE 1: select foo('{1,2,3}'); ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. OK, that's not too surprising: with an unknown input there's just not anything that the parser can use to disambiguate. But this happens with just about any overloaded name, so I don't think it's a showstopper. regression=# select foo('{1,2,3}'::int[]); foo --------------- int[] version (1 row) regression=# select foo('{1,2,3}'::int8[]); foo ------------------ anyarray version (1 row) Good, that's more or less the minimum functionality we should expect. regression=# select foo('{1,2,3}'::int2[]); ERROR: function foo(smallint[]) is not unique LINE 1: select foo('{1,2,3}'::int2[]); ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. Oh, that's slightly annoying ... regression=# select foo('{1,2,3}'::oid[]); foo ------------------ anyarray version (1 row) regression=# select foo('{1,2,3}'::text[]); foo ------------------ anyarray version (1 row) regression=# select foo('{1,2,3}'::float8[]); foo ------------------ anyarray version (1 row) I couldn't readily find any case that misbehaves except for int2[]. You can force that to work, at least in one direction: regression=# select foo('{1,2,3}'::int2[]::int[]); foo --------------- int[] version (1 row) On the whole, I'd vote for calling it shuffle(), and expecting that we'd also use that name for any future generic version. That's clearly the easiest-to-remember definition, and it doesn't seem like the gotchas are bad enough to justify using separate names. > - I added a second function sample(), because it is a lot faster to take > some elements from an array than to shuffle the whole array and > slice it. This function can be removed if it is not wanted. I have no opinion about whether this one is valuable enough to include in intarray, but I do feel like sample() is a vague name, and easily confused with marginally-related operations like TABLESAMPLE. Can we think of a more on-point name? Something like "random_subset" would be pretty clear, but it's also clunky. It's too late here for me to think of le mot juste... regards, tom lane
pgsql-hackers by date: