Thread: random generated string matching index in inexplicable ways
PROBLEM: Strings or characters generated by any random function (including pg_crypto gen_random_bytes) are matching a string/char index in surprising ways. Reduced down to its simplest example: -- use random 0 or 1, plus 65, to get 'A' or 'B' # SELECT chr(round(random())::int + 65); chr ----- B (1 row) # SELECT chr(round(random())::int + 65); chr ----- A (1 row) -- simple table for matching: CREATE TABLE x( y char(1) primary key ); INSERT INTO x(y) VALUES ('A'); INSERT INTO x(y) VALUES ('B'); -- if I query 'A' or 'B' it works as expected # SELECT y FROM x WHERE y = 'A'; y --- A (1 row) # SELECT y FROM x WHERE y = 'B'; y --- B (1 row) -- if we use random-generated 'A' or 'B', things get inexplicable # SELECT y FROM x WHERE y = chr(round(random())::int + 65); y --- A B (2 rows) # SELECT y FROM x WHERE y = chr(round(random())::int + 65); y --- (0 rows) # SELECT y FROM x WHERE y = chr(round(random())::int + 65); y --- B (1 row) I've been wrestling with this for hours. Any suggestions? Thank you.
> On 7 May 2019, at 13:53, Myles Miller <pg@q7r7.com> wrote: > > PROBLEM: > Strings or characters generated by any random function (including pg_crypto > gen_random_bytes) are matching a string/char index in surprising ways. I fail to see anything surprising in your examples. > Reduced down to its simplest example: (…) > -- if we use random-generated 'A' or 'B', things get inexplicable > > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > y > --- > A > B > (2 rows) Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in the higherrange for value ‘B’, a match again, so you get 2 rows. > > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > y > --- > (0 rows) Here you got a random value in the higher range for the record with value ‘A’, so no match, and one in the lower range forvalue ‘B’, no match again, so you get 0 rows. > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > y > --- > B > (1 row) Here you got two random values in the higher range, so only the row with ‘B’ matches. You could also get two random values in the lower range and only get a match against ‘A’. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
> > # SELECT y FROM x WHERE y = chr(round(random())::int + 65); > > y > > --- > > A > > B > > (2 rows) > Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in thehigher range for value ‘B’, a match again, so you get 2 rows. No, the function is returning just one letter, either 'A' or 'B', not multiple values. # SELECT * FROM chr(round(random())::int + 65); chr ----- B (1 row) # SELECT * FROM chr(round(random())::int + 65); chr ----- A (1 row)
On Tue, May 07, 2019 at 12:17:12PM +0000, Daniel Perrett wrote: > The WHERE expression (and therefore the random function) is being evaluated once per row, not once per query. > If you run this query a few times, you will see the problem: > SELECT y, chr(round(random())::int + 65) FROM (SELECT 'A' UNION SELECT 'B') x(y); Ah! Thank you! I understand now.
On Tue, May 7, 2019 at 3:12 PM Myles Miller <pg@q7r7.com> wrote: > No, the function is returning just one letter, either 'A' or 'B', not multiple values. Your random function is being evaluated ONCE FOR EACH ROW. i.e, it's doing something like: select y from ( SELECT y, chr(round(random())::int + 65) as z FROM x ) aux WHERE y = z; o with aux as ( SELECT y, chr(round(random())::int + 65) as z FROM x ) select y from aux WHERE y = z; You may want to try something like this: with aux as ( SELECT chr(round(random())::int + 65) as z ) select y from x,aux WHERE y = z; ( Untested, but that's the idea ) Francisco Olarte