Thread: random generated string matching index in inexplicable ways

random generated string matching index in inexplicable ways

From
Myles Miller
Date:
PROBLEM:
Strings or characters generated by any random function (including pg_crypto
gen_random_bytes) are matching a string/char index in surprising ways.

Reduced down to its simplest example:

-- use random 0 or 1, plus 65, to get 'A' or 'B'

# SELECT chr(round(random())::int + 65);
chr 
-----
B
(1 row)

# SELECT chr(round(random())::int + 65);
chr 
-----
A
(1 row)


-- simple table for matching:
CREATE TABLE x( y char(1) primary key );
INSERT INTO x(y) VALUES ('A');
INSERT INTO x(y) VALUES ('B');


-- if I query 'A' or 'B' it works as expected

# SELECT y FROM x WHERE y = 'A';
y
---
A
(1 row)

# SELECT y FROM x WHERE y = 'B';
                                                           
 
y
---
B
(1 row)



-- if we use random-generated 'A' or 'B', things get inexplicable

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y 
---
A
B
(2 rows)

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y 
---
(0 rows)

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y 
---
B
(1 row)



I've been wrestling with this for hours. Any suggestions? Thank you.




Re: random generated string matching index in inexplicable ways

From
Alban Hertroys
Date:
> On 7 May 2019, at 13:53, Myles Miller <pg@q7r7.com> wrote:
>
> PROBLEM:
> Strings or characters generated by any random function (including pg_crypto
> gen_random_bytes) are matching a string/char index in surprising ways.

I fail to see anything surprising in your examples.

> Reduced down to its simplest example:

(…)

> -- if we use random-generated 'A' or 'B', things get inexplicable
>
> # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> y
> ---
> A
> B
> (2 rows)

Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in the
higherrange for value ‘B’, a match again, so you get 2 rows. 

>
> # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> y
> ---
> (0 rows)

Here you got a random value in the higher range for the record with value ‘A’, so no match, and one in the lower range
forvalue ‘B’, no match again, so you get 0 rows. 


> # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> y
> ---
> B
> (1 row)

Here you got two random values in the higher range, so only the row with ‘B’ matches.

You could also get two random values in the lower range and only get a match against ‘A’.

Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: random generated string matching index in inexplicable ways

From
Myles Miller
Date:
> > # SELECT y FROM x WHERE y = chr(round(random())::int + 65);
> > y 
> > ---
> > A
> > B
> > (2 rows)
> Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in
thehigher range for value ‘B’, a match again, so you get 2 rows.
 

No, the function is returning just one letter, either 'A' or 'B', not multiple values.

# SELECT * FROM chr(round(random())::int + 65);
 chr
-----
 B
(1 row)

# SELECT * FROM chr(round(random())::int + 65);
 chr
-----
 A
(1 row)





Re: random generated string matching index in inexplicable ways [EXT]

From
Myles Miller
Date:
On Tue, May 07, 2019 at 12:17:12PM +0000, Daniel Perrett wrote:
> The WHERE expression (and therefore the random function) is being evaluated once per row, not once per query.
> If you run this query a few times, you will see the problem:
> SELECT y, chr(round(random())::int + 65) FROM (SELECT 'A' UNION SELECT 'B') x(y);

Ah! Thank you! I understand now.




Re: random generated string matching index in inexplicable ways

From
Francisco Olarte
Date:
On Tue, May 7, 2019 at 3:12 PM Myles Miller <pg@q7r7.com> wrote:
> No, the function is returning just one letter, either 'A' or 'B', not multiple values.

Your random function is being evaluated ONCE FOR EACH ROW.

i.e, it's doing something like:
select y from
   ( SELECT y, chr(round(random())::int + 65) as z FROM x ) aux
WHERE y = z;
 o
with aux as ( SELECT y, chr(round(random())::int + 65) as z FROM x )
select y from aux WHERE y = z;

You may want to try something like this:

with aux as ( SELECT chr(round(random())::int + 65) as z )
select y from x,aux WHERE y = z;

( Untested, but that's the idea )

Francisco Olarte