Re: Recursive CTEs and randomness - is there something I'm missing? - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Recursive CTEs and randomness - is there something I'm missing? |
Date | |
Msg-id | f3d0e5dc-33fc-0ec1-caa7-d30f042a02bc@aklaver.com Whole thread Raw |
In response to | Re: Recursive CTEs and randomness - is there something I'm missing? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
On 2/28/20 9:45 AM, Tom Lane wrote: > =?UTF-8?B?UMOzbCBVYSBMYW/DrW5lY2jDoWlu?= <linehanp@tcd.ie> writes: >> The SQL: > >> WITH RECURSIVE rand (num, md, a_2_s) AS >> ( >> SELECT >> 1, >> MD5(RANDOM()::TEXT), >> ARRAY_TO_STRING(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) >> FROM GENERATE_SERIES(1, 5)), '') >> UNION >> SELECT num + 1, >> MD5(RANDOM()::TEXT), >> ARRAY_TO_STRING(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) >> FROM GENERATE_SERIES(1, 5)), '') >> FROM rand >> WHERE num < 5 >> ) >> SELECT * FROM rand; > >> A typical result is shown below: > >> 1 974ee059a1902e5ca1ec73c91275984b GYXYS >> 2 6cf5a974d5859eae23cdb9c310e3a3bf YFDPT >> 3 fa6be95eb720fe6f80c7c8fb6ba11171 YFDPT >> 4 fa54913b0bb43de0025b153fd71a5334 YFDPT >> 5 523fab9bdc6c4c51a89e0d901273fb69 YFDPT > >> The fact that the last 4 are identical is not a coincidence. If I put >> 100 in the GENERATE_SERIES, I still get the same result, the first and >> second records are different, but ALL subsequent instances of the >> ARRAY_TO_STRING are identical! > > Yeah, it's weird. A look at EXPLAIN VERBOSE offers some insight: > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > CTE Scan on rand (cost=4.75..5.37 rows=31 width=68) > Output: rand.num, rand.md, rand.a_2_s > CTE rand > -> Recursive Union (cost=0.13..4.75 rows=31 width=68) > -> Result (cost=0.13..0.15 rows=1 width=68) > Output: 1, md5((random())::text), array_to_string($1, ''::text) > InitPlan 1 (returns $1) > -> Function Scan on pg_catalog.generate_series (cost=0.00..0.13 rows=5 width=32) > Output: chr((('65'::double precision + round((random() * '25'::double precision))))::integer) > Function Call: generate_series(1, 5) > -> WorkTable Scan on rand rand_1 (cost=0.13..0.40 rows=3 width=68) > Output: (rand_1.num + 1), md5((random())::text), array_to_string($2, ''::text) > Filter: (rand_1.num < 5) > InitPlan 2 (returns $2) > -> Function Scan on pg_catalog.generate_series generate_series_1 (cost=0.00..0.13 rows=5 width=32) > Output: chr((('65'::double precision + round((random() * '25'::double precision))))::integer) > Function Call: generate_series(1, 5) > (17 rows) > > The ARRAY sub-selects are being done as initplans, not subplans, > which means they're only done once not once per row. This is correct > so far as the planner is concerned because those sub-selects are > "uncorrelated", ie they use no values from the outer query. > > There is room to argue that because the sub-selects contain volatile > functions, they ought not be optimized into initplans. We have > traditionally not considered that, however, and I'm afraid that a > lot of people have written queries that depend on it. For example, > there's lore out there that replacing > WHERE mycol < random() > with > WHERE mycol < (SELECT random()) > will freeze the random() result as a single value rather than > computing a new value for each row, which sometimes is what you > need. These days, better practice would be to put the random() > call in a CTE, but there's still a lot of code out there that > does it as above. > > For your immediate problem, since you don't care that much > (I suppose) about exactly how the strings are generated, you > could fix the issue by making the sub-selects depend on > "num" somehow. Or possibly there's a way to do it without > a sub-select. On the whole this looks like a mighty expensive > way to generate a random string, so I'd be inclined to look > for other implementations. An off the cuff Python solution: CREATE OR REPLACE FUNCTION public.upper_random() RETURNS character varying LANGUAGE plpythonu AS $function$ from string import ascii_uppercase as au import random return ''.join(random.sample(au,5)) $function$ SELECT num, MD5(RANDOM()::TEXT), upper_random() FROM generate_series(1, 5) AS t(num); num | md5 | upper_random -----+----------------------------------+-------------- 1 | 5896a9e3efa53027873d7999e58904ae | TRGEB 2 | 9f9677c32a64b6eae73759a69e1acfff | TFQHG 3 | 5aefda5b498215065e01ba697d79caee | KYBZS 4 | 1605b7fa54fef9bdc5c49f9b79810e07 | EUDML 5 | 4ba59880c1c67bca1d1f184bda5350b6 | RFPGO > > regards, tom lane > > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: