Thread: some random() clarification needed
Hi,
when, in psql, on a postgres 12.3, I write:
select ceiling(random()*2582);
it does provide the expected answer, ie. a number between 1 and 2582, inclusive.
allways.
when I decide to use this to get a random row within a table prenoms having 2 columns
a id serial, and a prenom varchar, with explicitly 2582 lines, no gaps,
I write:
select id, prenom from prenoms where id=ceiling(random()*2582);
expecting to get, allways, one line.
But its not the case.
around 15% of time I get 0 lines which is already quite strange to me.
but 10% of time, I get a random number of lines, until now up to 4.
even weirder (to me !)
so, can someone please clarify ?
thanks,
regards,
On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com> wrote:
select id, prenom from prenoms where id=ceiling(random()*2582);expecting to get, allways, one line.But its not the case.around 15% of time I get 0 lines which is already quite strange to me.but 10% of time, I get a random number of lines, until now up to 4.even weirder (to me !)so, can someone please clarify ?
You are basically asking:
For each row in my table compare the id to some random number and if they match return that row, otherwise skip it. The random number being compared to is different for each row because random() is volatile and thus evaluated for each row.
David J.
Ok, thanks for the clarification.
On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com> wrote:select id, prenom from prenoms where id=ceiling(random()*2582);expecting to get, allways, one line.But its not the case.around 15% of time I get 0 lines which is already quite strange to me.but 10% of time, I get a random number of lines, until now up to 4.even weirder (to me !)so, can someone please clarify ?You are basically asking:For each row in my table compare the id to some random number and if they match return that row, otherwise skip it. The random number being compared to is different for each row because random() is volatile and thus evaluated for each row.David J.
Hi,
your answer helps me understand my first problem.
so, I rewrote a simple loop so as to avoid the "volatile" behaviour.
(at least I was thinking I did... looks like I was wrong !)
step by step loop:
DO $$
BEGIN
FOR counter IN 1..1000 LOOP
begin
declare
id1 integer =ceiling(random()*2582);
id3 date= '2000-01-01';
id2 date;
pren varchar;
begin
id2=id3 + (random()*7200)::integer;
SELECT prenom FROM prenoms WHERE id=id1 into pren;
INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
end;
end;
END LOOP;
END; $$;
BEGIN
FOR counter IN 1..1000 LOOP
begin
declare
id1 integer =ceiling(random()*2582);
id3 date= '2000-01-01';
id2 date;
pren varchar;
begin
id2=id3 + (random()*7200)::integer;
SELECT prenom FROM prenoms WHERE id=id1 into pren;
INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
end;
end;
END LOOP;
END; $$;
I truncated the table, executed the loop with no errors, and expected that a select count(*)
may answer 1000 !
no.
it varies, from less than 1000 (much less, something like 900)
and more than 1000 (up to 1094)
so... what s "volatile" in the loop ?
BTW the testparttransac table is partitioned on datenaissance, with a default partition.
thanks,
regards,
On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com> wrote:select id, prenom from prenoms where id=ceiling(random()*2582);expecting to get, allways, one line.But its not the case.around 15% of time I get 0 lines which is already quite strange to me.but 10% of time, I get a random number of lines, until now up to 4.even weirder (to me !)so, can someone please clarify ?You are basically asking:For each row in my table compare the id to some random number and if they match return that row, otherwise skip it. The random number being compared to is different for each row because random() is volatile and thus evaluated for each row.David J.
Please don't top-post. Inline (with trim) is better but at minimum bottom-post.
On Tue, Jul 14, 2020 at 9:01 AM Marc Millas <marc.millas@mokadb.com> wrote:
Hi,your answer helps me understand my first problem.so, I rewrote a simple loop so as to avoid the "volatile" behaviour.(at least I was thinking I did... looks like I was wrong !)step by step loop:DO $$
BEGIN
FOR counter IN 1..1000 LOOP
begin
declare
id1 integer =ceiling(random()*2582);
id3 date= '2000-01-01';
id2 date;
pren varchar;
begin
id2=id3 + (random()*7200)::integer;
SELECT prenom FROM prenoms WHERE id=id1 into pren;
INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren);
end;
end;
END LOOP;
END; $$;I truncated the table, executed the loop with no errors, and expected that a select count(*)may answer 1000 !no.it varies, from less than 1000 (much less, something like 900)and more than 1000 (up to 1094)so... what s "volatile" in the loop ?
Everything...
You are setting id1 to the result of an expression inside the loop. Everytime that statement gets executed within the loop a new random number is produced.
I mean, even "id 3 date = '2000-01-01'" is repeatedly casting (I think) the string to a date and assigning it to the variable even though that statement overall is effectively immutable.
David J.
On 7/14/20 9:01 AM, Marc Millas wrote: > Hi, > your answer helps me understand my first problem. > so, I rewrote a simple loop so as to avoid the "volatile" behaviour. > (at least I was thinking I did... looks like I was wrong !) > step by step loop: > DO $$ > BEGIN > FOR counter IN 1..1000 LOOP > begin > declare > id1 integer =ceiling(random()*2582); > id3 date= '2000-01-01'; > id2 date; > pren varchar; > begin > id2=id3 + (random()*7200)::integer; > SELECT prenom FROM prenoms WHERE id=id1 into pren; > INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren); > end; > end; > END LOOP; > END; $$; > > I truncated the table, executed the loop with no errors, and expected > that a select count(*) > may answer 1000 ! > no. > it varies, from less than 1000 (much less, something like 900) > and more than 1000 (up to 1094) > > so... what s "volatile" in the loop ? I think it has more to do with the structure of the function. Not sure how all those nested BEGINs interact, but when I simplify the above to: DO $$ DECLARE id1 integer =ceiling(random()*2582); id3 date= '2000-01-01'; id2 date; pren varchar = 'test'; BEGIN FOR counter IN 1..1000 LOOP id2 = id3 + (random()*7200)::integer; INSERT INTO testparttransac (datenaissance, prenoms) values(id2,pren); END LOOP; END; $$; I get 1000 rows each time I TRUNCATE testparttransac and then run above and repeat. > > BTW the testparttransac table is partitioned on datenaissance, with a > default partition. > > thanks, > regards, > > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com <http://www.mokadb.com> > > > > On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston > <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote: > > On Tue, Jul 14, 2020 at 8:15 AM Marc Millas <marc.millas@mokadb.com > <mailto:marc.millas@mokadb.com>> wrote: > > select id, prenom from prenoms where id=ceiling(random()*2582); > > expecting to get, allways, one line. > But its not the case. > around 15% of time I get 0 lines which is already quite strange > to me. > but 10% of time, I get a random number of lines, until now up to 4. > even weirder (to me !) > > so, can someone please clarify ? > > > You are basically asking: > > For each row in my table compare the id to some random number and if > they match return that row, otherwise skip it. The random number > being compared to is different for each row because random() is > volatile and thus evaluated for each row. > > David J. > -- Adrian Klaver adrian.klaver@aklaver.com
On Wed, 15 Jul 2020 at 04:01, Marc Millas <marc.millas@mokadb.com> wrote: > your answer helps me understand my first problem. > so, I rewrote a simple loop so as to avoid the "volatile" behaviour. Not sure what you're trying to do with the plpgsql, but you can just escape the multiple evaluations by putting the volatile function in a sub-select with no FROM clause. SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582)); Or the more traditional way to get a random row is: SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1; David
Thanks!
makes it clearer :-)
its not that obvious to guess the consequences of the "volatile" behaviour.
regards,
On Wed, Jul 15, 2020 at 1:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 15 Jul 2020 at 04:01, Marc Millas <marc.millas@mokadb.com> wrote:
> your answer helps me understand my first problem.
> so, I rewrote a simple loop so as to avoid the "volatile" behaviour.
Not sure what you're trying to do with the plpgsql, but you can just
escape the multiple evaluations by putting the volatile function in a
sub-select with no FROM clause.
SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582));
Or the more traditional way to get a random row is:
SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1;
David