Re: pgbench randomness initialization - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench randomness initialization
Date
Msg-id alpine.DEB.2.10.1604071242420.11001@sto
In response to Re: pgbench randomness initialization  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

> If you run the test for longer... Or explicitly iterate over IVs. At the
> very least we need to make pgbench output the IV used, to have some
> chance of repeating tests.

Note that I'm not against providing a way to repeat tests "exactly", and I 
have suggested two means: environment variable and/or option.

> [...] That comparison pretty much invalidates any point you're making, 
> it's that bad.

At least it is simple, if simplistic.

Here is another one: I knew a financial institution which needed to
evaluate the VaR of exotic financial products every night. They relied on
Monte Carlo (MC) for that. Alas, it was not converging quickly enough and
the results were unstable, so they took your advice: they froze the seed.
Day after day the results were mostly the same, the VaR was stable from
one morning to the next, the management was happy, the risks were under
control... That was in the mid 2000s :-)
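
To make the point concrete, here is a minimal sketch of what freezing the
seed does to a Monte Carlo estimate. Everything in it is an assumption made
for illustration: the function name mc_estimate, the toy "loss" (a squared
exponential draw) and the sample size are hypothetical, and it has nothing
to do with the actual institution or with pgbench internals.

```python
import random
import statistics

def mc_estimate(seed, n=1000):
    """Monte Carlo estimate of the mean of a toy heavy-tailed loss (hypothetical)."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    return sum(rng.expovariate(1.0) ** 2 for _ in range(n)) / n

# Frozen seed: every "night" reproduces exactly the same number.
frozen = [mc_estimate(seed=42) for _ in range(5)]

# Varying seeds: the spread between runs shows how uncertain the estimate is.
varying = [mc_estimate(seed=s) for s in range(5)]

print("frozen spread :", statistics.pstdev(frozen))
print("varying spread:", statistics.pstdev(varying))
```

With the frozen seed the spread is zero by construction, which says nothing
about convergence. Only the varying seeds expose the uncertainty that is
really there.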

>> However, from a statistical perspective this is just heresy: you may do a
>> change which improves one given run at the expense of all possible others
>> and you would not know it: Say for instance that there are two different
>> behaviors depending on something, then you will check against one of them
>> only.
>
> Meh. That assumes that we're doing a huge number of pgbench runs;

A number of runs, yes, but not necessarily a "huge" number. Alternatively,
average a lot of intermediate values and have a hard look at the
distribution, not just at the final tps number.
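
As a toy illustration of why the distribution matters, here is a small
sketch where a run falls into one of two regimes depending on its random
stream. The function toy_run_tps and all of its numbers (1000 and 700 tps,
noise of 10) are made up for the example. It is a stand-in for a real
benchmark run, not a model of pgbench.

```python
import random
import statistics

def toy_run_tps(seed):
    """Hypothetical stand-in for one benchmark run with two possible behaviors."""
    rng = random.Random(seed)
    base = 1000.0 if rng.random() < 0.5 else 700.0  # regime picked by the random stream
    return base + rng.gauss(0.0, 10.0)              # small run-to-run noise

# Twenty runs, each with its own seed.
runs = [toy_run_tps(seed) for seed in range(20)]
fast = sum(1 for r in runs if r > 850.0)

print("mean   :", round(statistics.mean(runs), 1))
print("stdev  :", round(statistics.stdev(runs), 1))
print("min/max:", round(min(runs), 1), round(max(runs), 1))
print("runs in fast regime:", fast, "of", len(runs))
```

Under these assumptions a single fixed seed would always land in the same
regime, so a change would only ever be checked against one of the two
behaviors. Twenty runs are far from "huge", yet the min/max and the regime
count already show that there are two.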

> but usually people do maybe a handful. Tops. If you're trying to defend 
> against scenarios like that you need to design your tests so that you'll 
> encounter such problems by running longer.

People usually do a lot of things; that does not mean that it is "right".

>> So I have no mathematical doubt that changing the seed is the right 
>> default setting, thus I think that the current behavior is fine. 
>> However I'm okay if someone wants to control the randomness for some 
>> reason (maybe having "less sure" results, but quickly), so it could be 
>> allowed somehow.
>
> There might be some statistics arguments,

Yep, there are.

> but I think they're pretty ignoring reality.

Hmmm. If reality wants to ignore mathematics, usually it loses, so this 
will not be with my blessing. Note that as a committer you do not need me 
to freeze the seed. I'm just providing an opinion backed by mathematical 
proofs.

-- 
Fabien.


