Hello Tom,
>> BTW, did you look at the question of the range of zipfian?
>
> Yep.
>
>> I confirmed here that as used in the test case, it's generating a range way
>> smaller than the other ones: repeating the insertion snippet 1000x produces
>> stats like this: [...]
>
>> I have no idea whether that indicates an actual bug, or just poor
>> choice of parameter in the test's call. But the very small number
>> of distinct outputs is disheartening at least.
>
> Zipf distribution is highly skewed, somehow close to an exponential. To
> reduce the decreasing probability the parameter must be closer to 1, eg 1.05
> or something. However as far as the test is concerned I do not see this as a
> significant issue. I was rather planning to submit a documentation
> improvement to provide more precise hints about how the distribution behaves
> depending on the parameter, and possibly reduce the parameter used in the
> test in passing, but I see this as not very urgent.
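To give a rough feel for how the parameter drives the skew, here is a minimal sketch that computes the share of draws landing on the first value. It assumes the textbook definition, with P(i) proportional to 1 / i**s over 1..n. The function name and the choice of n = 10 are mine, for illustration only, and are not taken from pgbench or from the test:

```python
def first_value_share(n, s):
    """Probability of drawing 1 when P(i) is proportional to 1 / i**s on 1..n."""
    # The normalizing constant is the generalized harmonic number H(n, s).
    return 1.0 / sum(1.0 / i ** s for i in range(1, n + 1))


if __name__ == "__main__":
    # n = 10 is an illustrative assumption, not the range used by the test.
    for s in (1.05, 1.5, 2.5):
        print(f"s = {s}: P(1) = {first_value_share(10, s):.3f}")
```

Under these assumptions the first value takes roughly three quarters of the draws at s = 2.5, against a bit over a third at s = 1.05.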
Attached are a documentation patch and scripts to check the distribution
(here for N = 10 and s = 2.5), the kind of thing I used when checking the
initial patch:
sh> psql < zipf_init.sql
sh> pgbench -t 500000 -c 2 -M prepared -f zipf_test.sql -P 1
-- close to 29000 tps on my laptop
sh> psql < zipf_end.sql
┌────┬────────┬────────────────────┬────────────────────────┐
│ i  │  cnt   │       ratio        │        expected        │
├────┼────────┼────────────────────┼────────────────────────┤
│  1 │ 756371 │                  • │                      • │
│  2 │ 133431 │ 5.6686302283577280 │ 5.65685424949238019521 │
│  3 │  48661 │ 2.7420521567579787 │     2.7556759606310754 │
│  4 │  23677 │ 2.0552012501583816 │     2.0528009571186693 │
│  5 │  13534 │ 1.7494458401063987 │     1.7469281074217107 │
│  6 │   8773 │ 1.5426877920893651 │     1.5774409656148784 │
│  7 │   5709 │ 1.5366964442108951 │     1.4701680288054869 │
│  8 │   4247 │ 1.3442429950553332 │     1.3963036312159316 │
│  9 │   3147 │ 1.3495392437241818 │     1.3423980299088363 │
│ 10 │   2450 │ 1.2844897959183673 │     1.3013488313450120 │
└────┴────────┴────────────────────┴────────────────────────┘
sh> psql < zipf_clean.sql
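For readers without the attachments, the "expected" column can be cross-checked independently. The sketch below is not the attached SQL; it is a small stand-alone approximation assuming P(i) proportional to 1 / i**s, so that the ratio of consecutive counts should approach (i / (i-1))**s. The function names are hypothetical, and the draw count of 1,000,000 comes from the pgbench invocation above (500000 transactions, 2 clients):

```python
def zipf_probabilities(n, s):
    """Theoretical probabilities for 1..n with P(i) proportional to 1 / i**s."""
    weights = [1.0 / i ** s for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_ratios(n, s):
    """Expected count(i-1) / count(i) for i = 2..n, i.e. (i / (i-1))**s."""
    return [(i / (i - 1)) ** s for i in range(2, n + 1)]


if __name__ == "__main__":
    n, s, draws = 10, 2.5, 1_000_000
    probs = zipf_probabilities(n, s)
    # No ratio is defined for i = 1, matching the empty first row of the table.
    ratios = [None] + expected_ratios(n, s)
    for i, (p, r) in enumerate(zip(probs, ratios), start=1):
        ratio = "-" if r is None else f"{r:.6f}"
        print(f"{i:2d}  expected count ~ {p * draws:9.0f}  expected ratio {ratio}")
```

The first expected ratio is 2**2.5, which is the 5.6568... shown in the table. The expected counts are only an approximation to compare against the cnt column, not part of the attached scripts.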
Given these results, I do not think it is useful to change the
random_zipfian TAP test parameter from 2.5 to something else.
--
Fabien.