FWIW I tried running the benchmarks again, with some minor changes in
the extension code - most importantly, the time is now counted in
microseconds (instead of milliseconds).
I suspected the millisecond rounding might have been skewing the results
(essentially not counting anything below 1ms, because it rounds to 0),
and the new results are a bit different.
On the i5-2500k machine it's an improvement across the board, while on
the bigger Xeon E5-2620 v3 machine it shows roughly the same regression
for the "decreasing" allocation pattern.
There's another issue in the benchmarking script - the queries are meant
to do multiple runs for each combination of parameters, but they're
written in a way that simply runs each combination once and then does a
cross product with generate_series(1,5). I'll look into fixing that, but
judging by the stability of results for similar chunk sizes it won't
change much.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company