On Tue, Dec 10, 2019 at 02:49:42PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> As for the performance impact, I did this:
>
>> create table dim (id int, val text);
>> insert into dim select i, md5(i::text) from generate_series(1,1000000) s(i);
>> create table fact (id int, val text);
>> insert into fact select mod(i,1000000)+1, md5(i::text) from generate_series(1,25000000) s(i);
>> set max_parallel_workers_per_gather = 0;
>> select count(*) from fact join dim using (id);
>
>> So a perfectly regular join between a 1M-row and a 25M-row table. On my machine,
>> this takes ~8851ms on master and ~8979ms with the patch (average of about
>> 20 runs with minimal variability). That's ~1.4% regression, so a bit
>> more than the 0.4% mentioned before. Not a huge difference though, and
>> some of it might be due to different binary layout etc.
>
>Hmm ... I replicated this experiment here, using my usual precautions
>to get more-or-less-reproducible numbers [1]. I concur that the
>patch seems to be slower, but only by around half a percent on the
>median numbers, which is much less than the run-to-run variation.
>
Sounds good.
>So that would be fine --- except that in my first set of runs,
>I forgot the "set max_parallel_workers_per_gather" step and hence
>tested this same data set with a parallel hash join. And in that
>scenario, I got a repeatable slowdown of around 7.5%, which is far
>above the noise floor. So that's not good --- why does this change
>make PHJ worse?
>
Hmmm, I can't reproduce this. For me, the timings from 20 runs look like
this:
          master         |        patched
   workers=2   workers=5 | workers=2   workers=5
  -----------------------+-----------------------
        3153        1785 |      3185        1790
        3167        1783 |      3220        1783
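
Each number is the duration (in milliseconds) of the same join as above,
with the worker count set before the run. I'm paraphrasing the session
here, but each run was roughly:

set max_parallel_workers_per_gather = 2;   -- or 5 for the other column
\timing on
select count(*) from fact join dim using (id);
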
I haven't done the extra steps with cpupower/taskset, but the numbers
seem pretty consistent. I'll try on another machine tomorrow.
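
FWIW, to double-check that the parallel runs really end up with a parallel
hash join (and that the patch doesn't change the plan shape), something
like this should do:

set max_parallel_workers_per_gather = 2;
explain (costs off) select count(*) from fact join dim using (id);

which should show a Parallel Hash Join under the Gather node on both
master and the patched build.
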
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services