On Fri, Nov 2, 2018 at 2:29 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Nov 2, 2018 at 2:07 PM Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
> > >>>>> "Paul" == Paul Schaap <ps@ipggroup.com> writes:
> > Paul> Hi Andrew,
> > Paul> Bingo, set enable_parallel_hash=false; gets a correct result
> > Paul> whereas set enable_parallel_hash=true; gets 0.
> >
> > Paul> Yes I might have reversed some of the explains, my excuse its
> > Paul> Friday and I went to bed late and am burnt out today :-)
> >
> > Are all the values of the my_citext column actually null?
>
> Thanks for the report Paul and the analysis Andrew. Discussed with
> Andrew a bit on IRC. Summary: multi-batch left joins are not handling
> NULLs correctly in the left table when partitioning. Looking into
> this now.
Here's a repro.
create table r as select generate_series(1, 1000000) i, null::int j;
update r set j = i where i <= 10;
create table s as select generate_series(1, 1000000) i;
analyze;
select count(*), count(r.j) from r left join s on r.j = s.i;
Unpatched master gives me a 16-batch Parallel Hash Join with the
incorrect answer:
 count | count
-------+-------
    10 |    10
With the attached patch the answer is correct:
  count  | count
---------+-------
 1000000 |    10
The brown-paper-bag level fix is:
- false, /* outer join, currently unsupported */
+ HJ_FILL_OUTER(hjstate),
It is right and full outer joins that are not yet supported by
Parallel Hash Join. Left outer joins *are* supported. Passing false
there means outer tuples with NULL join keys are thrown away while
partitioning, so the effect of that thinko is to make left joins
behave like inner joins (but only in multi-batch joins, i.e. when
work_mem is exceeded).
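
For anyone who wants to see the mechanism in isolation, here is a toy
model in C. It is only a sketch and not PostgreSQL code: ToyTuple,
toy_hash_value, toy_partition_outer and toy_demo are hypothetical
names invented for illustration, and the hash function is arbitrary.
It assumes just what is described above, namely that a boolean flag
decides whether outer tuples with a NULL join key survive the
partitioning step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy tuple: a join key that may be NULL. */
typedef struct
{
    bool is_null;
    int  key;
} ToyTuple;

/*
 * Toy stand-in for hashing an outer tuple while partitioning.
 * Returns false when the tuple may be discarded: its key is NULL and
 * the caller did not ask for NULLs to be kept.
 */
static bool
toy_hash_value(const ToyTuple *t, bool keep_nulls, unsigned *hash)
{
    if (t->is_null)
    {
        if (!keep_nulls)
            return false;       /* can never match, so drop it */
        *hash = 0;              /* any batch will do for a NULL key */
        return true;
    }
    *hash = (unsigned) t->key * 2654435761u;    /* arbitrary toy hash */
    return true;
}

/* Count the outer tuples that survive partitioning into nbatch batches. */
static long
toy_partition_outer(const ToyTuple *tuples, long n, int nbatch,
                    bool keep_nulls)
{
    long kept = 0;

    assert(nbatch > 0);
    for (long i = 0; i < n; i++)
    {
        unsigned hash;

        if (toy_hash_value(&tuples[i], keep_nulls, &hash))
        {
            unsigned batchno = hash % (unsigned) nbatch;

            assert(batchno < (unsigned) nbatch);
            kept++;             /* the tuple would be written to batchno */
        }
    }
    return kept;
}

/* 1000 outer tuples, only the first 10 of which have a non-NULL key. */
long
toy_demo(bool keep_nulls)
{
    enum { N = 1000 };
    static ToyTuple outer[N];

    for (int i = 0; i < N; i++)
    {
        outer[i].key = i + 1;
        outer[i].is_null = (i >= 10);
    }
    return toy_partition_outer(outer, N, 16, keep_nulls);
}
```

In this model toy_demo(false) keeps only the 10 tuples with non-NULL
keys, which is the inner join behaviour, while toy_demo(true) keeps all
1000, so the NULL-keyed rows are still around to be emitted
NULL-extended by a left join.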
--
Thomas Munro
http://www.enterprisedb.com