I was able to reproduce the problem with this set of data:
create table users (id integer);
create table address (id integer, users_id integer);
insert into users select s from generate_series(1,1000000) s;
insert into address select s, s/2 from generate_series(1,2000000) s;
analyze users;
analyze address;
set max_parallel_workers_per_gather = 0;
select count(*)
from users u
join address a on (a.users_id = u.id)
where exists (select 1 from address where users_id = u.id);
set max_parallel_workers_per_gather = 1;
select count(*)
from users u
join address a on (a.users_id = u.id)
where exists (select 1 from address where users_id = u.id);
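As a sanity check on what the right answer should be (my own arithmetic from the inserts above, so worth double-checking): users 1 through 999999 each match two address rows and user 1000000 matches one, so the join should produce 1999999 rows. Any user that joins to address trivially satisfies the EXISTS, so the EXISTS should not change the count. A sketch of a baseline query, plus EXPLAIN to compare the plans under each setting:

```sql
-- Baseline: the same join without the EXISTS clause.
-- With the data above this should count 1999999 rows, and both
-- queries with the EXISTS should return the same number.
select count(*)
from users u
join address a on (a.users_id = u.id);

-- Show the plan chosen for the query under the current
-- max_parallel_workers_per_gather setting; run once per setting.
explain
select count(*)
from users u
join address a on (a.users_id = u.id)
where exists (select 1 from address where users_id = u.id);
```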
On 11/29/16, 11:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Michael Day <blake@rcmail.com> writes:
> I have found a nasty bug when using parallel sequential scans with an
> exists clause on postgresql 9.6.1. I have found that the rows returned
> using parallel sequential scan plans are incorrect (though I haven’t
> dug sufficiently to know in what ways). See below for an example of
> the issue.

Hm, looks like a planner error: it seems to be forgetting that the join
to "address" should be a semijoin. "address" should either be on the
inside of a "Semi" join (as in your first, correct-looking plan) or be
passed through a unique-ification stage such as a HashAgg. Clearly,
neither thing is happening in the second plan.

I couldn't reproduce this in a bit of trying, however. Can you come
up with a self-contained test case?

regards, tom lane