Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts - Mailing list pgsql-bugs
From | Dmytro Astapov |
---|---|
Subject | Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts |
Date | |
Msg-id | CAFQUnFhhSbLO-r-xkTnqiDLuJzOxKBLZzMFwKMO1uOUrinH7CQ@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
|
List | pgsql-bugs |
Hi!
QUERY PLAN
---------------------------------------------------------------------------------------------
Hash Join (cost=1.11..3798.30 rows=712 width=16)
Hash Cond: (huge.id = tiny.id)
-> Append (cost=0.00..3683.32 rows=28466 width=8)
-> Append (cost=0.00..3222.91 rows=28181 width=8)
-> Seq Scan on huge huge_1 (cost=0.00..0.00 rows=1 width=8)
Filter: filter_out
-> Seq Scan on huge_partition1 huge_2 (cost=0.00..1541.00 rows=13987 width=8)
Filter: filter_out
-> Seq Scan on huge_partition2 huge_3 (cost=0.00..1541.00 rows=14193 width=8)
Filter: filter_out
-> Append (cost=0.00..33.42 rows=285 width=8)
-> Seq Scan on medium medium_1 (cost=0.00..0.00 rows=1 width=8)
Filter: filter_out
-> Seq Scan on medium_partition1 medium_2 (cost=0.00..16.00 rows=142 width=8)
Filter: filter_out
-> Seq Scan on medium_partition2 medium_3 (cost=0.00..16.00 rows=142 width=8)
Filter: filter_out
-> Hash (cost=1.05..1.05 rows=5 width=8)
-> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8)
/* This query DOES use nested loops as expected */
To quickly address the valid point about non-self-contained bug reports, here is the export of the schema from the DB Fiddle:
/* Huge inheritance-partitioned table */
create table huge(id bigint, filter_out bool);
create table huge_partition1(id bigint, filter_out bool);
create table huge_partition2(id bigint, filter_out bool);
alter table huge_partition1 inherit huge;
alter table huge_partition2 inherit huge;
insert into huge_partition1(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,100000) id;
insert into huge_partition2(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,100000) id;
create index on huge_partition1(id);
create index on huge_partition2(id);
analyze huge_partition1;
analyze huge_partition2;
/* Medium inheritance-partitioned table (same structure, but 100x smaller) */
create table medium(id bigint, filter_out bool);
create table medium_partition1(id bigint, filter_out bool);
create table medium_partition2(id bigint, filter_out bool);
alter table medium_partition1 inherit medium;
alter table medium_partition2 inherit medium;
insert into medium_partition1(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,1000) id;
insert into medium_partition2(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,1000) id;
create index on medium_partition1(id);
create index on medium_partition2(id);
analyze medium_partition1;
analyze medium_partition2;
/* Tiny table of just 5 values */
create table tiny(id bigint);
insert into tiny(id) values (100),(200),(300),(400),(500);
analyze tiny;
/* Views that UNION ALL all non-filtered rows of HUGE and MEDIUM */
create view vw_broken as
select id from huge where filter_out
union all
select id from medium where filter_out;
create view vw_not_broken as
select id,filter_out from (
select id,filter_out from huge
union all
select id,filter_out from medium
) q
where filter_out;
create table huge(id bigint, filter_out bool);
create table huge_partition1(id bigint, filter_out bool);
create table huge_partition2(id bigint, filter_out bool);
alter table huge_partition1 inherit huge;
alter table huge_partition2 inherit huge;
insert into huge_partition1(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,100000) id;
insert into huge_partition2(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,100000) id;
create index on huge_partition1(id);
create index on huge_partition2(id);
analyze huge_partition1;
analyze huge_partition2;
/* Medium inheritance-partitioned table (same structure, but 100x smaller) */
create table medium(id bigint, filter_out bool);
create table medium_partition1(id bigint, filter_out bool);
create table medium_partition2(id bigint, filter_out bool);
alter table medium_partition1 inherit medium;
alter table medium_partition2 inherit medium;
insert into medium_partition1(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,1000) id;
insert into medium_partition2(id, filter_out)
select id, mod(id,7) = 0
from generate_series(1,1000) id;
create index on medium_partition1(id);
create index on medium_partition2(id);
analyze medium_partition1;
analyze medium_partition2;
/* Tiny table of just 5 values */
create table tiny(id bigint);
insert into tiny(id) values (100),(200),(300),(400),(500);
analyze tiny;
/* Views that UNION ALL all non-filtered rows of HUGE and MEDIUM */
create view vw_broken as
select id from huge where filter_out
union all
select id from medium where filter_out;
create view vw_not_broken as
select id,filter_out from (
select id,filter_out from huge
union all
select id,filter_out from medium
) q
where filter_out;
/* This query does NOT use nested loops unexpectedly */
explain select * from tiny join vw_broken on tiny.id = vw_broken.id;
explain select * from tiny join vw_broken on tiny.id = vw_broken.id;
QUERY PLAN
---------------------------------------------------------------------------------------------
Hash Join (cost=1.11..3798.30 rows=712 width=16)
Hash Cond: (huge.id = tiny.id)
-> Append (cost=0.00..3683.32 rows=28466 width=8)
-> Append (cost=0.00..3222.91 rows=28181 width=8)
-> Seq Scan on huge huge_1 (cost=0.00..0.00 rows=1 width=8)
Filter: filter_out
-> Seq Scan on huge_partition1 huge_2 (cost=0.00..1541.00 rows=13987 width=8)
Filter: filter_out
-> Seq Scan on huge_partition2 huge_3 (cost=0.00..1541.00 rows=14193 width=8)
Filter: filter_out
-> Append (cost=0.00..33.42 rows=285 width=8)
-> Seq Scan on medium medium_1 (cost=0.00..0.00 rows=1 width=8)
Filter: filter_out
-> Seq Scan on medium_partition1 medium_2 (cost=0.00..16.00 rows=142 width=8)
Filter: filter_out
-> Seq Scan on medium_partition2 medium_3 (cost=0.00..16.00 rows=142 width=8)
Filter: filter_out
-> Hash (cost=1.05..1.05 rows=5 width=8)
-> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8)
/* This query DOES use nested loops as expected */
explain select * from tiny join vw_not_broken on tiny.id = vw_not_broken.id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..151.53 rows=712 width=17)
-> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8)
-> Append (cost=0.00..30.04 rows=6 width=9)
-> Seq Scan on huge (cost=0.00..0.00 rows=1 width=9)
Filter: (filter_out AND (tiny.id = id))
-> Index Scan using huge_partition1_id_idx on huge_partition1 huge_1 (cost=0.29..8.31 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
-> Index Scan using huge_partition2_id_idx on huge_partition2 huge_2 (cost=0.29..8.31 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
-> Seq Scan on medium (cost=0.00..0.00 rows=1 width=9)
Filter: (filter_out AND (tiny.id = id))
-> Index Scan using medium_partition1_id_idx on medium_partition1 medium_1 (cost=0.28..6.69 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
-> Index Scan using medium_partition2_id_idx on medium_partition2 medium_2 (cost=0.28..6.69 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
---------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..151.53 rows=712 width=17)
-> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8)
-> Append (cost=0.00..30.04 rows=6 width=9)
-> Seq Scan on huge (cost=0.00..0.00 rows=1 width=9)
Filter: (filter_out AND (tiny.id = id))
-> Index Scan using huge_partition1_id_idx on huge_partition1 huge_1 (cost=0.29..8.31 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
-> Index Scan using huge_partition2_id_idx on huge_partition2 huge_2 (cost=0.29..8.31 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
-> Seq Scan on medium (cost=0.00..0.00 rows=1 width=9)
Filter: (filter_out AND (tiny.id = id))
-> Index Scan using medium_partition1_id_idx on medium_partition1 medium_1 (cost=0.28..6.69 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
-> Index Scan using medium_partition2_id_idx on medium_partition2 medium_2 (cost=0.28..6.69 rows=1 width=9)
Index Cond: (id = tiny.id)
Filter: filter_out
--
Best regards, Dmytro
On Thu, Dec 7, 2023 at 9:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Summary of the issue: for a (5-row recordset) JOIN (massive partitioned
> recordset indexed by id) USING (id), the (Nested Loop over 5 values)
> strategy is completely ignored, and Hash Join or Merge Join is done instead,
> which does SeqScan over the "massive recordset".
> Reproduction in DB Fiddle:
> https://www.db-fiddle.com/f/sJUUWNgW7pqPWcJwihVoj5/1 (this demonstrates both
> the bad behaviour and a way to work around it)
We are generally not too happy with non-self-contained bug reports.
Once that DB Fiddle entry disappears, this bug report will be useless.
However ...
> 1)The massive recordset on the right side of the JOIN must come from the
> UNION ALL of two parts, both of which have a filter, like this view in my
> reproduction:
> create view vw_broken as
> select id from huge where filter_out
> union all
> select id from medium where filter_out;
I suspect the WHERE clauses trigger the problem because the resulting
sub-selects can't be pulled up to become an "appendrel", per
is_safe_append_member:
* It's only safe to pull up the child if its jointree contains exactly
* one RTE, else the AppendRelInfo data structure breaks. The one base RTE
* could be buried in several levels of FromExpr, however. Also, if the
* child's jointree is completely empty, we can pull up because
* pull_up_simple_subquery will insert a single RTE_RESULT RTE instead.
*
* Also, the child can't have any WHERE quals because there's no place to
* put them in an appendrel. (This is a bit annoying...)
That means the sub-selects will be planned independently and there's
no chance to consider the nestloop-with-inner-indexscan plan you are
hoping for.
This is a longstanding wart, but improving matters would require some
fairly painstaking work. The "appendrel" mechanism is core to both
traditional inheritance and partitioning; I don't recommend trying
to blow it up and start over. I vaguely recall previous discussions
that identified some semantic issues with trying to just attach
WHERE clauses to appendrel members, but it was a long time ago and
the details escape me.
regards, tom lane
pgsql-bugs by date: