Thread: Plan not skipping unnecessary inner join
I noticed something peculiar while optimizing complex views today. The query planner does not skip inner joins that, to myunderstanding, can have no impact on the result. Am I missing a situation where these joins could impact the result? The following demonstrates the problem without the complex views. It also demonstrates how the planner simplifies a LEFTJOIN in the same situation. The left and right sides of an inner join could be swapped, obviously, but here I kept theunique constraint on the right. CREATE TABLE foo ( id INTEGER PRIMARY KEY ); CREATE TABLE bar ( foo_id INTEGER NOT NULL REFERENCES foo ); -- This simplifies to SELECT COUNT(*) FROM bar; EXPLAIN SELECT COUNT(*) FROM bar LEFT JOIN foo ON bar.foo_id = foo.id; -- This should simplify to SELECT COUNT(*) FROM bar WHERE foo_id IS NOT NULL; -- The presence of a NOT NULL constraint on foo_id has no effect. EXPLAIN SELECT COUNT(*) FROM bar INNER JOIN foo ON bar.foo_id = foo.id; QUERY PLAN ------------------------------------------------------------- Aggregate (cost=38.25..38.26 rows=1 width=8) -> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=0) (2 rows) QUERY PLAN ------------------------------------------------------------------------- Aggregate (cost=111.57..111.58 rows=1 width=8) -> Hash Join (cost=67.38..105.92 rows=2260 width=0) Hash Cond: (bar.foo_id_not_null = foo.id) -> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4) -> Hash (cost=35.50..35.50 rows=2550 width=4) -> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (6 rows) version ------------------------------------------------------------------------------------------------------------------- PostgreSQL 12.2 on x86_64-apple-darwin19.4.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.59), 64-bit (1 row)
> Am I missing a situation where these joins could impact the result? Yes it will impact the number of rows in the result. for example if foo is empty then postgres is required to return no results,regardless of how many rows are in bar. This is why it can ignore the table in the left join — David > On 14 May 2020, at 1:44 pm, Matthew Nelson <postgres@binarykeep.com> wrote: > > I noticed something peculiar while optimizing complex views today. The query planner does not skip inner joins that, tomy understanding, can have no impact on the result. Am I missing a situation where these joins could impact the result? > > The following demonstrates the problem without the complex views. It also demonstrates how the planner simplifies a LEFTJOIN in the same situation. The left and right sides of an inner join could be swapped, obviously, but here I kept theunique constraint on the right. > > > > CREATE TABLE foo ( > id INTEGER PRIMARY KEY > ); > > CREATE TABLE bar ( > foo_id INTEGER NOT NULL REFERENCES foo > ); > > -- This simplifies to SELECT COUNT(*) FROM bar; > EXPLAIN SELECT COUNT(*) > FROM bar > LEFT JOIN foo ON bar.foo_id = foo.id; > > -- This should simplify to SELECT COUNT(*) FROM bar WHERE foo_id IS NOT NULL; > -- The presence of a NOT NULL constraint on foo_id has no effect. > EXPLAIN SELECT COUNT(*) > FROM bar > INNER JOIN foo ON bar.foo_id = foo.id; > > > > QUERY PLAN > ------------------------------------------------------------- > Aggregate (cost=38.25..38.26 rows=1 width=8) > -> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=0) > (2 rows) > > QUERY PLAN > ------------------------------------------------------------------------- > Aggregate (cost=111.57..111.58 rows=1 width=8) > -> Hash Join (cost=67.38..105.92 rows=2260 width=0) > Hash Cond: (bar.foo_id_not_null = foo.id) > -> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4) > -> Hash (cost=35.50..35.50 rows=2550 width=4) > -> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) > (6 rows) > > version > ------------------------------------------------------------------------------------------------------------------- > PostgreSQL 12.2 on x86_64-apple-darwin19.4.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.59), 64-bit > (1 row) > >
But foo cannot be empty given the foreign key constraint on bar.foo_id. I suppose that explains the planner's behavior. While today's planner can take a shortcut when it finds a unique constraint,it does not check for foreign key constraints matching the join condition? So the LEFT JOIN simplification ispossible but not the INNER JOIN simplification. Matthew On Wed, May 13, 2020, at 11:45 PM, David Wheeler wrote: > > Am I missing a situation where these joins could impact the result? > > Yes it will impact the number of rows in the result. for example if foo > is empty then postgres is required to return no results, regardless of > how many rows are in bar. This is why it can ignore the table in the > left join > > — David > > > On 14 May 2020, at 1:44 pm, Matthew Nelson <postgres@binarykeep.com> wrote: > > > > I noticed something peculiar while optimizing complex views today. The query planner does not skip inner joins that,to my understanding, can have no impact on the result. Am I missing a situation where these joins could impact the result? > > > > The following demonstrates the problem without the complex views. It also demonstrates how the planner simplifies a LEFTJOIN in the same situation. The left and right sides of an inner join could be swapped, obviously, but here I kept theunique constraint on the right. > > > > > > > > CREATE TABLE foo ( > > id INTEGER PRIMARY KEY > > ); > > > > CREATE TABLE bar ( > > foo_id INTEGER NOT NULL REFERENCES foo > > ); > > > > -- This simplifies to SELECT COUNT(*) FROM bar; > > EXPLAIN SELECT COUNT(*) > > FROM bar > > LEFT JOIN foo ON bar.foo_id = foo.id; > > > > -- This should simplify to SELECT COUNT(*) FROM bar WHERE foo_id IS NOT NULL; > > -- The presence of a NOT NULL constraint on foo_id has no effect. > > EXPLAIN SELECT COUNT(*) > > FROM bar > > INNER JOIN foo ON bar.foo_id = foo.id; > > > > > > > > QUERY PLAN > > ------------------------------------------------------------- > > Aggregate (cost=38.25..38.26 rows=1 width=8) > > -> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=0) > > (2 rows) > > > > QUERY PLAN > > ------------------------------------------------------------------------- > > Aggregate (cost=111.57..111.58 rows=1 width=8) > > -> Hash Join (cost=67.38..105.92 rows=2260 width=0) > > Hash Cond: (bar.foo_id = foo.id) > > -> Seq Scan on bar (cost=0.00..32.60 rows=2260 width=4) > > -> Hash (cost=35.50..35.50 rows=2550 width=4) > > -> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) > > (6 rows) > > > > version > > ------------------------------------------------------------------------------------------------------------------- > > PostgreSQL 12.2 on x86_64-apple-darwin19.4.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.59), 64-bit > > (1 row) > > > > > >
I redid the same tests with vanila postgres and with empty tables.
I'm surprised, why does the plan have 2550 rows in explain?
regards,
Ranier Vilela
On Sun, May 17, 2020 at 09:32:47AM -0300, Ranier Vilela wrote: > I redid the same tests with vanila postgres and with empty tables. > I'm surprised, why does the plan have 2550 rows in explain? That's the *estimated* rowcount. The planner tends to ignore table statistics which say the table is empty, since that can lead to a terrible plan if it's not true (stats are out of date or autovacuum threshold not hit). See also here https://www.postgresql.org/message-id/20171110204043.GS8563%40telsasoft.com -- Justin
Em dom., 17 de mai. de 2020 às 10:31, Justin Pryzby <pryzby@telsasoft.com> escreveu:
On Sun, May 17, 2020 at 09:32:47AM -0300, Ranier Vilela wrote:
> I redid the same tests with vanila postgres and with empty tables.
> I'm surprised, why does the plan have 2550 rows in explain?
That's the *estimated* rowcount.
The planner tends to ignore table statistics which say the table is empty,
since that can lead to a terrible plan if it's not true (stats are out of date
or autovacuum threshold not hit).
Thanks for the explanation.
regards,
Ranier Vilela