PSQL does not remove obvious useless joins - Mailing list pgsql-general

From Sfiligoi, Igor
Subject PSQL does not remove obvious useless joins
Date
Msg-id 11675d0d21d14c2898d79736eb07279e@ASGEXCPWP06.ga.com
Whole thread Raw
Responses Re: PSQL does not remove obvious useless joins  (Merlin Moncure <mmoncure@gmail.com>)
Re: PSQL does not remove obvious useless joins  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-general

Hello.

 

We have a view that is very generic, and we noticed that PostgreSQL is not very good at removing useless joins, which makes our queries very slow.

We could change our code to avoid the view and write ad-hoc queries to the underlying tables, but would prefer not to, if there is a way around it.

(BTW: We are currently using psql 9.4)

 

Here is a simplified implementation:

# create table a (id int primary key, name varchar(128));

# create table b (id int primary key, name varchar(128));

# create table c (id int primary key, a_id int references a(id), b1_id int references b(id), b2_id int references b(id), b3_id int references b(id));

# create view v as select c.id, c.a_id, c.b1_id, c.b2_id , c.b3_id, a.name a_name, b1.name b1_name, b2.name b2_name, b3.name b3_name from c,  a, b b1, b b2, b b3 where c.a_id=a.id and c.b1_id=b1.id and c.b2_id=b2.id and c.b3_id=b3.id;

 

When I try to get just info from tables c and b1:

# select id, b1_name from v

it still does all the joins (see below).

 

I would expect just one join (due to the request of columns from the two tables),

since all joins are on foreign constrains referencing primary keys,

there are no filters on the other tables,
so it is guaranteed that the useless joins will always return exactly one answer.

 

Is there a way to tweak the PostgreSQL optimizer to do the proper join removal during the planning?

Perhaps tweaking somehow either our schema or our queries (while still keeping a generic view)?

 

Thank you,

  Igor Sfiligoi

 

 

 

# explain select id, b1_name from v;

                                   QUERY PLAN                                  

--------------------------------------------------------------------------------

Nested Loop  (cost=1.02..5.45 rows=1 width=6)

   Join Filter: (c.b3_id = b3.id)

   ->  Nested Loop  (cost=1.02..4.32 rows=1 width=10)

         Join Filter: (c.a_id = a.id)

         ->  Nested Loop  (cost=1.02..3.25 rows=1 width=14)

               Join Filter: (c.b2_id = b2.id)

               ->  Hash Join  (cost=1.02..2.12 rows=1 width=18)

                     Hash Cond: (b1.id = c.b1_id)

                     ->  Seq Scan on b b1  (cost=0.00..1.06 rows=6 width=6)

                     ->  Hash  (cost=1.01..1.01 rows=1 width=20)

                           ->  Seq Scan on c  (cost=0.00..1.01 rows=1 width=20)

               ->  Seq Scan on b b2  (cost=0.00..1.06 rows=6 width=4)

         ->  Seq Scan on a  (cost=0.00..1.03 rows=3 width=4)

   ->  Seq Scan on b b3  (cost=0.00..1.06 rows=6 width=4)

(14 rows)

 

PS: The tables were very small in this example, but are quite big in the production environment.

 

pgsql-general by date:

Previous
From: Chris Lewis
Date:
Subject: Re: Log archiving failing. Seems to be wrong timeline
Next
From: shankha
Date:
Subject: Re: Update multiple rows in a table with different values