Thread: lost "left join"
Hi, all. I got some mystic behaviour of PostgreSQL, perhaps this is a bug. Tables is a model of a simple graph where parents and children related as «many to many». And there is a field «valid» tovalidate for current moment. set search_path = 'left_join'; begin; drop schema if exists left_join cascade; create schema left_join; create table node ( node_id integer primary key generated always as identity, valid daterange not null ); create table link ( parent integer not null references node(node_id), child integer not null references node(node_id), valid daterange not null, primary key (parent,child) ); insert into node (node_id,valid) values (default,daterange('2019-01-01','2020-01-01')); commit; Now I want to get a one node and all children, if they exists: => explain select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_idwhere parent.node_id=1; QUERY PLAN ------------------------------------------------------------------------------------------ Nested Loop Left Join (cost=4.50..26.76 rows=1 width=112) -> Nested Loop Left Join (cost=4.35..21.91 rows=1 width=76) Join Filter: (parent.node_id = link.parent) -> Index Scan using node_pkey on node parent (cost=0.15..8.17 rows=1 width=36) Index Cond: (node_id = 1) -> Bitmap Heap Scan on link (cost=4.20..13.67 rows=6 width=40) Recheck Cond: (parent = 1) -> Bitmap Index Scan on link_pkey (cost=0.00..4.20 rows=6 width=0) Index Cond: (parent = 1) -> Index Scan using node_pkey on node child (cost=0.15..4.84 rows=1 width=36) Index Cond: (link.child = node_id) (11 rows) All fine, there is «Left Join» in the planner. And the result is exactly what I want: => select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_idwhere parent.node_id=1; node_id | valid | parent | child | valid | node_id | valid ---------+-------------------------+--------+-------+-------+---------+------- 1 | [2019-01-01,2020-01-01) | | | | | (1 row) But things begin be strange if I add validation by time. => explain select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_idwhere parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and current_date<@ child.valid; QUERY PLAN ------------------------------------------------------------------------------------------ Nested Loop (cost=4.50..32.35 rows=1 width=112) -> Nested Loop (cost=4.35..21.88 rows=1 width=76) -> Index Scan using node_pkey on node parent (cost=0.15..8.18 rows=1 width=36) Index Cond: (node_id = 1) Filter: (CURRENT_DATE <@ valid) -> Bitmap Heap Scan on link (cost=4.20..13.70 rows=1 width=40) Recheck Cond: (parent = 1) Filter: (CURRENT_DATE <@ valid) -> Bitmap Index Scan on link_pkey (cost=0.00..4.20 rows=6 width=0) Index Cond: (parent = 1) -> Index Scan using node_pkey on node child (cost=0.15..8.18 rows=1 width=36) Index Cond: (node_id = link.child) Filter: (CURRENT_DATE <@ valid) (13 rows) «Left Join»’s are lost. And in the result too: => select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_idwhere parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and current_date<@ child.valid; node_id | valid | parent | child | valid | node_id | valid ---------+-------+--------+-------+-------+---------+------- (0 rows)
På onsdag 16. januar 2019 kl. 11:54:21, skrev Олег Самойлов <splarv@ya.ru>:
Hi, all.
I got some mystic behaviour of PostgreSQL, perhaps this is a bug.[snip]
But things begin be strange if I add validation by time.
=> explain select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_id where parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and current_date <@ child.valid;
QUERY PLAN
------------------------------------------------------------------------------------------
Nested Loop (cost=4.50..32.35 rows=1 width=112)
-> Nested Loop (cost=4.35..21.88 rows=1 width=76)
-> Index Scan using node_pkey on node parent (cost=0.15..8.18 rows=1 width=36)
Index Cond: (node_id = 1)
Filter: (CURRENT_DATE <@ valid)
-> Bitmap Heap Scan on link (cost=4.20..13.70 rows=1 width=40)
Recheck Cond: (parent = 1)
Filter: (CURRENT_DATE <@ valid)
-> Bitmap Index Scan on link_pkey (cost=0.00..4.20 rows=6 width=0)
Index Cond: (parent = 1)
-> Index Scan using node_pkey on node child (cost=0.15..8.18 rows=1 width=36)
Index Cond: (node_id = link.child)
Filter: (CURRENT_DATE <@ valid)
(13 rows)
«Left Join»’s are lost. And in the result too:
=> select * from node as parent left join link on parent.node_id=link.parent left join node as child on link.child=child.node_id where parent.node_id=1 and current_date <@ parent.valid and current_date <@ link.valid and current_date <@ child.valid;
node_id | valid | parent | child | valid | node_id | valid
---------+-------+--------+-------+-------+---------+-------
(0 rows)
The moment you involve columns on "left joined" relations this way in the WHERE-clause, it effectively becomes a right join.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
>>>>> "Олег" == Олег Самойлов <splarv@ya.ru> writes: Олег> Hi, all. Олег> I got some mystic behaviour of PostgreSQL, perhaps this is a bug. Feature, actually. Олег> But things begin be strange if I add validation by time. Олег> => explain select * from node as parent left join link on Олег> parent.node_id=link.parent left join node as child on Олег> link.child=child.node_id where parent.node_id=1 and current_date Олег> <@ parent.valid and current_date <@ link.valid and current_date Олег> <@ child.valid; The problem here is that (for example) child.valid is null if there was no matching child row in the join, and the planner knows that x <@ NULL is not true (since the chosen <@ operator is defined as strict), and therefore it knows that the left join is unnecessary and can be reduced to an inner join. At least 90% of the time when you refer to values from the nullable side of a join in a WHERE clause, you're making a mistake (e.g. the condition should have been in the ON clause instead). The other 10% or less of the time, you have to make sure you use non-strict conditions, i.e. take account of the fact that the values might be null. -- Andrew (irc:RhodiumToad)