Thread: Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions
Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions
From
"John Koerber"
Date:
Hello, FULL JOIN requiring merge-joinable conditions has been covered previously in http://archives.postgresql.org/pgsql-sql/2006-01/msg00080.php for a different application; here is a new context in which FULL JOIN in non-merge-joinable conditions might be useful. In this example, events have a start_time and an end_time. I want a time-aligned listing of all events from two tables A and B, joined up where there is any time overlap between A's events with B's events. Here are the two tables: CREATE TABLE public.events_a AS SELECT 'A1' as event_a, '2009-01-03'::date as start_a, '2009-01-03'::date as end_a UNION SELECT 'A2' , '2009-01-07'::date , '2009-01-08'::date UNION SELECT 'A3' , '2009-01-14'::date , '2009-01-15'::date ; CREATE TABLE public.events_b AS SELECT 'B1' as event_b, '2009-01-04'::date as start_b, '2009-01-06'::date as end_b UNION SELECT 'B2' , '2009-01-07'::date , '2009-01-12'::date UNION SELECT 'B3' , '2009-01-13'::date , '2009-01-13'::date ; A FULL JOIN would be good because presumably it would require only two sequential scans, but it appears the OVERLAPS operator creates a non-merge-joinable condition; this query returns ERROR: FULL JOIN is only supported with merge-joinable join conditions / SQL state: 0A000: SELECT * FROM public.events_a A FULL JOIN public.events_b B ON (start_a, end_a) OVERLAPS (start_b, end_b) ORDER BY coalesce(start_a, start_b), start_b ; As noted in other posts to the archive listing above, the answer I am looking for can be had by UNIONing a LEFT join and a RIGHT JOIN: SELECT * FROM ( SELECT * FROM public.events_a A LEFT JOIN public.events_b B ON (start_a, end_a) OVERLAPS (start_b, end_b) UNION SELECT * FROM public.events_a A RIGHT OUTER JOIN public.events_b B ON (start_a, end_a) OVERLAPS (start_b, end_b) ) U ORDER BY COALESCE(start_a, start_b), start_b; but this requires 4 sequential scans: EXPLAIN above query: QUERY PLAN Sort (cost=298586.19..300828.86 rows=897066 width=80) Sort Key: (COALESCE(u.start_a, u.start_b)), u.start_b -> Subquery Scan u (cost=185220.13..209889.44 rows=897066 width=80) -> Unique (cost=185220.13..200918.78 rows=897066 width=80) -> Sort (cost=185220.13..187462.79 rows=897066 width=80) Sort Key: a.event_a, a.start_a, a.end_a, b.event_b, b.start_b, b.end_b -> Append (cost=22.76..96523.38 rows=897066 width=80) -> Nested Loop Left Join (cost=22.76..43776.36 rows=448533 width=80) Join Filter: "overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp with time zone, (b.start_b)::timestamp with time zone, (b.end_b)::timestamp with time zone) -> Seq Scan on events_a a (cost=0.00..21.60 rows=1160 width=40) -> Materialize (cost=22.76..34.36 rows=1160 width=40) -> Seq Scan on events_b b (cost=0.00..21.60 rows=1160 width=40) -> Nested Loop Left Join (cost=22.76..43776.36 rows=448533 width=80) Join Filter: "overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp with time zone, (b.start_b)::timestamp with time zone, (b.end_b)::timestamp with time zone) -> Seq Scan on events_b b (cost=0.00..21.60 rows=1160 width=40) -> Materialize (cost=22.76..34.36 rows=1160 width=40) -> Seq Scan on events_a a (cost=0.00..21.60 rows=1160 width=40) In practice, is there: * A different structuring the time spans of the 'events' such that a merge-joinable condition can be found? * A possibility of teaching Hash Join to do the FULL JOIN on non-merge-joinable conditions? Thanks, John Koerber Senior Systems Developer / Analyst
Re: Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions
From
Tom Lane
Date:
"John Koerber" <johnk@musicreports.com> writes: > SELECT * > FROM public.events_a A > FULL JOIN public.events_b B > ON (start_a, end_a) OVERLAPS (start_b, end_b) > ORDER BY coalesce(start_a, start_b), start_b ; > [ doesn't work ] > In practice, is there: > * A different structuring the time spans of the 'events' such that a > merge-joinable condition can be found? > * A possibility of teaching Hash Join to do the FULL JOIN on > non-merge-joinable conditions? Even if we could do full joins by hashing, that wouldn't help you since OVERLAPS is no more hashable than it is mergeable. The only possible join plan would be nestloop, with a work table the size of the inner input to keep track of which inner rows hadn't been joined to anything :-( regards, tom lane