Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions - Mailing list pgsql-novice
From | John Koerber |
---|---|
Subject | Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions |
Date | |
Msg-id | 1924389003.228191257457692836.JavaMail.root@mri-mail Whole thread Raw |
Responses |
Re: Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions
|
List | pgsql-novice |
Hello, FULL JOIN requiring merge-joinable conditions has been covered previously in http://archives.postgresql.org/pgsql-sql/2006-01/msg00080.php for a different application; here is a new context in which FULL JOIN in non-merge-joinable conditions might be useful. In this example, events have a start_time and an end_time. I want a time-aligned listing of all events from two tables A and B, joined up where there is any time overlap between A's events with B's events. Here are the two tables: CREATE TABLE public.events_a AS SELECT 'A1' as event_a, '2009-01-03'::date as start_a, '2009-01-03'::date as end_a UNION SELECT 'A2' , '2009-01-07'::date , '2009-01-08'::date UNION SELECT 'A3' , '2009-01-14'::date , '2009-01-15'::date ; CREATE TABLE public.events_b AS SELECT 'B1' as event_b, '2009-01-04'::date as start_b, '2009-01-06'::date as end_b UNION SELECT 'B2' , '2009-01-07'::date , '2009-01-12'::date UNION SELECT 'B3' , '2009-01-13'::date , '2009-01-13'::date ; A FULL JOIN would be good because presumably it would require only two sequential scans, but it appears the OVERLAPS operator creates a non-merge-joinable condition; this query returns ERROR: FULL JOIN is only supported with merge-joinable join conditions / SQL state: 0A000: SELECT * FROM public.events_a A FULL JOIN public.events_b B ON (start_a, end_a) OVERLAPS (start_b, end_b) ORDER BY coalesce(start_a, start_b), start_b ; As noted in other posts to the archive listing above, the answer I am looking for can be had by UNIONing a LEFT join and a RIGHT JOIN: SELECT * FROM ( SELECT * FROM public.events_a A LEFT JOIN public.events_b B ON (start_a, end_a) OVERLAPS (start_b, end_b) UNION SELECT * FROM public.events_a A RIGHT OUTER JOIN public.events_b B ON (start_a, end_a) OVERLAPS (start_b, end_b) ) U ORDER BY COALESCE(start_a, start_b), start_b; but this requires 4 sequential scans: EXPLAIN above query: QUERY PLAN Sort (cost=298586.19..300828.86 rows=897066 width=80) Sort Key: (COALESCE(u.start_a, u.start_b)), u.start_b -> Subquery Scan u (cost=185220.13..209889.44 rows=897066 width=80) -> Unique (cost=185220.13..200918.78 rows=897066 width=80) -> Sort (cost=185220.13..187462.79 rows=897066 width=80) Sort Key: a.event_a, a.start_a, a.end_a, b.event_b, b.start_b, b.end_b -> Append (cost=22.76..96523.38 rows=897066 width=80) -> Nested Loop Left Join (cost=22.76..43776.36 rows=448533 width=80) Join Filter: "overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp with time zone, (b.start_b)::timestamp with time zone, (b.end_b)::timestamp with time zone) -> Seq Scan on events_a a (cost=0.00..21.60 rows=1160 width=40) -> Materialize (cost=22.76..34.36 rows=1160 width=40) -> Seq Scan on events_b b (cost=0.00..21.60 rows=1160 width=40) -> Nested Loop Left Join (cost=22.76..43776.36 rows=448533 width=80) Join Filter: "overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp with time zone, (b.start_b)::timestamp with time zone, (b.end_b)::timestamp with time zone) -> Seq Scan on events_b b (cost=0.00..21.60 rows=1160 width=40) -> Materialize (cost=22.76..34.36 rows=1160 width=40) -> Seq Scan on events_a a (cost=0.00..21.60 rows=1160 width=40) In practice, is there: * A different structuring the time spans of the 'events' such that a merge-joinable condition can be found? * A possibility of teaching Hash Join to do the FULL JOIN on non-merge-joinable conditions? Thanks, John Koerber Senior Systems Developer / Analyst
pgsql-novice by date: