Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions - Mailing list pgsql-novice

From John Koerber
Subject Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions
Date
Msg-id 1924389003.228191257457692836.JavaMail.root@mri-mail
Whole thread Raw
Responses Re: Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions
List pgsql-novice
Hello,

FULL JOIN requiring merge-joinable conditions has been covered previously
in http://archives.postgresql.org/pgsql-sql/2006-01/msg00080.php for a
different application; here is a new context in which FULL JOIN in
non-merge-joinable conditions might be useful.

In this example, events have a start_time and an end_time.  I want a
time-aligned listing of all events from two tables A and B, joined up
where there is any time overlap between A's events with B's events.  Here
are the two tables:

CREATE TABLE public.events_a AS
      SELECT 'A1' as event_a, '2009-01-03'::date as start_a,
'2009-01-03'::date as end_a
UNION SELECT 'A2'           , '2009-01-07'::date           ,
'2009-01-08'::date
UNION SELECT 'A3'           , '2009-01-14'::date           ,
'2009-01-15'::date
;
CREATE TABLE public.events_b AS
      SELECT 'B1' as event_b, '2009-01-04'::date as start_b,
'2009-01-06'::date as end_b
UNION SELECT 'B2'           , '2009-01-07'::date           ,
'2009-01-12'::date
UNION SELECT 'B3'           , '2009-01-13'::date           ,
'2009-01-13'::date
;
A FULL JOIN would be good because presumably it would require only two
sequential scans, but it appears the OVERLAPS operator creates a
non-merge-joinable condition; this query returns ERROR: FULL JOIN is only
supported with merge-joinable join conditions / SQL state: 0A000:

SELECT *
  FROM public.events_a A
  FULL JOIN public.events_b B
    ON (start_a, end_a) OVERLAPS (start_b, end_b)
 ORDER BY coalesce(start_a, start_b), start_b  ;

As noted in other posts to the archive listing above, the answer I am
looking for can be had by UNIONing a LEFT join and a RIGHT JOIN:

SELECT *
FROM
(
SELECT *
  FROM public.events_a A
  LEFT JOIN public.events_b B
    ON (start_a, end_a) OVERLAPS (start_b, end_b)
 UNION SELECT *
  FROM public.events_a A
  RIGHT OUTER JOIN public.events_b B
    ON (start_a, end_a) OVERLAPS (start_b, end_b)
) U
ORDER BY COALESCE(start_a, start_b), start_b;

but this requires 4 sequential scans:

EXPLAIN above query:
QUERY PLAN
Sort  (cost=298586.19..300828.86 rows=897066 width=80)
  Sort Key: (COALESCE(u.start_a, u.start_b)), u.start_b
  ->  Subquery Scan u  (cost=185220.13..209889.44 rows=897066 width=80)
        ->  Unique  (cost=185220.13..200918.78 rows=897066 width=80)
              ->  Sort  (cost=185220.13..187462.79 rows=897066 width=80)
                    Sort Key: a.event_a, a.start_a, a.end_a, b.event_b,
b.start_b, b.end_b
                    ->  Append  (cost=22.76..96523.38 rows=897066
width=80)
                          ->  Nested Loop Left Join  (cost=22.76..43776.36
rows=448533 width=80)
                                Join Filter:
"overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp
with time zone, (b.start_b)::timestamp with time zone,
(b.end_b)::timestamp with time zone)
                                ->  Seq Scan on events_a a
(cost=0.00..21.60 rows=1160 width=40)
                                ->  Materialize  (cost=22.76..34.36
rows=1160 width=40)
                                      ->  Seq Scan on events_b b
(cost=0.00..21.60 rows=1160 width=40)
                          ->  Nested Loop Left Join  (cost=22.76..43776.36
rows=448533 width=80)
                                Join Filter:
"overlaps"((a.start_a)::timestamp with time zone, (a.end_a)::timestamp
with time zone, (b.start_b)::timestamp with time zone,
(b.end_b)::timestamp with time zone)
                                ->  Seq Scan on events_b b
(cost=0.00..21.60 rows=1160 width=40)
                                ->  Materialize  (cost=22.76..34.36
rows=1160 width=40)
                                      ->  Seq Scan on events_a a
(cost=0.00..21.60 rows=1160 width=40)

In practice, is there:
* A different structuring the time spans of the 'events' such that a
merge-joinable condition can be found?
* A possibility of teaching Hash Join to do the FULL JOIN on
non-merge-joinable conditions?

Thanks,

John Koerber
Senior Systems Developer / Analyst


pgsql-novice by date:

Previous
From: bob.vanslobig@att.net
Date:
Subject: SQL Server
Next
From: Tom Lane
Date:
Subject: Re: Cannot do time-aligned listing with FULL JOIN, get ERROR: FULL JOIN is only supported with merge-joinable join conditions