Thread: BUG #9896: Bug in FULL OUTER JOIN

From:
sqlpro@sqlspot.com
Date:

The following bug has been logged on the website:

Bug reference:      9896
Logged by:          SQLpro
Email address:      
PostgreSQL version: 9.3.4
Operating system:   Windows
Description:

According to ISO standard SQL there is no limitation in predicat ON for JOIN
operator.
But PostGreSQL is unable to join tables in the FULL OUTER mode when
predicate ON is not sargable.
here is an example :

CREATE TABLE T_CLIENT_CLI --> customers
(CLI_ID       INT PRIMARY KEY,
 CLI_SIREN    CHAR(9) NOT NULL UNIQUE,
 CLI_ENSEIGNE VARCHAR(16) NOT NULL);

CREATE TABLE T_PROSPECT_PSP --> sale perspectives
(PSP_ID       INT PRIMARY KEY,
 PSP_SIREN    CHAR(9) NOT NULL UNIQUE,
 PSP_ENSEIGNE VARCHAR(16) NOT NULL);

We want to join on SIREN (national French firm number) or, in the case of
mistake in this number, on the name (enseigne).

Some datas :

INSERT INTO T_CLIENT_CLI VALUES
(1, '123456789', 'IBM'),
(2, '111111111', 'Microsoft'),
(3, '999555111', 'SAP');

INSERT INTO T_PROSPECT_PSP VALUES
(101, '123456789', 'IBM'),
(102, '555555555', 'Microsoft'),
(103, '777777777', 'Amazon'),
(104, '444444444', 'Google');

This query answer exactly to the question :

SELECT *
FROM   T_CLIENT_CLI AS C
       FULL OUTER JOIN T_PROSPECT_PSP AS P
            ON C.CLI_SIREN = P.PSP_SIREN
            OR C.CLI_ENSEIGNE = P.PSP_ENSEIGNE;

The result would be :

CLI_ID      CLI_SIREN CLI_ENSEIGNE     PSP_ID      PSP_SIREN PSP_ENSEIGNE
----------- --------- ---------------- ----------- ---------
----------------
1           123456789 IBM              101         123456789 IBM
2           111111111 Microsoft        102         555555555 Microsoft
3           999555111 SAP              NULL        NULL      NULL
NULL        NULL      NULL             103         777777777 Amazon
NULL        NULL      NULL             104         444444444 Google

But PG throw an exception :

ERREUR:  FULL JOIN is only supported with merge-joinable or hash-joinable
join conditions
État SQL :0A000

This is the case in version 9.1.2 too.
It seems that is the case in any version of PG...

By comparizon, this query guive the exact answer in :
- MS SQL Server since version 2000 to 2014 (2005, 2008/2008R2, 2012).
- Oracle 11G

Thanks
From:
Tom Lane
Date:

 writes:
> SELECT *
> FROM   T_CLIENT_CLI AS C
>        FULL OUTER JOIN T_PROSPECT_PSP AS P
>             ON C.CLI_SIREN = P.PSP_SIREN
>             OR C.CLI_ENSEIGNE = P.PSP_ENSEIGNE;

> But PG throw an exception :

> ERREUR:  FULL JOIN is only supported with merge-joinable or hash-joinable
> join conditions

Yeah.  The problem is the OR condition, which doesn't provide anything
that will work with either of those methods.

> By comparizon, this query guive the exact answer in :
> - MS SQL Server since version 2000 to 2014 (2005, 2008/2008R2, 2012).
> - Oracle 11G

Interesting.  I wonder where they keep the per-row match status?

Can you show us the query plans used by those systems?

In any case, don't hold your breath waiting for a fix; I'm just asking
to gather information for possible future work.  This is unlikely to
be simple to fix, and it's not going to be very high on anyone's priority
list either, given that few people use FULL JOIN as far as I've heard.

            regards, tom lane
From:
David Johnston
Date:

Tom Lane-2 wrote
> In any case, don't hold your breath waiting for a fix; I'm just asking
> to gather information for possible future work.  This is unlikely to
> be simple to fix, and it's not going to be very high on anyone's priority
> list either, given that few people use FULL JOIN as far as I've heard.

Infrequently, maybe, but especially for dataset comparisons FULL JOIN is
convenient.  That said, typically reformulation of the data and/or query is
possible, even if somewhat slower, so it isn't like getting a result is
impossible given this limitation.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/BUG-9896-Bug-in-FULL-OUTER-JOIN-tp5799008p5799042.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
From:
Tom Lane
Date:

I wrote:
>  writes:
>> SELECT *
>> FROM   T_CLIENT_CLI AS C
>> FULL OUTER JOIN T_PROSPECT_PSP AS P
>> ON C.CLI_SIREN = P.PSP_SIREN
>> OR C.CLI_ENSEIGNE = P.PSP_ENSEIGNE;

> Can you show us the query plans used by those systems?

For the archives' sake: the OP sent me a not-too-useful screen shot
in which SQL Server claims it's using a merge join for this query.

I find this less than credible: what linear sort order would bring
together all the potentially joinable rows?  If they're getting the
right answer at all, there must be some secret sauce in there someplace.
Perhaps they're just Doing It The Hard Way with state storage
proportional to the size of the relations?

            regards, tom lane
From:
Vik Fearing
Date:

On 04/07/2014 10:17 PM, Tom Lane wrote:
> I wrote:
>>  writes:
>>> SELECT *
>>> FROM   T_CLIENT_CLI AS C
>>> FULL OUTER JOIN T_PROSPECT_PSP AS P
>>> ON C.CLI_SIREN = P.PSP_SIREN
>>> OR C.CLI_ENSEIGNE = P.PSP_ENSEIGNE;
>> Can you show us the query plans used by those systems?
> For the archives' sake: the OP sent me a not-too-useful screen shot
> in which SQL Server claims it's using a merge join for this query.
>
> I find this less than credible: what linear sort order would bring
> together all the potentially joinable rows?  If they're getting the
> right answer at all, there must be some secret sauce in there someplace.
> Perhaps they're just Doing It The Hard Way with state storage
> proportional to the size of the relations?

And Oracle does it in two steps: http://www.sqlfiddle.com/#!4/607e0/1/0

--
Vik