Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables - Mailing list pgsql-sql

From David G Johnston
Subject Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Date
Msg-id CAKFQuwYbHmp5sr-oTn0SEaTq-AcuA3H3gA1AOSDSviGXrRR_qA@mail.gmail.com
Whole thread Raw
In response to Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables  (Andreas Joseph Krogh <andreas@visena.com>)
Responses Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables  (Andreas Joseph Krogh <andreas@visena.com>)
List pgsql-sql
On Thu, Jun 12, 2014 at 4:15 PM, Andreas Joseph Krogh-2 [via PostgreSQL] <[hidden email]> wrote:
På torsdag 12. juni 2014 kl. 21:52:22, skrev David G Johnston <[hidden email]>:
On Thu, Jun 12, 2014 at 3:49 PM, David Johnston <[hidden email]> wrote:
 
SELECT
    coalesce(apl.company_name, bns.company_name, pns.company_name, prs.company_name) AS company_name,
    coalesce(apl.project_name, bns.project_name, pns.project_name, prs.project_name) AS project_name,
    apl.qty                                                        AS num_apples,
    bns.qty                                                        AS num_bananas,
    pns.qty                                                        AS num_pineapples,
    prs.qty                                                        AS num_pears
FROM
 
    company_apples AS apl
 
    FULL OUTER JOIN company_bananas AS bns
        ON apl.company_id = bns.company_id
           AND (apl.project_id IS NULL AND bns.project_id IS NULL OR apl.project_id = bns.project_id)
 
    FULL OUTER JOIN company_pineapples AS pns
            ON apl.company_id = pns.company_id
               AND (apl.project_id IS NULL AND pns.project_id IS NULL OR apl.project_id = pns.project_id)
 
    FULL OUTER JOIN company_pears AS prs
            ON apl.company_id = prs.company_id
               AND (apl.project_id IS NULL AND prs.project_id IS NULL OR apl.project_id = prs.project_id)
 
ORDER BY company_name ASC, project_name ASC NULLS FIRST;
 
 
This gives the result:
 
#company_nameproject_namenum_applesnum_bananasnum_pineapplesnum_pears
1C1NULL2210NULL
2C1P1512NULLNULL
3C1P22NULLNULLNULL
4C2NULL3NULL10NULL
5C2P13NULLNULLNULL
6C2P23NULLNULLNULL
7C3NULLNULL8NULLNULL
8C3NULLNULLNULLNULL7
As you see, there are two rows for C3, which should have been 1, with num_bananas=8 and num_pears=7. There has to be something wrong with my FULL OUTER JOINs but I don't know what....
 
 
 
​Row #7 is from the right side of the outer join between apples and bananas
Row #8 is from the right side of the outer join between apples and pears
 
Since you are only using apple as the source of valid company-project pairs only those records will be "correct" per your definition.
 
Even if you do not use a CROSS JOIN between company/product you have to obtain a master list of ​valid company-project pairs from ALL of the target tables.  You can then LEFT JOIN that master against each of the target tables and be sure that you have a valid master record to attach to.
 
The direct way to do this is:
 
SELECT DISTINCT company_id, project_id FROM apples
UNION DISTINCT
SELECT DISTINCT company_id, project_id FROM pears
[and so forth]
 
David J.
 
​Though I guess you could also simply chain together the FULL OUTER join:
 
FROM ( ( (apple OUTER bananas)​ AS ab OUTER pears ) AS abp OUTER pineapples ) AS abpp
 
David J.
 
By OUTER, do you mean FULL OUTER JOIN here?

​Yes

 
I'm unsure how to write the correct ON-clause of my FULL OUTER JOINs. Do you know how?
 

​You ON-clause was just fine...
There will be more derived tables to FULL OUTER JOIN with so I need something robust, just don't know how to do it.
 

​WITH 
a_src (companyid, projectid, a_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), a_count FROM company_a)
, b_src​ (companyid, projectid, b_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), b_count FROM company_b)
, left_master AS ( SELECT DISTINCT companyid, projectid FROM a_src UNION DISTINCT SELECT DISTINCT companyid, projectid FROM b_src)
SELECT companyid, projectid, a_count, b_count
FROM left_master
LEFT JOIN a_src USING (companyid, projectid)
LEFT JOIN b_src USING (companyid, projectid)
;

If it is too slow to derive left_master you can consider adding triggers to the company_product tables to maintain a separate table of known combinations.

The chaining version:

​WITH 
a_src (companyid, projectid, a_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), a_count FROM company_a)
, b_src​ (companyid, projectid, b_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), b_count FROM company_b)
, c_src​ (companyid, projectid, c_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), c_count FROM company_c)
SELECT companyid, projectid, a_count, b_count, c_count
FROM (a_src FULL JOIN b_src USING (companyid, projectid)) ab_src FULL JOIN c_src USING (companyid, projectid)) abc_src
;

Though you could also test whether the following is faster:

a_raw FULL JOIN b_raw ON ((a_raw.companyid, COALESCE(a_raw.projectid, 'N/A')) = (b_raw.companyid, COALESCE(b_raw.projectid, 'N/A')))



Alternatively...go vertical:

WITH
a_src (companyid, projectid, item_count, item_type) AS ( SELECT companyid, COALESCE(projectid,'N/A'), a_count, 'A' FROM company_a)
, b_src​ (companyid, projectid, item_count, item_type) AS ( SELECT companyid, COALESCE(projectid,'N/A'), b_count, 'B' FROM company_b)

SELECT * 
FROM a_src

UNION ALL

SELECT *
FROM b_src
;

David J.



View this message in context: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

pgsql-sql by date:

Previous
From: Andreas Joseph Krogh
Date:
Subject: Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Next
From: Andreas Joseph Krogh
Date:
Subject: Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables