Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables - Mailing list pgsql-sql
From | David G Johnston |
---|---|
Subject | Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables |
Date | |
Msg-id | CAKFQuwYbHmp5sr-oTn0SEaTq-AcuA3H3gA1AOSDSviGXrRR_qA@mail.gmail.com Whole thread Raw |
In response to | Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables (Andreas Joseph Krogh <andreas@visena.com>) |
Responses |
Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
(Andreas Joseph Krogh <andreas@visena.com>)
|
List | pgsql-sql |
På torsdag 12. juni 2014 kl. 21:52:22, skrev David G Johnston <[hidden email]>:SELECT
coalesce(apl.company_name, bns.company_name, pns.company_name, prs.company_name) AS company_name,
coalesce(apl.project_name, bns.project_name, pns.project_name, prs.project_name) AS project_name,
apl.qty AS num_apples,
bns.qty AS num_bananas,
pns.qty AS num_pineapples,
prs.qty AS num_pearsFROMcompany_apples AS aplFULL OUTER JOIN company_bananas AS bns
ON apl.company_id = bns.company_id
AND (apl.project_id IS NULL AND bns.project_id IS NULL OR apl.project_id = bns.project_id)FULL OUTER JOIN company_pineapples AS pns
ON apl.company_id = pns.company_id
AND (apl.project_id IS NULL AND pns.project_id IS NULL OR apl.project_id = pns.project_id)FULL OUTER JOIN company_pears AS prs
ON apl.company_id = prs.company_id
AND (apl.project_id IS NULL AND prs.project_id IS NULL OR apl.project_id = prs.project_id)ORDER BY company_name ASC, project_name ASC NULLS FIRST;This gives the result:
# company_name project_name num_apples num_bananas num_pineapples num_pears 1 C1 NULL 2 2 10 NULL 2 C1 P1 5 12 NULL NULL 3 C1 P2 2 NULL NULL NULL 4 C2 NULL 3 NULL 10 NULL 5 C2 P1 3 NULL NULL NULL 6 C2 P2 3 NULL NULL NULL 7 C3 NULL NULL 8 NULL NULL 8 C3 NULL NULL NULL NULL 7 As you see, there are two rows for C3, which should have been 1, with num_bananas=8 and num_pears=7. There has to be something wrong with my FULL OUTER JOINs but I don't know what....Row #7 is from the right side of the outer join between apples and bananasRow #8 is from the right side of the outer join between apples and pearsSince you are only using apple as the source of valid company-project pairs only those records will be "correct" per your definition.Even if you do not use a CROSS JOIN between company/product you have to obtain a master list of valid company-project pairs from ALL of the target tables. You can then LEFT JOIN that master against each of the target tables and be sure that you have a valid master record to attach to.The direct way to do this is:SELECT DISTINCT company_id, project_id FROM applesUNION DISTINCTSELECT DISTINCT company_id, project_id FROM pears[and so forth]David J.Though I guess you could also simply chain together the FULL OUTER join:FROM ( ( (apple OUTER bananas) AS ab OUTER pears ) AS abp OUTER pineapples ) AS abppDavid J.By OUTER, do you mean FULL OUTER JOIN here?
Yes
I'm unsure how to write the correct ON-clause of my FULL OUTER JOINs. Do you know how?
You ON-clause was just fine...
There will be more derived tables to FULL OUTER JOIN with so I need something robust, just don't know how to do it.
WITH
a_src (companyid, projectid, a_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), a_count FROM company_a)
, b_src (companyid, projectid, b_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), b_count FROM company_b)
, left_master AS ( SELECT DISTINCT companyid, projectid FROM a_src UNION DISTINCT SELECT DISTINCT companyid, projectid FROM b_src)
SELECT companyid, projectid, a_count, b_count
FROM left_master
LEFT JOIN a_src USING (companyid, projectid)
LEFT JOIN b_src USING (companyid, projectid)
;
If it is too slow to derive left_master you can consider adding triggers to the company_product tables to maintain a separate table of known combinations.
The chaining version:
WITH
a_src (companyid, projectid, a_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), a_count FROM company_a)
, b_src (companyid, projectid, b_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), b_count FROM company_b)
, c_src (companyid, projectid, c_count) AS ( SELECT companyid, COALESCE(projectid,'N/A'), c_count FROM company_c)
SELECT companyid, projectid, a_count, b_count, c_count
FROM (a_src FULL JOIN b_src USING (companyid, projectid)) ab_src FULL JOIN c_src USING (companyid, projectid)) abc_src
;
Though you could also test whether the following is faster:
a_raw FULL JOIN b_raw ON ((a_raw.companyid, COALESCE(a_raw.projectid, 'N/A')) = (b_raw.companyid, COALESCE(b_raw.projectid, 'N/A')))
Alternatively...go vertical:
WITH
a_src (companyid, projectid, item_count, item_type) AS ( SELECT companyid, COALESCE(projectid,'N/A'), a_count, 'A' FROM company_a)
, b_src (companyid, projectid, item_count, item_type) AS ( SELECT companyid, COALESCE(projectid,'N/A'), b_count, 'B' FROM company_b)
SELECT *
FROM a_src
UNION ALL
SELECT *
FROM b_src
;
David J.
View this message in context: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.