Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables - Mailing list pgsql-sql

From Andreas Joseph Krogh
Subject Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Date
Msg-id VisenaEmail.20c.1d6f9efc3aa09035.14691911e5e@tc7-on
Whole thread Raw
In response to Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables  (David G Johnston <david.g.johnston@gmail.com>)
Responses Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
List pgsql-sql
På torsdag 12. juni 2014 kl. 16:02:39, skrev David G Johnston <david.g.johnston@gmail.com>:
Note - numbers do not relate to your questions. This pretty much answers 1
and 3.

1. I'd ensure that project name/id on a table can never be null by creating
a dummy project that means "none assigned".
2. I would avoid full join.  To do so I'd cross join a distinct list of
companies with a distinct list of projects.
3. Against the join in 2 you then left join three times, once for each
product table.
4. In the final result any product not having a correspond company/project
would have its value coalesced to zero.

The end result is a table without any NULL and, in the case of your
explanatory data, 9 rows - two of which would be all zeros: (c3,p1) and
(c3,p,2)

If you cannot change the raw data I'd use CTE/WITH to normalize the data
according to 1 and the use these CTEs in the rest of the query.  I would
also do 2 is a CTE then 3 would be normal subqueries or relation references
as necessary.

David J.
 
Hi and thanks for input.
 
I cannot change the data so I have to deal with NULLs.
 
Not all companies or projects are in the result, only the ones involved having fruits, by combining the derived tables. I therefore don't see how I effectively can build up a list of distinct companies and projects to join with.
 
I have a simplified version (full schema below) here which shows that there's an error in the query (when one extra derived table is added to the query):
 
SELECT
    coalesce(apl.company_name, bns.company_name, pns.company_name, prs.company_name) AS company_name,
    coalesce(apl.project_name, bns.project_name, pns.project_name, prs.project_name) AS project_name,
    apl.qty                                                        AS num_apples,
    bns.qty                                                        AS num_bananas,
    pns.qty                                                        AS num_pineapples,
    prs.qty                                                        AS num_pears
FROM
 
    company_apples AS apl
 
    FULL OUTER JOIN company_bananas AS bns
        ON apl.company_id = bns.company_id
           AND (apl.project_id IS NULL AND bns.project_id IS NULL OR apl.project_id = bns.project_id)
 
    FULL OUTER JOIN company_pineapples AS pns
            ON apl.company_id = pns.company_id
               AND (apl.project_id IS NULL AND pns.project_id IS NULL OR apl.project_id = pns.project_id)
 
    FULL OUTER JOIN company_pears AS prs
            ON apl.company_id = prs.company_id
               AND (apl.project_id IS NULL AND prs.project_id IS NULL OR apl.project_id = prs.project_id)
 
ORDER BY company_name ASC, project_name ASC NULLS FIRST;
 
 
This gives the result:
 
#company_nameproject_namenum_applesnum_bananasnum_pineapplesnum_pears
1C1NULL2210NULL
2C1P1512NULLNULL
3C1P22NULLNULLNULL
4C2NULL3NULL10NULL
5C2P13NULLNULLNULL
6C2P23NULLNULLNULL
7C3NULLNULL8NULLNULL
8C3NULLNULLNULLNULL7
As you see, there are two rows for C3, which should have been 1, with num_bananas=8 and num_pears=7. There has to be something wrong with my FULL OUTER JOINs but I don't know what....
 
Anyone knows how to write the correct query for this?
 
Thanks.
 
 
Full schame and example-data:
 
drop table if exists company_apples;
drop table if exists company_bananas;
drop table if exists company_pineapples;
CREATE TABLE company_apples
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
);
CREATE TABLE company_bananas
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
);
CREATE TABLE company_pineapples
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
);
CREATE TABLE company_pears
(
    company_id INT,
    company_name VARCHAR,
    project_id INT,
    project_name VARCHAR,
    qty BIGINT
);

INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', null, null, 2);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 1, 'P1', 5);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 2, 'P2', 2);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', null, null, 3);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', 1, 'P1', 3);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', 2, 'P2', 3);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', null, null, 2);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 1, 'P1', 12);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (3, 'C3', null, null, 8);
INSERT INTO company_pineapples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', null, null, 10);
INSERT INTO company_pineapples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', null, null, 10);
INSERT INTO company_pears (company_id, company_name, project_id, project_name, qty) VALUES (3, 'C3', null, null, 7);
 
select company_name, project_name, qty from company_apples order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_bananas order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pineapples order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pears order by company_name ASC, project_name ASC nulls first;
--
Andreas Jospeh Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

pgsql-sql by date:

Previous
From: David G Johnston
Date:
Subject: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables
Next
From: David G Johnston
Date:
Subject: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables