Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables - Mailing list pgsql-sql
From | Andreas Joseph Krogh |
---|---|
Subject | Re: Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables |
Date | |
Msg-id | VisenaEmail.20c.1d6f9efc3aa09035.14691911e5e@tc7-on Whole thread Raw |
In response to | Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables (David G Johnston <david.g.johnston@gmail.com>) |
Responses |
Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived
tables
Re: Problem with duplicate rows when FULL OUTER JOIN'ing 3 derived tables |
List | pgsql-sql |
På torsdag 12. juni 2014 kl. 16:02:39, skrev David G Johnston <david.g.johnston@gmail.com>:
Note - numbers do not relate to your questions. This pretty much answers 1
and 3.
1. I'd ensure that project name/id on a table can never be null by creating
a dummy project that means "none assigned".
2. I would avoid full join. To do so I'd cross join a distinct list of
companies with a distinct list of projects.
3. Against the join in 2 you then left join three times, once for each
product table.
4. In the final result any product not having a correspond company/project
would have its value coalesced to zero.
The end result is a table without any NULL and, in the case of your
explanatory data, 9 rows - two of which would be all zeros: (c3,p1) and
(c3,p,2)
If you cannot change the raw data I'd use CTE/WITH to normalize the data
according to 1 and the use these CTEs in the rest of the query. I would
also do 2 is a CTE then 3 would be normal subqueries or relation references
as necessary.
David J.
Hi and thanks for input.
I cannot change the data so I have to deal with NULLs.
Not all companies or projects are in the result, only the ones involved having fruits, by combining the derived tables. I therefore don't see how I effectively can build up a list of distinct companies and projects to join with.
I have a simplified version (full schema below) here which shows that there's an error in the query (when one extra derived table is added to the query):
SELECT
coalesce(apl.company_name, bns.company_name, pns.company_name, prs.company_name) AS company_name,
coalesce(apl.project_name, bns.project_name, pns.project_name, prs.project_name) AS project_name,
apl.qty AS num_apples,
bns.qty AS num_bananas,
pns.qty AS num_pineapples,
prs.qty AS num_pears
coalesce(apl.company_name, bns.company_name, pns.company_name, prs.company_name) AS company_name,
coalesce(apl.project_name, bns.project_name, pns.project_name, prs.project_name) AS project_name,
apl.qty AS num_apples,
bns.qty AS num_bananas,
pns.qty AS num_pineapples,
prs.qty AS num_pears
FROM
company_apples AS apl
FULL OUTER JOIN company_bananas AS bns
ON apl.company_id = bns.company_id
AND (apl.project_id IS NULL AND bns.project_id IS NULL OR apl.project_id = bns.project_id)
ON apl.company_id = bns.company_id
AND (apl.project_id IS NULL AND bns.project_id IS NULL OR apl.project_id = bns.project_id)
FULL OUTER JOIN company_pineapples AS pns
ON apl.company_id = pns.company_id
AND (apl.project_id IS NULL AND pns.project_id IS NULL OR apl.project_id = pns.project_id)
ON apl.company_id = pns.company_id
AND (apl.project_id IS NULL AND pns.project_id IS NULL OR apl.project_id = pns.project_id)
FULL OUTER JOIN company_pears AS prs
ON apl.company_id = prs.company_id
AND (apl.project_id IS NULL AND prs.project_id IS NULL OR apl.project_id = prs.project_id)
ON apl.company_id = prs.company_id
AND (apl.project_id IS NULL AND prs.project_id IS NULL OR apl.project_id = prs.project_id)
ORDER BY company_name ASC, project_name ASC NULLS FIRST;
This gives the result:
# | company_name | project_name | num_apples | num_bananas | num_pineapples | num_pears |
1 | C1 | NULL | 2 | 2 | 10 | NULL |
2 | C1 | P1 | 5 | 12 | NULL | NULL |
3 | C1 | P2 | 2 | NULL | NULL | NULL |
4 | C2 | NULL | 3 | NULL | 10 | NULL |
5 | C2 | P1 | 3 | NULL | NULL | NULL |
6 | C2 | P2 | 3 | NULL | NULL | NULL |
7 | C3 | NULL | NULL | 8 | NULL | NULL |
8 | C3 | NULL | NULL | NULL | NULL | 7 |
As you see, there are two rows for C3, which should have been 1, with num_bananas=8 and num_pears=7. There has to be something wrong with my FULL OUTER JOINs but I don't know what....
Anyone knows how to write the correct query for this?
Thanks.
Full schame and example-data:
drop table if exists company_apples;
drop table if exists company_bananas;
drop table if exists company_pineapples;
drop table if exists company_bananas;
drop table if exists company_pineapples;
CREATE TABLE company_apples
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
CREATE TABLE company_bananas
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
CREATE TABLE company_pineapples
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
CREATE TABLE company_pears
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
(
company_id INT,
company_name VARCHAR,
project_id INT,
project_name VARCHAR,
qty BIGINT
);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', null, null, 2);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 1, 'P1', 5);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 2, 'P2', 2);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', null, null, 3);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', 1, 'P1', 3);
INSERT INTO company_apples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', 2, 'P2', 3);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', null, null, 2);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 1, 'P1', 12);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (3, 'C3', null, null, 8);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', 1, 'P1', 12);
INSERT INTO company_bananas (company_id, company_name, project_id, project_name, qty) VALUES (3, 'C3', null, null, 8);
INSERT INTO company_pineapples (company_id, company_name, project_id, project_name, qty) VALUES (1, 'C1', null, null, 10);
INSERT INTO company_pineapples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', null, null, 10);
INSERT INTO company_pineapples (company_id, company_name, project_id, project_name, qty) VALUES (2, 'C2', null, null, 10);
INSERT INTO company_pears (company_id, company_name, project_id, project_name, qty) VALUES (3, 'C3', null, null, 7);
select company_name, project_name, qty from company_apples order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_bananas order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pineapples order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pears order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_bananas order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pineapples order by company_name ASC, project_name ASC nulls first;
select company_name, project_name, qty from company_pears order by company_name ASC, project_name ASC nulls first;
--
Andreas Jospeh Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963