Thread: Limitting full join to one match
Dear Experts, I have a couple of tables that I want to reconcile, finding rows that match and places where rows are missing from one table or the other: db=> select * from a; +------------+--------+ | date | amount | +------------+--------+ | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | <-- missing from b | 2018-04-01 | 5.00 | +------------+--------+ db=> select * from b; +------------+--------+ | date | amount | +------------+--------+ | 2018-01-01 | 10.00 | | 2018-03-01 | 8.00 | <-- missing from a | 2018-04-01 | 5.00 | +------------+--------+ db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); +------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | +------------+--------+------------+--------+ This works fine until I have multiple items with the same date and amount: db=> select * from a; +------------+--------+ | date | amount | +------------+--------+ | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | <-- | 2018-05-01 | 20.00 | <-- +------------+--------+ db=> select * from b; +------------+--------+ | date | amount | +------------+--------+ | 2018-01-01 | 10.00 | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | <-- | 2018-05-01 | 20.00 | <-- +------------+--------+ db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); +------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 3 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 4 +------------+--------+------------+--------+ It has, of course, put four rows in the output for the new items. So my question is: how can I modify my query to output only two rows, like this:? +------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 +------------+--------+------------+--------+ Any suggestions anyone? The best I have found so far is something involving EXCEPT ALL: db=> select * from a except all select * from b; db=> select * from b except all select * from a; That's not ideal, though, as what I ultimately want is something that lists everything with its status: +------------+--------+--------+ | date | amount | status | +------------+--------+--------+ | 2018-01-01 | 10.00 | OK | | 2018-02-01 | 5.00 | a_only | | 2018-03-01 | 8.00 | b_only | | 2018-04-01 | 5.00 | OK | | 2018-05-01 | 20.00 | OK | | 2018-05-01 | 20.00 | OK | +------------+--------+--------+ That would be easy enough to achieve from the JOIN. Thanks, Phil.
On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <spam_from_pgsql_lists@chezphil.org> wrote:
Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:
...
So my question is: how can I modify my query to output only two rows,
like this:?
+------------+--------+------------+--------+
| date | amount | date | amount |
+------------+--------+------------+--------+
| 2018-01-01 | 10.00 | 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | | |
| | | 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 | 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2
+------------+--------+------------+--------+
Evening Phil,
Window functions are your friend here. I prefer views for this stuff - but subqueries would work just fine.
create view a_rows as (select *,
row_number() OVER (PARTITION BY date, amount) AS pos from a);
create view b_rows as (select *,
row_number() OVER (PARTITION BY date, amount) AS pos from b);
select
a_rows.date,
a_rows.amount,
a_rows.pos,
b_rows.date,
b_rows.amount,
b_rows.pos
from
a_rows full join b_rows using (date,amount,pos);
Example here - http://sqlfiddle.com/#!17/305d6/3
John
Any suggestions anyone?
The best I have found so far is something involving EXCEPT ALL:
db=> select * from a except all select * from b;
db=> select * from b except all select * from a;
That's not ideal, though, as what I ultimately want is something
that lists everything with its status:
+------------+--------+--------+
| date | amount | status |
+------------+--------+--------+
| 2018-01-01 | 10.00 | OK |
| 2018-02-01 | 5.00 | a_only |
| 2018-03-01 | 8.00 | b_only |
| 2018-04-01 | 5.00 | OK |
| 2018-05-01 | 20.00 | OK |
| 2018-05-01 | 20.00 | OK |
+------------+--------+--------+
That would be easy enough to achieve from the JOIN.
Thanks, Phil.
On 12/05/2018 06:34 PM, Phil Endecott wrote: > Dear Experts, > > I have a couple of tables that I want to reconcile, finding rows > that match and places where rows are missing from one table or the > other: > > db=> select * from a; > +------------+--------+ > | date | amount | > +------------+--------+ > | 2018-01-01 | 10.00 | > | 2018-02-01 | 5.00 | <-- missing from b > | 2018-04-01 | 5.00 | > +------------+--------+ > > db=> select * from b; > +------------+--------+ > | date | amount | > +------------+--------+ > | 2018-01-01 | 10.00 | > | 2018-03-01 | 8.00 | <-- missing from a > | 2018-04-01 | 5.00 | > +------------+--------+ > > db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); > +------------+--------+------------+--------+ > | date | amount | date | amount | > +------------+--------+------------+--------+ > | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | > | 2018-02-01 | 5.00 | | | > | | | 2018-03-01 | 8.00 | > | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | > +------------+--------+------------+--------+ > > This works fine until I have multiple items with the same date > and amount: > > db=> select * from a; > +------------+--------+ > | date | amount | > +------------+--------+ > | 2018-01-01 | 10.00 | > | 2018-02-01 | 5.00 | > | 2018-04-01 | 5.00 | > | 2018-05-01 | 20.00 | <-- > | 2018-05-01 | 20.00 | <-- > +------------+--------+ > > > db=> select * from b; > +------------+--------+ > | date | amount | > +------------+--------+ > | 2018-01-01 | 10.00 | > | 2018-03-01 | 8.00 | > | 2018-04-01 | 5.00 | > | 2018-05-01 | 20.00 | <-- > | 2018-05-01 | 20.00 | <-- > +------------+--------+ What's your PK on "a" and "b"? (Also, gmail seems to think that all -- or at least most -- of your email is spam.) > db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); > +------------+--------+------------+--------+ > | date | amount | date | amount | > +------------+--------+------------+--------+ > | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | > | 2018-02-01 | 5.00 | | | > | | | 2018-03-01 | 8.00 | > | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | > | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 > | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 > | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 3 > | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 4 > +------------+--------+------------+--------+ > > It has, of course, put four rows in the output for the new items. > > So my question is: how can I modify my query to output only two rows, > like this:? > > +------------+--------+------------+--------+ > | date | amount | date | amount | > +------------+--------+------------+--------+ > | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | > | 2018-02-01 | 5.00 | | | > | | | 2018-03-01 | 8.00 | > | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | > | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 > | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 > +------------+--------+------------+--------+ > > > Any suggestions anyone? > > > The best I have found so far is something involving EXCEPT ALL: > > db=> select * from a except all select * from b; > db=> select * from b except all select * from a; > > That's not ideal, though, as what I ultimately want is something > that lists everything with its status: > > +------------+--------+--------+ > | date | amount | status | > +------------+--------+--------+ > | 2018-01-01 | 10.00 | OK | > | 2018-02-01 | 5.00 | a_only | > | 2018-03-01 | 8.00 | b_only | > | 2018-04-01 | 5.00 | OK | > | 2018-05-01 | 20.00 | OK | > | 2018-05-01 | 20.00 | OK | > +------------+--------+--------+ > > That would be easy enough to achieve from the JOIN. > > > Thanks, Phil. > > > -- Angular momentum makes the world go 'round.
John W Higgins wrote: > On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott < > spam_from_pgsql_lists@chezphil.org> wrote: > >> Dear Experts, >> >> I have a couple of tables that I want to reconcile, finding rows >> that match and places where rows are missing from one table or the >> other: >> >> ... > > >> So my question is: how can I modify my query to output only two rows, >> like this:? >> >> +------------+--------+------------+--------+ >> | date | amount | date | amount | >> +------------+--------+------------+--------+ >> | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | >> | 2018-02-01 | 5.00 | | | >> | | | 2018-03-01 | 8.00 | >> | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | >> | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 >> | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 >> +------------+--------+------------+--------+ >> >> > Evening Phil, > > Window functions are your friend here. I prefer views for this stuff - but > subqueries would work just fine. > > create view a_rows as (select *, > row_number() OVER (PARTITION BY date, amount) AS pos > from a); > create view b_rows as (select *, > row_number() OVER (PARTITION BY date, amount) AS pos > from b); > > select > a_rows.date, > a_rows.amount, > a_rows.pos, > b_rows.date, > b_rows.amount, > b_rows.pos > from > a_rows full join b_rows using (date,amount,pos); Thanks John, that's great. I'm a little surprised that there isn't an easier way, but this certainly works. Regard, Phil.
Hi Ron, Ron wrote: > On 12/05/2018 06:34 PM, Phil Endecott wrote: >> Dear Experts, >> >> I have a couple of tables that I want to reconcile, finding rows >> that match and places where rows are missing from one table or the >> other: >> >> db=> select * from a; >> +------------+--------+ >> | date | amount | >> +------------+--------+ >> | 2018-01-01 | 10.00 | >> | 2018-02-01 | 5.00 | <-- missing from b >> | 2018-04-01 | 5.00 | >> +------------+--------+ >> >> db=> select * from b; >> +------------+--------+ >> | date | amount | >> +------------+--------+ >> | 2018-01-01 | 10.00 | >> | 2018-03-01 | 8.00 | <-- missing from a >> | 2018-04-01 | 5.00 | >> +------------+--------+ >> >> db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); >> +------------+--------+------------+--------+ >> | date | amount | date | amount | >> +------------+--------+------------+--------+ >> | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | >> | 2018-02-01 | 5.00 | | | >> | | | 2018-03-01 | 8.00 | >> | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | >> +------------+--------+------------+--------+ >> >> This works fine until I have multiple items with the same date >> and amount: >> >> db=> select * from a; >> +------------+--------+ >> | date | amount | >> +------------+--------+ >> | 2018-01-01 | 10.00 | >> | 2018-02-01 | 5.00 | >> | 2018-04-01 | 5.00 | >> | 2018-05-01 | 20.00 | <-- >> | 2018-05-01 | 20.00 | <-- >> +------------+--------+ >> >> >> db=> select * from b; >> +------------+--------+ >> | date | amount | >> +------------+--------+ >> | 2018-01-01 | 10.00 | >> | 2018-03-01 | 8.00 | >> | 2018-04-01 | 5.00 | >> | 2018-05-01 | 20.00 | <-- >> | 2018-05-01 | 20.00 | <-- >> +------------+--------+ > > What's your PK on "a" and "b"? These input tables can have duplicate rows, so defining a primary key requires something like a row ID or similar. > (Also, gmail seems to think that all -- or at least most -- of your email is > spam.) Yes, it is becoming increasingly difficult to persuade gmail etc. that you are not a spammer if you run your own mail server. If you have any interesting headers suggesting exactly what they disliked about my message, could you please forward them off-list? Thanks. Regards, Phil.
Yes, it is becoming increasingly difficult to persuade gmail etc. thatyou are not a spammer if you run your own mail server. If you have anyinteresting headers suggesting exactly what they disliked about my message,could you please forward them off-list? Thanks.
It is for this reason (and few others) I am off gmail and other free email accounts.
I have tried gmail / outlook / yahoo / aol and all of them mark many mails to this list
as spam.
I like fastmail a lot and $3 per month is practically free.