Thread: Limitting full join to one match

Limitting full join to one match

From
"Phil Endecott"
Date:
Dear Experts,

I have a couple of tables that I want to reconcile, finding rows 
that match and places where rows are missing from one table or the 
other:

db=> select * from a;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |  <-- missing from b
| 2018-04-01 |   5.00 |
+------------+--------+

db=> select * from b;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-03-01 |   8.00 |  <-- missing from a
| 2018-04-01 |   5.00 |
+------------+--------+

db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
+------------+--------+------------+--------+

This works fine until I have multiple items with the same date 
and amount:

db=> select * from a;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |
| 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 |  <--
| 2018-05-01 |  20.00 |  <--
+------------+--------+


db=> select * from b;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 |  <--
| 2018-05-01 |  20.00 |  <--
+------------+--------+

db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  3
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  4
+------------+--------+------------+--------+

It has, of course, put four rows in the output for the new items.

So my question is: how can I modify my query to output only two rows, 
like this:?

+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
+------------+--------+------------+--------+


Any suggestions anyone?


The best I have found so far is something involving EXCEPT ALL:

db=> select * from a except all select * from b;
db=> select * from b except all select * from a;

That's not ideal, though, as what I ultimately want is something 
that lists everything with its status:

+------------+--------+--------+
|    date    | amount | status |
+------------+--------+--------+
| 2018-01-01 |  10.00 |   OK   |
| 2018-02-01 |   5.00 | a_only |
| 2018-03-01 |   8.00 | b_only |
| 2018-04-01 |   5.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
+------------+--------+--------+

That would be easy enough to achieve from the JOIN.


Thanks, Phil.




Re: Limitting full join to one match

From
John W Higgins
Date:


On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <spam_from_pgsql_lists@chezphil.org> wrote:
Dear Experts,

I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:

...
 
So my question is: how can I modify my query to output only two rows,
like this:?

+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
+------------+--------+------------+--------+


Evening Phil,

Window functions are your friend here. I prefer views for this stuff - but subqueries would work just fine.

create view a_rows as (select *, 
                       row_number() OVER (PARTITION BY date, amount) AS pos from a);
create view b_rows as (select *, 
                       row_number() OVER (PARTITION BY date, amount) AS pos from b);

select 
  a_rows.date, 
  a_rows.amount, 
  a_rows.pos,
  b_rows.date, 
  b_rows.amount,
  b_rows.pos
from 
  a_rows full join b_rows using (date,amount,pos);


John 

Any suggestions anyone?


The best I have found so far is something involving EXCEPT ALL:

db=> select * from a except all select * from b;
db=> select * from b except all select * from a;

That's not ideal, though, as what I ultimately want is something
that lists everything with its status:

+------------+--------+--------+
|    date    | amount | status |
+------------+--------+--------+
| 2018-01-01 |  10.00 |   OK   |
| 2018-02-01 |   5.00 | a_only |
| 2018-03-01 |   8.00 | b_only |
| 2018-04-01 |   5.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
+------------+--------+--------+

That would be easy enough to achieve from the JOIN.


Thanks, Phil.



Re: Limitting full join to one match

From
Ron
Date:
On 12/05/2018 06:34 PM, Phil Endecott wrote:
> Dear Experts,
>
> I have a couple of tables that I want to reconcile, finding rows
> that match and places where rows are missing from one table or the
> other:
>
> db=> select * from a;
> +------------+--------+
> |    date    | amount |
> +------------+--------+
> | 2018-01-01 |  10.00 |
> | 2018-02-01 |   5.00 |  <-- missing from b
> | 2018-04-01 |   5.00 |
> +------------+--------+
>
> db=> select * from b;
> +------------+--------+
> |    date    | amount |
> +------------+--------+
> | 2018-01-01 |  10.00 |
> | 2018-03-01 |   8.00 |  <-- missing from a
> | 2018-04-01 |   5.00 |
> +------------+--------+
>
> db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
> +------------+--------+------------+--------+
> |    date    | amount |    date    | amount |
> +------------+--------+------------+--------+
> | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
> | 2018-02-01 |   5.00 |            |        |
> |            |        | 2018-03-01 |   8.00 |
> | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
> +------------+--------+------------+--------+
>
> This works fine until I have multiple items with the same date
> and amount:
>
> db=> select * from a;
> +------------+--------+
> |    date    | amount |
> +------------+--------+
> | 2018-01-01 |  10.00 |
> | 2018-02-01 |   5.00 |
> | 2018-04-01 |   5.00 |
> | 2018-05-01 |  20.00 |  <--
> | 2018-05-01 |  20.00 |  <--
> +------------+--------+
>
>
> db=> select * from b;
> +------------+--------+
> |    date    | amount |
> +------------+--------+
> | 2018-01-01 |  10.00 |
> | 2018-03-01 |   8.00 |
> | 2018-04-01 |   5.00 |
> | 2018-05-01 |  20.00 |  <--
> | 2018-05-01 |  20.00 |  <--
> +------------+--------+

What's your PK on "a" and "b"?

(Also, gmail seems to think that all -- or at least most -- of your email is 
spam.)

> db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
> +------------+--------+------------+--------+
> |    date    | amount |    date    | amount |
> +------------+--------+------------+--------+
> | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
> | 2018-02-01 |   5.00 |            |        |
> |            |        | 2018-03-01 |   8.00 |
> | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  3
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  4
> +------------+--------+------------+--------+
>
> It has, of course, put four rows in the output for the new items.
>
> So my question is: how can I modify my query to output only two rows,
> like this:?
>
> +------------+--------+------------+--------+
> |    date    | amount |    date    | amount |
> +------------+--------+------------+--------+
> | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
> | 2018-02-01 |   5.00 |            |        |
> |            |        | 2018-03-01 |   8.00 |
> | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
> +------------+--------+------------+--------+
>
>
> Any suggestions anyone?
>
>
> The best I have found so far is something involving EXCEPT ALL:
>
> db=> select * from a except all select * from b;
> db=> select * from b except all select * from a;
>
> That's not ideal, though, as what I ultimately want is something
> that lists everything with its status:
>
> +------------+--------+--------+
> |    date    | amount | status |
> +------------+--------+--------+
> | 2018-01-01 |  10.00 |   OK   |
> | 2018-02-01 |   5.00 | a_only |
> | 2018-03-01 |   8.00 | b_only |
> | 2018-04-01 |   5.00 |   OK   |
> | 2018-05-01 |  20.00 |   OK   |
> | 2018-05-01 |  20.00 |   OK   |
> +------------+--------+--------+
>
> That would be easy enough to achieve from the JOIN.
>
>
> Thanks, Phil.
>
>
>

-- 
Angular momentum makes the world go 'round.


Re: Limitting full join to one match

From
"Phil Endecott"
Date:
John W Higgins wrote:
> On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <
> spam_from_pgsql_lists@chezphil.org> wrote:
>
>> Dear Experts,
>>
>> I have a couple of tables that I want to reconcile, finding rows
>> that match and places where rows are missing from one table or the
>> other:
>>
>> ...
>
>
>> So my question is: how can I modify my query to output only two rows,
>> like this:?
>>
>> +------------+--------+------------+--------+
>> |    date    | amount |    date    | amount |
>> +------------+--------+------------+--------+
>> | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
>> | 2018-02-01 |   5.00 |            |        |
>> |            |        | 2018-03-01 |   8.00 |
>> | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
>> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
>> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
>> +------------+--------+------------+--------+
>>
>>
> Evening Phil,
>
> Window functions are your friend here. I prefer views for this stuff - but
> subqueries would work just fine.
>
> create view a_rows as (select *,
>                        row_number() OVER (PARTITION BY date, amount) AS pos
> from a);
> create view b_rows as (select *,
>                        row_number() OVER (PARTITION BY date, amount) AS pos
> from b);
>
> select
>   a_rows.date,
>   a_rows.amount,
>   a_rows.pos,
>   b_rows.date,
>   b_rows.amount,
>   b_rows.pos
> from
>   a_rows full join b_rows using (date,amount,pos);


Thanks John, that's great.  I'm a little surprised that there isn't an
easier way, but this certainly works.


Regard, Phil.









Re: Limitting full join to one match

From
"Phil Endecott"
Date:
Hi Ron,

Ron wrote:
> On 12/05/2018 06:34 PM, Phil Endecott wrote:
>> Dear Experts,
>>
>> I have a couple of tables that I want to reconcile, finding rows
>> that match and places where rows are missing from one table or the
>> other:
>>
>> db=> select * from a;
>> +------------+--------+
>> |    date    | amount |
>> +------------+--------+
>> | 2018-01-01 |  10.00 |
>> | 2018-02-01 |   5.00 |  <-- missing from b
>> | 2018-04-01 |   5.00 |
>> +------------+--------+
>>
>> db=> select * from b;
>> +------------+--------+
>> |    date    | amount |
>> +------------+--------+
>> | 2018-01-01 |  10.00 |
>> | 2018-03-01 |   8.00 |  <-- missing from a
>> | 2018-04-01 |   5.00 |
>> +------------+--------+
>>
>> db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
>> +------------+--------+------------+--------+
>> |    date    | amount |    date    | amount |
>> +------------+--------+------------+--------+
>> | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
>> | 2018-02-01 |   5.00 |            |        |
>> |            |        | 2018-03-01 |   8.00 |
>> | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
>> +------------+--------+------------+--------+
>>
>> This works fine until I have multiple items with the same date
>> and amount:
>>
>> db=> select * from a;
>> +------------+--------+
>> |    date    | amount |
>> +------------+--------+
>> | 2018-01-01 |  10.00 |
>> | 2018-02-01 |   5.00 |
>> | 2018-04-01 |   5.00 |
>> | 2018-05-01 |  20.00 |  <--
>> | 2018-05-01 |  20.00 |  <--
>> +------------+--------+
>>
>>
>> db=> select * from b;
>> +------------+--------+
>> |    date    | amount |
>> +------------+--------+
>> | 2018-01-01 |  10.00 |
>> | 2018-03-01 |   8.00 |
>> | 2018-04-01 |   5.00 |
>> | 2018-05-01 |  20.00 |  <--
>> | 2018-05-01 |  20.00 |  <--
>> +------------+--------+
>
> What's your PK on "a" and "b"?

These input tables can have duplicate rows, so defining a primary key
requires something like a row ID or similar.


> (Also, gmail seems to think that all -- or at least most -- of your email is 
> spam.)

Yes, it is becoming increasingly difficult to persuade gmail etc. that
you are not a spammer if you run your own mail server.  If you have any
interesting headers suggesting exactly what they disliked about my message,
could you please forward them off-list?  Thanks.


Regards, Phil.









Re: Limitting full join to one match

From
Ravi Krishna
Date:


Yes, it is becoming increasingly difficult to persuade gmail etc. that
you are not a spammer if you run your own mail server.  If you have any
interesting headers suggesting exactly what they disliked about my message,
could you please forward them off-list?  Thanks.


It is for this reason (and few others) I am off gmail and other free email accounts.
I have tried gmail / outlook / yahoo / aol and all of them mark many mails to this list 
as spam.

I like fastmail a lot and $3 per month is practically free.