Thread: [HACKERS] Idea on how to simplify comparing two sets

[HACKERS] Idea on how to simplify comparing two sets

From

Joel Jacobson

Date:

07 February 2017, 18:13:40

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT   GREATEST(       (SELECT COUNT(*) FROM T1),       (SELECT COUNT(*) FROM T2)   )   =   (SELECT COUNT(*) FROM (
   SELECT * FROM T1       INTERSECT ALL       SELECT * FROM T2   ) AS X)
 
INTO _Identical;

or,

SELECT 'Missmatch!' WHERE EXISTS (   SELECT * FROM Foo   FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
               Foo IS NOT DISTINCT FROM Bar)   WHERE TRUE   AND ( Foo.FooID BETWEEN 1 AND 10000 AND         Bar.BarID
BETWEEN1 AND 10000    )   AND ( Foo.FooID IS NULL OR         Bar.BarID IS NULL);
 

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

So here is an idea that doesn't break backwards compatibility:

Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.

SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR:  syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;

The example above could be written as:

_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);

Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.

Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.

Thoughts?

/Joel

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Anders Granlund

Date:

07 February 2017, 18:40:39

What about this ambiguity?

SELECT TRUE

WHERE FALSE
IS NOT DISTINCT FROM
(SELECT TRUE)

On Tue, Feb 7, 2017 at 4:13 PM, Joel Jacobson <joel@trustly.com> wrote:

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT
GREATEST(
(SELECT COUNT(*) FROM T1),
(SELECT COUNT(*) FROM T2)
)
=
(SELECT COUNT(*) FROM (
SELECT * FROM T1
INTERSECT ALL
SELECT * FROM T2
) AS X)
INTO _Identical;

or,

SELECT 'Missmatch!' WHERE EXISTS (
SELECT * FROM Foo
FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
Foo IS NOT DISTINCT FROM Bar)
WHERE TRUE
AND ( Foo.FooID BETWEEN 1 AND 10000 AND
Bar.BarID BETWEEN 1 AND 10000 )
AND ( Foo.FooID IS NULL OR
Bar.BarID IS NULL);

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

So here is an idea that doesn't break backwards compatibility:

Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.

SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR: syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;

The example above could be written as:

_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);

Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.

Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.

Thoughts?

/Joel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Joel Jacobson

Date:

07 February 2017, 18:43:16

But that's already a valid statement, so there is no ambiguity:

SELECT TRUE WHERE FALSE
IS NOT DISTINCT FROM
(SELECT TRUE);bool
------
(0 rows)


If you want to compare the set (SELECT TRUE WHERE FALSE) with the set
(SELECT TRUE) then just add parenthesis:
(SELECT TRUE WHERE FALSE)
IS NOT DISTINCT FROM
(SELECT TRUE);
ERROR:  syntax error at or near "IS"
LINE 2: IS NOT DISTINCT FROM       ^

Which is currently invalid syntax.




On Tue, Feb 7, 2017 at 4:40 PM, Anders Granlund
<anders.granlund@trustly.com> wrote:
> What about this ambiguity?
>
> SELECT TRUE
> WHERE FALSE
> IS NOT DISTINCT FROM
> (SELECT TRUE)
>
> On Tue, Feb 7, 2017 at 4:13 PM, Joel Jacobson <joel@trustly.com> wrote:
>>
>>  Hi hackers,
>>
>> Currently there is no simple way to check if two sets are equal.
>>
>> Looks like no RDBMS in the world has a simple command for it.
>>
>> You have to do something like:
>>
>> WITH
>> T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
>> T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
>> SELECT
>>     GREATEST(
>>         (SELECT COUNT(*) FROM T1),
>>         (SELECT COUNT(*) FROM T2)
>>     )
>>     =
>>     (SELECT COUNT(*) FROM (
>>         SELECT * FROM T1
>>         INTERSECT ALL
>>         SELECT * FROM T2
>>     ) AS X)
>> INTO _Identical;
>>
>> or,
>>
>> SELECT 'Missmatch!' WHERE EXISTS (
>>     SELECT * FROM Foo
>>     FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
>>                                      Foo IS NOT DISTINCT FROM Bar)
>>     WHERE TRUE
>>     AND ( Foo.FooID BETWEEN 1 AND 10000 AND
>>           Bar.BarID BETWEEN 1 AND 10000    )
>>     AND ( Foo.FooID IS NULL OR
>>           Bar.BarID IS NULL);
>>
>> Introducing new SQL keywords is of course not an option,
>> since it would possibly break backwards compatibility.
>>
>> So here is an idea that doesn't break backwards compatibility:
>>
>> Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
>> that is currently a syntax error when used between two sets.
>>
>> SELECT 1 IS DISTINCT FROM SELECT 1;
>> ERROR:  syntax error at or near "SELECT"
>> LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;
>>
>> The example above could be written as:
>>
>> _Identical := (
>> SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
>> IS NOT DISTINCT FROM
>> SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
>> );
>>
>> Which would set _Identical to TRUE if the two sets are equal,
>> and FALSE otherwise.
>>
>> Since it's currently a syntax error, there is no risk for changed
>> behaviour for any existing executable queries.
>>
>> Thoughts?
>>
>> /Joel
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



-- 
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Tom Lane

Date:

07 February 2017, 18:58:41

Joel Jacobson <joel@trustly.com> writes:
> Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?
        regards, tom lane

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Joel Jacobson

Date:

07 February 2017, 19:16:51

On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joel Jacobson <joel@trustly.com> writes:
>> Currently there is no simple way to check if two sets are equal.
>
> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
> and SELECT set2 EXCEPT SELECT set1 are both empty?

Yes, that's one way, but it's ugly as you have to repeat yourself and
write both sets two times.
Not an issue for small queries, but if you have two big queries stored
in a .sql file,
you would have to modify both places for each query and always make
sure they are identical.

Re: [HACKERS] Idea on how to simplify comparing two sets

From

David Fetter

Date:

07 February 2017, 20:10:17

On Tue, Feb 07, 2017 at 04:13:40PM +0100, Joel Jacobson wrote:
>  Hi hackers,
> 
> Currently there is no simple way to check if two sets are equal.

Assuming that a and b each has at least one NOT NULL column, is this
simple enough?  Based on nothing much, I'm assuming here that the IS
NOT NULL test is faster than IS NULL, but you can flip that and change
the array to {0} with identical effect.

WITH t AS (   SELECT a AS a, b AS b, (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind   FROM a FULL JOIN b ON ...
)
SELECT array_agg(DISTINCT ind) = '{2}'
FROM t;

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Idea on how to simplify comparing two sets

From

David Fetter

Date:

07 February 2017, 20:28:19

On Tue, Feb 07, 2017 at 09:10:17AM -0800, David Fetter wrote:
> On Tue, Feb 07, 2017 at 04:13:40PM +0100, Joel Jacobson wrote:
> >  Hi hackers,
> > 
> > Currently there is no simple way to check if two sets are equal.
> 
> Assuming that a and b each has at least one NOT NULL column, is this
> simple enough?  Based on nothing much, I'm assuming here that the IS
> NOT NULL test is faster than IS NULL, but you can flip that and change
> the array to {0} with identical effect.
> 
> WITH t AS (
>     SELECT a AS a, b AS b, (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind
>     FROM a FULL JOIN b ON ...
> )
> SELECT array_agg(DISTINCT ind) = '{2}'
> FROM t;

You don't actually need a and b in the inner target list.

WITH t AS (   SELECT (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind   FROM a FULL JOIN b ON ...
)
SELECT array_agg(DISTINCT ind) = '{2}'
FROM t;

This could be shortened further to the following if we ever implement
DISTINCT for window functions, which might involve implementing
DISTINCT via hashing more generally, which means hashable
types...whee!

SELECT array_agg(DISTINCT (a IS NOT NULL)::int + (b IS NOT NULL)::int) OVER () = '{2}'
FROM a FULL JOIN b ON ... 

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Joel Jacobson

Date:

07 February 2017, 20:33:41

On Tue, Feb 7, 2017 at 6:28 PM, David Fetter <david@fetter.org> wrote:
> This could be shortened further to the following if we ever implement
> DISTINCT for window functions, which might involve implementing
> DISTINCT via hashing more generally, which means hashable
> types...whee!
>
> SELECT array_agg(DISTINCT (a IS NOT NULL)::int + (b IS NOT NULL)::int) OVER () = '{2}'
> FROM a FULL JOIN b ON ...

That's still a lot more syntax than just adding "IS NOT DISTINCT FROM"
in between the sets.

I just thought the general approach at looking for ways to express new
things in SQL,
without introducing new keywords, but instead rely on existing keywords but
that currently are syntax errors when used in some semantic way,
is an interesting approach to allow extending the SQL syntax without
breaking backwards compatibility.

Are there any historical examples of when this approach has been used
to make progress in PostgreSQL?

Re: [HACKERS] Idea on how to simplify comparing two sets

From

"David G. Johnston"

Date:

07 February 2017, 20:46:43

On Tue, Feb 7, 2017 at 8:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:
> Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

SELECT set1 FULL EXCEPT SELECT set2 ?

Matches with the concept and syntax of "FULL JOIN".

SELECT set1 XOR SELECT set2

That said I'm not sure how much we want to go down this road on our own. It'd be nice to have when its needed but its not something that gets much visibility on these lists to suggest a large pent-up demand.

IS NOT DISTINCT FROM doesn't imply bi-direction any better than EXCEPT does ... if we are going to add new syntax I'd say it should at least do that.

David J.

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Tom Lane

Date:

07 February 2017, 21:03:14

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, Feb 7, 2017 at 8:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Joel Jacobson <joel@trustly.com> writes:
>>> Currently there is no simple way to check if two sets are equal.

>> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
>> and SELECT set2 EXCEPT SELECT set1 are both empty?

> SELECT set1 FULL EXCEPT SELECT set2 ?
> Matches with the concept and syntax of "FULL JOIN".

Actually ... now that you mention full join, I believe this works:

select * from (select ...) s1 full join (select ...) s2 on ((s1.*)=(s2.*)) where s1.* is distinct from s2.*;

> That said I'm not sure how much we want to go down this road on our own.
> It'd be nice to have when its needed but its not something that gets much
> visibility on these lists to suggest a large pent-up demand.

Yeah, if this isn't in the standard and not in other databases either,
that would seem to suggest that it's not a big requirement.
        regards, tom lane

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Fabien COELHO

Date:

07 February 2017, 21:48:30

> Currently there is no simple way to check if two sets are equal.

My 0.02€:

Assuming that you mean set = relation, and that there is a key (which 
should be the case for a set otherwise tuples cannot be distinguished, so 
this is not really a set), and assuming not null other data, then:

CREATE TABLE TAB1(k INT PRIMARY KEY, data TEXT NOT NULL);
INSERT INTO TAB1 VALUES (1, 'one'), (2, 'two'), (3, 'three');

CREATE TABLE TAB2(k INT PRIMARY KEY, data TEXT NOT NULL);
INSERT INTO TAB2 VALUES (1, 'one'), (2, 'deux'), (4, 'four');

The TAB1 to TAB2 difference is computed with:

SELECT  CASE WHEN t1.k IS NULL THEN 'INSERT'       WHEN t2.k IS NULL THEN 'DELETE'       ELSE 'UPDATE'  END AS
operation, COALESCE(t1.k, t2.k) AS key
 
FROM TAB1 AS t1  FULL JOIN TAB2 AS t2 USING (k)
WHERE  t1.data IS NULL OR t2.data IS NULL OR t1.data <> t2.data;

Results in:
   UPDATE | 2   DELETE | 3   INSERT | 4

If there is no differences, then sets are equals...

If there is no associated data, a simpler condition:
  WHERE t1.k IS NULL OR t2.k IS NULL;

-- 
Fabien.

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Merlin Moncure

Date:

08 February 2017, 05:50:06

1On Tue, Feb 7, 2017 at 9:46 PM, Joel Jacobson <joel@trustly.com> wrote:
> On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Joel Jacobson <joel@trustly.com> writes:
>>> Currently there is no simple way to check if two sets are equal.
>>
>> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
>> and SELECT set2 EXCEPT SELECT set1 are both empty?
>
> Yes, that's one way, but it's ugly as you have to repeat yourself and
> write both sets two times.
> Not an issue for small queries, but if you have two big queries stored
> in a .sql file,
> you would have to modify both places for each query and always make
> sure they are identical.

A CTE might help:

WITH left AS (something complex),
right AS (something complex)
SELECT COUNT(*) = 0 AS good FROM
( SELECT * FROM left EXCEPT SELECT * FROM right UNION ALL SELECT * FROM right EXCEPT SELECT * FROM left
) q;

This isn't the most efficient solution, but is easily abstracted into
dynamic SQL (meaning, you could pass both queries as arguments to a
checker function).  Another, similar approach is to abstract the query
behind a view which ISTM is a practice you are underutilizing based on
your comments :-).

If I were in a hurry and the dataset was enormous I would probably
dump both queries identically ordered to a .csv, and do:
diff left.csv right.csv | head -1

in bash or something like that.  Not sure if the utility of a
bidirectional EXCEPT is enough to justify adding custom syntax for
that approach.  I use the 'double EXCEPT' tactic fairly often and
understand the need, but the bar for non-standard syntax is pretty
high (and has been getting higher over the years, I think).

merlin

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Pantelis Theodosiou

Date:

08 February 2017, 12:13:49

On Tue, Feb 7, 2017 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:
> Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

regards, tom lane

Yes, if the wanted result is true or false, something like this:

SELECT EXISTS (TABLE a EXCEPT TABLE b)   OR EXISTS (TABLE b EXCEPT TABLE a) ;

And if a new operator was added (in the same category as UNION and EXCEPT), it could be:


SELECT EXISTS (TABLE a XORSET TABLE b) ;

What about using the = and <> operators in sets? Is the following allowed in the standard?


SELECT (TABLE a) <> (TABLE b) ;

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Pantelis Theodosiou

Date:

08 February 2017, 12:24:15

On Tue, Feb 7, 2017 at 3:13 PM, Joel Jacobson <joel@trustly.com> wrote:

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

...

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

I'm not advocating it but I don't see how introducing new SQL keywords breaks backwards compatibility.

Pantelis Theodosiou

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Robert Haas

Date:

08 February 2017, 16:40:22

On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote:
> I'm not advocating it but I don't see how introducing new SQL keywords
> breaks backwards compatibility.

It does at least a little bit.  This starts failing:

select 1 new_keyword form blah;

(now you have to insert AS or quote the keyword)

If the new keyword is partially or fully reserved, then anybody who is
using that column in a now-impermissible way has to change names of
conflicting tables, columns, functions, etc.

But of course we do add keywords in every release, where it's
warranted by the value of the new feature.  I think the problem for
this proposed feature is not that adding new keywords is a completely
insane thing to do but that (1) there are lots of other good ways to
do more or less what is being requested with this new syntax, so it's
not clear that we need a new one and (2) there are cases where it
might be ambiguous which meaning of IS NOT DISTINCT FROM is met.
Joel's answer to #2 is to just prefer the existing meaning wherever
it's possible to assign that meaning and use the new meaning only in
cases where the existing meaning is impossible, but (2a) if you tried
to code it up, you'd probably find that it's quite difficult to make
bison generate a grammar that works that way, because bison isn't
designed around the idea of giving things a meaning only if they don't
already have one and (2b) apparently ambiguities can be confusing to
users even if they've been eliminated in the formal grammar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Tom Lane

Date:

08 February 2017, 19:22:56

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote:
>> I'm not advocating it but I don't see how introducing new SQL keywords
>> breaks backwards compatibility.

> It does at least a little bit.

Yes.  I think a new set-operation keyword would inevitably have to be
fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which means
that you'd break every application that has used that word as a table,
column, or function name.

Generally speaking, we try very darn hard not to introduce new reserved
words that are not called out as reserved in the SQL standard.  (And even
for those, we've sometimes made the grammar jump through hoops so as
not to reserve a word that we didn't reserve previously.)
        regards, tom lane

Re: [HACKERS] Idea on how to simplify comparing two sets

From

David Fetter

Date:

08 February 2017, 20:15:02

On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote:
> >> I'm not advocating it but I don't see how introducing new SQL keywords
> >> breaks backwards compatibility.
> 
> > It does at least a little bit.
> 
> Yes.  I think a new set-operation keyword would inevitably have to
> be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
> means that you'd break every application that has used that word as
> a table, column, or function name.

I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
elementary set operation not included in join types, but nobody at the
SQL standards committee seems to have cared enough to help.

> Generally speaking, we try very darn hard not to introduce new
> reserved words that are not called out as reserved in the SQL
> standard.  (And even for those, we've sometimes made the grammar
> jump through hoops so as not to reserve a word that we didn't
> reserve previously.)

We just never know what new keywords the standards committee will
dream up, or what silliness they'll introduce in the grammar :/

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Tom Lane

Date:

08 February 2017, 20:30:22

David Fetter <david@fetter.org> writes:
> On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:
>> Yes.  I think a new set-operation keyword would inevitably have to
>> be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
>> means that you'd break every application that has used that word as
>> a table, column, or function name.

> I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
> elementary set operation not included in join types, but nobody at the
> SQL standards committee seems to have cared enough to help.

I wonder whether you could shoehorn it in with no new reserved word
by spelling it "EXCEPT SYMMETRIC", which could be justified by the
precedent of BETWEEN SYMMETRIC.  But not sure what to do with
duplicate rows (ie, if LHS has two occurrences of row X and RHS
has one occurrence, do you output X?)
        regards, tom lane

Re: [HACKERS] Idea on how to simplify comparing two sets

From

"David G. Johnston"

Date:

08 February 2017, 20:40:15

On Wed, Feb 8, 2017 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Fetter <david@fetter.org> writes:
> On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:
>> Yes. I think a new set-operation keyword would inevitably have to
>> be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
>> means that you'd break every application that has used that word as
>> a table, column, or function name.

> I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
> elementary set operation not included in join types, but nobody at the
> SQL standards committee seems to have cared enough to help.

I wonder whether you could shoehorn it in with no new reserved word
by spelling it "EXCEPT SYMMETRIC", which could be justified by the
precedent of BETWEEN SYMMETRIC. But not sure what to do with
duplicate rows (ie, if LHS has two occurrences of row X and RHS
has one occurrence, do you output X?)

Without SYMMETRIC its defined to return:

max(m-n,0)

with SYMMETRIC I'd think that would just change to:

abs(m-n)

Then you still have to apply ALL or DISTINCT on the above result.

David J.

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Jim Nasby

Date:

24 February 2017, 00:16:07

On 2/7/17 12:03 PM, Tom Lane wrote:
>> That said I'm not sure how much we want to go down this road on our own.
>> It'd be nice to have when its needed but its not something that gets much
>> visibility on these lists to suggest a large pent-up demand.
> Yeah, if this isn't in the standard and not in other databases either,
> that would seem to suggest that it's not a big requirement.

FWIW I've found myself needing the precursor to this this (give me the 
full diff) at least a couple times, and it's quite a PITA on anything 
but a trivial relation.

It's also not possible to make this easier via an SRF because you don't 
know in advance what the result set looks like. So the best I've ever 
come up with is a file that can be included in psql that depends on 
having set two psql variables to the names of relations that can be 
queried (and if you need more than a relation you need to create a temp 
view).

I've wondered about the possibility of allowing PLs the ability to 
dynamically define their return type based on their arguments. That 
would allow for an SRF to handle this case, and would be significantly 
more flexible than trying to do that using pseudotypes.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Nico Williams

Date:

24 February 2017, 00:27:43

On Tue, Feb 07, 2017 at 10:58:41AM -0500, Tom Lane wrote:
> Joel Jacobson <joel@trustly.com> writes:
> > Currently there is no simple way to check if two sets are equal.
> 
> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
> and SELECT set2 EXCEPT SELECT set1 are both empty?

Even better, NATURAL(*) FULL OUTER JOIN the two table sources and check
that the result is empty.  If the two sources have useful indices (or if
PG constructs suitable automatic indices for them) for this then the
query should be O(N).

(*) However, if you do this then there'd better not be any NULLs in
columns, otherwise you'll get false positives for differences.  Of
course, if the two table sources have common primary key prefixes and
you only care about equality in those columns, then just FULL OUTER JOIN
USING (<primary key prefix>).

Nico
--

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Corey Huinker

Date:

24 February 2017, 00:33:05

FWIW I've found myself needing the precursor to this this (give me the full diff) at least a couple times, and it's quite a PITA on anything but a trivial relation.

It's also not possible to make this easier via an SRF because you don't know in advance what the result set looks like. So the best I've ever come up with is a file that can be included in psql that depends on having set two psql variables to the names of relations that can be queried (and if you need more than a relation you need to create a temp view).

I've wondered about the possibility of allowing PLs the ability to dynamically define their return type based on their arguments. That would allow for an SRF to handle this case, and would be significantly more flexible than trying to do that using pseudotypes.

My experiences are similar. At the moment, I'm resigned to using relying on pgtap:

-- set environment/path to point to "control"
create temporary table test_1_wants as <query of table or function as it was>;

-- set environment/path to point to "experiment"
create temporary table test_1_has as <query of thing we're trying to test now>;

select results_eq( 'table test_1_has', 'table test_1_wants', 'test 1');

I've had to do it with temp tables any time the environment is different between control/experiment, which is the case when you're developing a drop-in replacement for an SRF or view or whatever.

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Nico Williams

Date:

24 February 2017, 00:55:03

On Tue, Feb 07, 2017 at 01:03:14PM -0500, Tom Lane wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> 
> Actually ... now that you mention full join, I believe this works:
> 
> select * from (select ...) s1 full join (select ...) s2
>   on ((s1.*)=(s2.*)) where s1.* is distinct from s2.*;

You can drop the .*s:
 select * from (select ...) s1 full join (select ...) s2   on s1 = s2 where s1 is distinct from s2;

And even:
 select s1, s2 from (select ...) s1 natural full outer join (select ...) s2;

This makes it possible to write very generic (schema-wise) code for
comparing table sources.

As I've mentioned elsewhere, there is an issue with NULLs in columns...

I really, really would like either a full equijoin where equality treats
NULL = NULL -> true for this purpose, or a natural join where only
primary key or not-nullable columns are used, or a USING clause form
where I can specify such behavior without having to list all the columns
that should be used.

I use NATURAL FULL OUTER JOIN for computing materialized view diffs in
my alternate view materialization system.  NULLs are poison for this
purpose, yielding false positive differences.  But my code also uses the
table row value form above in order to avoid having to generate column
lists for a USING clause or expressions for ON.

These requests are not for syntactic sugar, not really.  But I realize
they may be non-trivial -- I may be looking for unobtanium.

> > That said I'm not sure how much we want to go down this road on our own.
> > It'd be nice to have when its needed but its not something that gets much
> > visibility on these lists to suggest a large pent-up demand.
> 
> Yeah, if this isn't in the standard and not in other databases either,
> that would seem to suggest that it's not a big requirement.

SQLite3 famously lacks FULL joins.  It kills me because the alternative
constructions become O(N log M) instead of O(N) for a properly
implemented FULL join (assuming suitable indices anyways).

I wouldn't suggest that that's a reason not to support FULL joins in any
other RDBMS, rather, I'd suggest that SQLite3 is missing an important
feature.

Pardon the tangent.  It may not really be applicable here, as here I
think OP is looking for syntactic sugar rather than an important
optimization.  But the point is that sometimes you have to lead the
standards-setting and/or the competition.

Nico
--

Re: [HACKERS] Idea on how to simplify comparing two sets

From

Jim Nasby

Date:

24 February 2017, 06:03:31

On 2/23/17 3:33 PM, Corey Huinker wrote:
> I've had to do it with temp tables any time the environment is different
> between control/experiment, which is the case when you're developing a
> drop-in replacement for an SRF or view or whatever.

FWIW I'd recommend temp views, to give the planner the most latitude. 
Less load from (presumably) pointless copying too.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)