[HACKERS] Idea on how to simplify comparing two sets - Mailing list pgsql-hackers

From Joel Jacobson
Subject [HACKERS] Idea on how to simplify comparing two sets
Date
Msg-id CAASwCXeXzwpLg4b1zWKdR9y4XW7Gt8Q=Di2nCRYs4ZT_gvMFeA@mail.gmail.com
Whole thread Raw
Responses Re: [HACKERS] Idea on how to simplify comparing two sets  (Anders Granlund <anders.granlund@trustly.com>)
Re: [HACKERS] Idea on how to simplify comparing two sets  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] Idea on how to simplify comparing two sets  (David Fetter <david@fetter.org>)
Re: [HACKERS] Idea on how to simplify comparing two sets  (Fabien COELHO <coelho@cri.ensmp.fr>)
Re: [HACKERS] Idea on how to simplify comparing two sets  (Pantelis Theodosiou <ypercube@gmail.com>)
List pgsql-hackers
Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT   GREATEST(       (SELECT COUNT(*) FROM T1),       (SELECT COUNT(*) FROM T2)   )   =   (SELECT COUNT(*) FROM (
   SELECT * FROM T1       INTERSECT ALL       SELECT * FROM T2   ) AS X)
 
INTO _Identical;

or,

SELECT 'Missmatch!' WHERE EXISTS (   SELECT * FROM Foo   FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
               Foo IS NOT DISTINCT FROM Bar)   WHERE TRUE   AND ( Foo.FooID BETWEEN 1 AND 10000 AND         Bar.BarID
BETWEEN1 AND 10000    )   AND ( Foo.FooID IS NULL OR         Bar.BarID IS NULL);
 

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

So here is an idea that doesn't break backwards compatibility:

Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.

SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR:  syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;

The example above could be written as:

_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);

Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.

Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.

Thoughts?

/Joel



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Next
From: Daniele Varrazzo
Date:
Subject: Re: [HACKERS] 'text' instead of 'unknown' in Postgres 10