Thread: [HACKERS] Idea on how to simplify comparing two sets
Hi hackers, Currently there is no simple way to check if two sets are equal. Looks like no RDBMS in the world has a simple command for it. You have to do something like: WITH T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000), T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000) SELECT GREATEST( (SELECT COUNT(*) FROM T1), (SELECT COUNT(*) FROM T2) ) = (SELECT COUNT(*) FROM ( SELECT * FROM T1 INTERSECT ALL SELECT * FROM T2 ) AS X) INTO _Identical; or, SELECT 'Missmatch!' WHERE EXISTS ( SELECT * FROM Foo FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND Foo IS NOT DISTINCT FROM Bar) WHERE TRUE AND ( Foo.FooID BETWEEN 1 AND 10000 AND Bar.BarID BETWEEN1 AND 10000 ) AND ( Foo.FooID IS NULL OR Bar.BarID IS NULL); Introducing new SQL keywords is of course not an option, since it would possibly break backwards compatibility. So here is an idea that doesn't break backwards compatibility: Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT, that is currently a syntax error when used between two sets. SELECT 1 IS DISTINCT FROM SELECT 1; ERROR: syntax error at or near "SELECT" LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1; The example above could be written as: _Identical := ( SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000 IS NOT DISTINCT FROM SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000 ); Which would set _Identical to TRUE if the two sets are equal, and FALSE otherwise. Since it's currently a syntax error, there is no risk for changed behaviour for any existing executable queries. Thoughts? /Joel
What about this ambiguity?
WHERE FALSE
IS NOT DISTINCT FROM
(SELECT TRUE)
IS NOT DISTINCT FROM
(SELECT TRUE)
On Tue, Feb 7, 2017 at 4:13 PM, Joel Jacobson <joel@trustly.com> wrote:
Hi hackers,
Currently there is no simple way to check if two sets are equal.
Looks like no RDBMS in the world has a simple command for it.
You have to do something like:
WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT
GREATEST(
(SELECT COUNT(*) FROM T1),
(SELECT COUNT(*) FROM T2)
)
=
(SELECT COUNT(*) FROM (
SELECT * FROM T1
INTERSECT ALL
SELECT * FROM T2
) AS X)
INTO _Identical;
or,
SELECT 'Missmatch!' WHERE EXISTS (
SELECT * FROM Foo
FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
Foo IS NOT DISTINCT FROM Bar)
WHERE TRUE
AND ( Foo.FooID BETWEEN 1 AND 10000 AND
Bar.BarID BETWEEN 1 AND 10000 )
AND ( Foo.FooID IS NULL OR
Bar.BarID IS NULL);
Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.
So here is an idea that doesn't break backwards compatibility:
Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.
SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR: syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;
The example above could be written as:
_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);
Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.
Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.
Thoughts?
/Joel
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
But that's already a valid statement, so there is no ambiguity: SELECT TRUE WHERE FALSE IS NOT DISTINCT FROM (SELECT TRUE);bool ------ (0 rows) If you want to compare the set (SELECT TRUE WHERE FALSE) with the set (SELECT TRUE) then just add parenthesis: (SELECT TRUE WHERE FALSE) IS NOT DISTINCT FROM (SELECT TRUE); ERROR: syntax error at or near "IS" LINE 2: IS NOT DISTINCT FROM ^ Which is currently invalid syntax. On Tue, Feb 7, 2017 at 4:40 PM, Anders Granlund <anders.granlund@trustly.com> wrote: > What about this ambiguity? > > SELECT TRUE > WHERE FALSE > IS NOT DISTINCT FROM > (SELECT TRUE) > > On Tue, Feb 7, 2017 at 4:13 PM, Joel Jacobson <joel@trustly.com> wrote: >> >> Hi hackers, >> >> Currently there is no simple way to check if two sets are equal. >> >> Looks like no RDBMS in the world has a simple command for it. >> >> You have to do something like: >> >> WITH >> T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000), >> T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000) >> SELECT >> GREATEST( >> (SELECT COUNT(*) FROM T1), >> (SELECT COUNT(*) FROM T2) >> ) >> = >> (SELECT COUNT(*) FROM ( >> SELECT * FROM T1 >> INTERSECT ALL >> SELECT * FROM T2 >> ) AS X) >> INTO _Identical; >> >> or, >> >> SELECT 'Missmatch!' WHERE EXISTS ( >> SELECT * FROM Foo >> FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND >> Foo IS NOT DISTINCT FROM Bar) >> WHERE TRUE >> AND ( Foo.FooID BETWEEN 1 AND 10000 AND >> Bar.BarID BETWEEN 1 AND 10000 ) >> AND ( Foo.FooID IS NULL OR >> Bar.BarID IS NULL); >> >> Introducing new SQL keywords is of course not an option, >> since it would possibly break backwards compatibility. >> >> So here is an idea that doesn't break backwards compatibility: >> >> Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT, >> that is currently a syntax error when used between two sets. >> >> SELECT 1 IS DISTINCT FROM SELECT 1; >> ERROR: syntax error at or near "SELECT" >> LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1; >> >> The example above could be written as: >> >> _Identical := ( >> SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000 >> IS NOT DISTINCT FROM >> SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000 >> ); >> >> Which would set _Identical to TRUE if the two sets are equal, >> and FALSE otherwise. >> >> Since it's currently a syntax error, there is no risk for changed >> behaviour for any existing executable queries. >> >> Thoughts? >> >> /Joel >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers > > -- Joel Jacobson Mobile: +46703603801 Trustly.com | Newsroom | LinkedIn | Twitter
Joel Jacobson <joel@trustly.com> writes: > Currently there is no simple way to check if two sets are equal. Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2 and SELECT set2 EXCEPT SELECT set1 are both empty? regards, tom lane
On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Joel Jacobson <joel@trustly.com> writes: >> Currently there is no simple way to check if two sets are equal. > > Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2 > and SELECT set2 EXCEPT SELECT set1 are both empty? Yes, that's one way, but it's ugly as you have to repeat yourself and write both sets two times. Not an issue for small queries, but if you have two big queries stored in a .sql file, you would have to modify both places for each query and always make sure they are identical.
On Tue, Feb 07, 2017 at 04:13:40PM +0100, Joel Jacobson wrote: > Hi hackers, > > Currently there is no simple way to check if two sets are equal. Assuming that a and b each has at least one NOT NULL column, is this simple enough? Based on nothing much, I'm assuming here that the IS NOT NULL test is faster than IS NULL, but you can flip that and change the array to {0} with identical effect. WITH t AS ( SELECT a AS a, b AS b, (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind FROM a FULL JOIN b ON ... ) SELECT array_agg(DISTINCT ind) = '{2}' FROM t; Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Feb 07, 2017 at 09:10:17AM -0800, David Fetter wrote: > On Tue, Feb 07, 2017 at 04:13:40PM +0100, Joel Jacobson wrote: > > Hi hackers, > > > > Currently there is no simple way to check if two sets are equal. > > Assuming that a and b each has at least one NOT NULL column, is this > simple enough? Based on nothing much, I'm assuming here that the IS > NOT NULL test is faster than IS NULL, but you can flip that and change > the array to {0} with identical effect. > > WITH t AS ( > SELECT a AS a, b AS b, (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind > FROM a FULL JOIN b ON ... > ) > SELECT array_agg(DISTINCT ind) = '{2}' > FROM t; You don't actually need a and b in the inner target list. WITH t AS ( SELECT (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind FROM a FULL JOIN b ON ... ) SELECT array_agg(DISTINCT ind) = '{2}' FROM t; This could be shortened further to the following if we ever implement DISTINCT for window functions, which might involve implementing DISTINCT via hashing more generally, which means hashable types...whee! SELECT array_agg(DISTINCT (a IS NOT NULL)::int + (b IS NOT NULL)::int) OVER () = '{2}' FROM a FULL JOIN b ON ... Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Feb 7, 2017 at 6:28 PM, David Fetter <david@fetter.org> wrote: > This could be shortened further to the following if we ever implement > DISTINCT for window functions, which might involve implementing > DISTINCT via hashing more generally, which means hashable > types...whee! > > SELECT array_agg(DISTINCT (a IS NOT NULL)::int + (b IS NOT NULL)::int) OVER () = '{2}' > FROM a FULL JOIN b ON ... That's still a lot more syntax than just adding "IS NOT DISTINCT FROM" in between the sets. I just thought the general approach at looking for ways to express new things in SQL, without introducing new keywords, but instead rely on existing keywords but that currently are syntax errors when used in some semantic way, is an interesting approach to allow extending the SQL syntax without breaking backwards compatibility. Are there any historical examples of when this approach has been used to make progress in PostgreSQL?
Joel Jacobson <joel@trustly.com> writes:
> Currently there is no simple way to check if two sets are equal.
Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?
SELECT set1 FULL EXCEPT SELECT set2 ?
Matches with the concept and syntax of "FULL JOIN".
or
SELECT set1 XOR SELECT set2
That said I'm not sure how much we want to go down this road on our own. It'd be nice to have when its needed but its not something that gets much visibility on these lists to suggest a large pent-up demand.
IS NOT DISTINCT FROM doesn't imply bi-direction any better than EXCEPT does ... if we are going to add new syntax I'd say it should at least do that.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Tue, Feb 7, 2017 at 8:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Joel Jacobson <joel@trustly.com> writes: >>> Currently there is no simple way to check if two sets are equal. >> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2 >> and SELECT set2 EXCEPT SELECT set1 are both empty? > SELECT set1 FULL EXCEPT SELECT set2 ? > Matches with the concept and syntax of "FULL JOIN". Actually ... now that you mention full join, I believe this works: select * from (select ...) s1 full join (select ...) s2 on ((s1.*)=(s2.*)) where s1.* is distinct from s2.*; > That said I'm not sure how much we want to go down this road on our own. > It'd be nice to have when its needed but its not something that gets much > visibility on these lists to suggest a large pent-up demand. Yeah, if this isn't in the standard and not in other databases either, that would seem to suggest that it's not a big requirement. regards, tom lane
> Currently there is no simple way to check if two sets are equal. My 0.02€: Assuming that you mean set = relation, and that there is a key (which should be the case for a set otherwise tuples cannot be distinguished, so this is not really a set), and assuming not null other data, then: CREATE TABLE TAB1(k INT PRIMARY KEY, data TEXT NOT NULL); INSERT INTO TAB1 VALUES (1, 'one'), (2, 'two'), (3, 'three'); CREATE TABLE TAB2(k INT PRIMARY KEY, data TEXT NOT NULL); INSERT INTO TAB2 VALUES (1, 'one'), (2, 'deux'), (4, 'four'); The TAB1 to TAB2 difference is computed with: SELECT CASE WHEN t1.k IS NULL THEN 'INSERT' WHEN t2.k IS NULL THEN 'DELETE' ELSE 'UPDATE' END AS operation, COALESCE(t1.k, t2.k) AS key FROM TAB1 AS t1 FULL JOIN TAB2 AS t2 USING (k) WHERE t1.data IS NULL OR t2.data IS NULL OR t1.data <> t2.data; Results in: UPDATE | 2 DELETE | 3 INSERT | 4 If there is no differences, then sets are equals... If there is no associated data, a simpler condition: WHERE t1.k IS NULL OR t2.k IS NULL; -- Fabien.
1On Tue, Feb 7, 2017 at 9:46 PM, Joel Jacobson <joel@trustly.com> wrote: > On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Joel Jacobson <joel@trustly.com> writes: >>> Currently there is no simple way to check if two sets are equal. >> >> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2 >> and SELECT set2 EXCEPT SELECT set1 are both empty? > > Yes, that's one way, but it's ugly as you have to repeat yourself and > write both sets two times. > Not an issue for small queries, but if you have two big queries stored > in a .sql file, > you would have to modify both places for each query and always make > sure they are identical. A CTE might help: WITH left AS (something complex), right AS (something complex) SELECT COUNT(*) = 0 AS good FROM ( SELECT * FROM left EXCEPT SELECT * FROM right UNION ALL SELECT * FROM right EXCEPT SELECT * FROM left ) q; This isn't the most efficient solution, but is easily abstracted into dynamic SQL (meaning, you could pass both queries as arguments to a checker function). Another, similar approach is to abstract the query behind a view which ISTM is a practice you are underutilizing based on your comments :-). If I were in a hurry and the dataset was enormous I would probably dump both queries identically ordered to a .csv, and do: diff left.csv right.csv | head -1 in bash or something like that. Not sure if the utility of a bidirectional EXCEPT is enough to justify adding custom syntax for that approach. I use the 'double EXCEPT' tactic fairly often and understand the need, but the bar for non-standard syntax is pretty high (and has been getting higher over the years, I think). merlin
On Tue, Feb 7, 2017 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Joel Jacobson <joel@trustly.com> writes:
> Currently there is no simple way to check if two sets are equal.
Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?
regards, tom lane
Yes, if the wanted result is true or false, something like this:
SELECT EXISTS (TABLE a EXCEPT TABLE b) OR EXISTS (TABLE b EXCEPT TABLE a) ;
And if a new operator was added (in the same category as UNION and EXCEPT), it could be:SELECT EXISTS (TABLE a XORSET TABLE b) ;
What about using the = and <> operators in sets? Is the follow
ing allowed in the standard?
SELECT (TABLE a) <> (TABLE b) ;
On Tue, Feb 7, 2017 at 3:13 PM, Joel Jacobson <joel@trustly.com> wrote:
Hi hackers,
Currently there is no simple way to check if two sets are equal.
Looks like no RDBMS in the world has a simple command for it.
You have to do something like:
...
Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.
I'm not advocating it but I don't see how introducing new SQL keywords breaks backwards compatibility.
Pantelis Theodosiou
On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote: > I'm not advocating it but I don't see how introducing new SQL keywords > breaks backwards compatibility. It does at least a little bit. This starts failing: select 1 new_keyword form blah; (now you have to insert AS or quote the keyword) If the new keyword is partially or fully reserved, then anybody who is using that column in a now-impermissible way has to change names of conflicting tables, columns, functions, etc. But of course we do add keywords in every release, where it's warranted by the value of the new feature. I think the problem for this proposed feature is not that adding new keywords is a completely insane thing to do but that (1) there are lots of other good ways to do more or less what is being requested with this new syntax, so it's not clear that we need a new one and (2) there are cases where it might be ambiguous which meaning of IS NOT DISTINCT FROM is met. Joel's answer to #2 is to just prefer the existing meaning wherever it's possible to assign that meaning and use the new meaning only in cases where the existing meaning is impossible, but (2a) if you tried to code it up, you'd probably find that it's quite difficult to make bison generate a grammar that works that way, because bison isn't designed around the idea of giving things a meaning only if they don't already have one and (2b) apparently ambiguities can be confusing to users even if they've been eliminated in the formal grammar. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote: >> I'm not advocating it but I don't see how introducing new SQL keywords >> breaks backwards compatibility. > It does at least a little bit. Yes. I think a new set-operation keyword would inevitably have to be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which means that you'd break every application that has used that word as a table, column, or function name. Generally speaking, we try very darn hard not to introduce new reserved words that are not called out as reserved in the SQL standard. (And even for those, we've sometimes made the grammar jump through hoops so as not to reserve a word that we didn't reserve previously.) regards, tom lane
On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote: > >> I'm not advocating it but I don't see how introducing new SQL keywords > >> breaks backwards compatibility. > > > It does at least a little bit. > > Yes. I think a new set-operation keyword would inevitably have to > be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which > means that you'd break every application that has used that word as > a table, column, or function name. I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only elementary set operation not included in join types, but nobody at the SQL standards committee seems to have cared enough to help. > Generally speaking, we try very darn hard not to introduce new > reserved words that are not called out as reserved in the SQL > standard. (And even for those, we've sometimes made the grammar > jump through hoops so as not to reserve a word that we didn't > reserve previously.) We just never know what new keywords the standards committee will dream up, or what silliness they'll introduce in the grammar :/ Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote: >> Yes. I think a new set-operation keyword would inevitably have to >> be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which >> means that you'd break every application that has used that word as >> a table, column, or function name. > I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only > elementary set operation not included in join types, but nobody at the > SQL standards committee seems to have cared enough to help. I wonder whether you could shoehorn it in with no new reserved word by spelling it "EXCEPT SYMMETRIC", which could be justified by the precedent of BETWEEN SYMMETRIC. But not sure what to do with duplicate rows (ie, if LHS has two occurrences of row X and RHS has one occurrence, do you output X?) regards, tom lane
David Fetter <david@fetter.org> writes:
> On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:
>> Yes. I think a new set-operation keyword would inevitably have to
>> be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
>> means that you'd break every application that has used that word as
>> a table, column, or function name.
> I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
> elementary set operation not included in join types, but nobody at the
> SQL standards committee seems to have cared enough to help.
I wonder whether you could shoehorn it in with no new reserved word
by spelling it "EXCEPT SYMMETRIC", which could be justified by the
precedent of BETWEEN SYMMETRIC. But not sure what to do with
duplicate rows (ie, if LHS has two occurrences of row X and RHS
has one occurrence, do you output X?)
Without SYMMETRIC its defined to return:
max(m-n,0)
with SYMMETRIC I'd think that would just change to:
abs(m-n)
Then you still have to apply ALL or DISTINCT on the above result.
David J.
On 2/7/17 12:03 PM, Tom Lane wrote: >> That said I'm not sure how much we want to go down this road on our own. >> It'd be nice to have when its needed but its not something that gets much >> visibility on these lists to suggest a large pent-up demand. > Yeah, if this isn't in the standard and not in other databases either, > that would seem to suggest that it's not a big requirement. FWIW I've found myself needing the precursor to this this (give me the full diff) at least a couple times, and it's quite a PITA on anything but a trivial relation. It's also not possible to make this easier via an SRF because you don't know in advance what the result set looks like. So the best I've ever come up with is a file that can be included in psql that depends on having set two psql variables to the names of relations that can be queried (and if you need more than a relation you need to create a temp view). I've wondered about the possibility of allowing PLs the ability to dynamically define their return type based on their arguments. That would allow for an SRF to handle this case, and would be significantly more flexible than trying to do that using pseudotypes. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Tue, Feb 07, 2017 at 10:58:41AM -0500, Tom Lane wrote: > Joel Jacobson <joel@trustly.com> writes: > > Currently there is no simple way to check if two sets are equal. > > Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2 > and SELECT set2 EXCEPT SELECT set1 are both empty? Even better, NATURAL(*) FULL OUTER JOIN the two table sources and check that the result is empty. If the two sources have useful indices (or if PG constructs suitable automatic indices for them) for this then the query should be O(N). (*) However, if you do this then there'd better not be any NULLs in columns, otherwise you'll get false positives for differences. Of course, if the two table sources have common primary key prefixes and you only care about equality in those columns, then just FULL OUTER JOIN USING (<primary key prefix>). Nico --
FWIW I've found myself needing the precursor to this this (give me the full diff) at least a couple times, and it's quite a PITA on anything but a trivial relation.
It's also not possible to make this easier via an SRF because you don't know in advance what the result set looks like. So the best I've ever come up with is a file that can be included in psql that depends on having set two psql variables to the names of relations that can be queried (and if you need more than a relation you need to create a temp view).
I've wondered about the possibility of allowing PLs the ability to dynamically define their return type based on their arguments. That would allow for an SRF to handle this case, and would be significantly more flexible than trying to do that using pseudotypes.
My experiences are similar. At the moment, I'm resigned to using relying on pgtap:
-- set environment/path to point to "control"
create temporary table test_1_wants as <query of table or function as it was>;-- set environment/path to point to "experiment"create temporary table test_1_has as <query of thing we're trying to test now>;select results_eq( 'table test_1_has', 'table test_1_wants', 'test 1');
I've had to do it with temp tables any time the environment is different between control/experiment, which is the case when you're developing a drop-in replacement for an SRF or view or whatever.
On Tue, Feb 07, 2017 at 01:03:14PM -0500, Tom Lane wrote: > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > Actually ... now that you mention full join, I believe this works: > > select * from (select ...) s1 full join (select ...) s2 > on ((s1.*)=(s2.*)) where s1.* is distinct from s2.*; You can drop the .*s: select * from (select ...) s1 full join (select ...) s2 on s1 = s2 where s1 is distinct from s2; And even: select s1, s2 from (select ...) s1 natural full outer join (select ...) s2; This makes it possible to write very generic (schema-wise) code for comparing table sources. As I've mentioned elsewhere, there is an issue with NULLs in columns... I really, really would like either a full equijoin where equality treats NULL = NULL -> true for this purpose, or a natural join where only primary key or not-nullable columns are used, or a USING clause form where I can specify such behavior without having to list all the columns that should be used. I use NATURAL FULL OUTER JOIN for computing materialized view diffs in my alternate view materialization system. NULLs are poison for this purpose, yielding false positive differences. But my code also uses the table row value form above in order to avoid having to generate column lists for a USING clause or expressions for ON. These requests are not for syntactic sugar, not really. But I realize they may be non-trivial -- I may be looking for unobtanium. > > That said I'm not sure how much we want to go down this road on our own. > > It'd be nice to have when its needed but its not something that gets much > > visibility on these lists to suggest a large pent-up demand. > > Yeah, if this isn't in the standard and not in other databases either, > that would seem to suggest that it's not a big requirement. SQLite3 famously lacks FULL joins. It kills me because the alternative constructions become O(N log M) instead of O(N) for a properly implemented FULL join (assuming suitable indices anyways). I wouldn't suggest that that's a reason not to support FULL joins in any other RDBMS, rather, I'd suggest that SQLite3 is missing an important feature. Pardon the tangent. It may not really be applicable here, as here I think OP is looking for syntactic sugar rather than an important optimization. But the point is that sometimes you have to lead the standards-setting and/or the competition. Nico --
On 2/23/17 3:33 PM, Corey Huinker wrote: > I've had to do it with temp tables any time the environment is different > between control/experiment, which is the case when you're developing a > drop-in replacement for an SRF or view or whatever. FWIW I'd recommend temp views, to give the planner the most latitude. Less load from (presumably) pointless copying too. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)