Re: [HACKERS] New CORRESPONDING clause design - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] New CORRESPONDING clause design
Date
Msg-id 3424.1489855840@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] New CORRESPONDING clause design  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [HACKERS] New CORRESPONDING clause design  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: [HACKERS] New CORRESPONDING clause design  (Surafel Temesgen <surafel3000@gmail.com>)
Re: New CORRESPONDING clause design  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have not any objection - I'll mark this patch as ready for commiter

I took a quick look through this and noted that it fails to touch
ruleutils.c, which means that dumping of views containing CORRESPONDING
certainly doesn't work.

Also, the changes in parser/analyze.c seem rather massive and
correspondingly hard to review.  Is it possible to rearrange the
patch to reduce the size of that diff?  If you can avoid moving
or reindenting existing code, that'd help.

The code in that area seems rather confused, too.  For instance,
I'm not sure exactly what orderCorrespondingList() is good for,
but it certainly doesn't look to be doing anything that either its
name or its header comment (or the comments at the call sites) would
suggest.  Its input and output tlists are always in the same order.

I'm a little disturbed by the fact that determineMatchingColumns()
is called twice, and more disturbed by the fact that it looks to be
O(N^2).  This could be really slow with a lot of columns, couldn't it?

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says
           a) If CORRESPONDING is specified, then:             i) Within the columns of T1, equivalent <column name>s
shall               not be specified more than once and within the columns of                T2, equivalent <column
name>sshall not be specified more                than once.
 

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".

I don't think you want to be leaving behind debugging cruft like this:
+            elog(DEBUG4, "%s", ltle->resname);

I'm not impressed by using A_Const for the members of the CORRESPONDING
name list.  That's not a clever solution, that's a confusing kluge,
because it's a complete violation of the meaning of A_Const.  Elsewhere
we just use lists of String for name lists, and that seems sufficient
here.  Personally I'd just use the existing columnList production rather
than rolling your own.

The comments in parsenodes.h about the new fields seem rather confused,
in fact I think they're probably backwards.

Also, I thought the test cases were excessively uninspired and repetitive.
This, for example, seems like about two tests too many:

+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a) SELECT 4 a, 5 b, 6 c;
+ a 
+---
+ 1
+ 4
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 a, 5 b, 6 c;
+ b 
+---
+ 2
+ 5
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c) SELECT 4 a, 5 b, 6 c;
+ c 
+---
+ 3
+ 6
+(2 rows)

without even considering the fact that these are pretty duplicative of
tests around them.  And some of the added tests seem to be just testing
basic UNION functionality, which we already have tests for.  I fail to
see the point of adding any of these:

+SELECT 1 AS two UNION CORRESPONDING SELECT 2 two;
+ two 
+-----
+   1
+   2
+(2 rows)
+
+SELECT 1 AS one UNION CORRESPONDING SELECT 1 one;
+ one 
+-----
+   1
+(1 row)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 2 two;
+ two 
+-----
+   1
+   2
+(2 rows)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 1 two;
+ two 
+-----
+   1
+   1
+(2 rows)

What I think actually is needed is some targeted testing showing
non-interference with nearby features.  As an example, CORRESPONDING
can't safely be implemented by just replacing the tlists of the
input sub-selects with shortened versions, because that would break
situations such as DISTINCT in the sub-selects.  I think it'd be a
good idea to have a test along the lines of
SELECT DISTINCT x, y FROM ...UNION ALL CORRESPONDINGSELECT DISTINCT x, z FROM ...

with values chosen to show that the DISTINCT operators did operate
on x/y and x/z, not just x alone.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] createlang/droplang deprecated
Next
From: Pavel Stehule
Date:
Subject: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text,binary files