Thread: [HACKERS] New CORRESPONDING clause design

[HACKERS] New CORRESPONDING clause design

From
Surafel Temsgen
Date:
I am new here and I really want to contribute, I have read same resource that help understanding database system and postgresql. I would like to start implementing sql syntax corresponding by clause because I believe implementing sql syntax gives an opportunity to familiarize  many part of postgresql source code. Previous implementation is here and have an issue on explain query and break cases on unlabeled NULLs
To repeat what a corresponding by clause means
Corresponding clause either contains a BY(...) clause or not. If it
doesn't have a BY(...) clause the usage is as follows.
 
SELECT 1 a, 2 b, 3 c UNION CORRESPONDING SELECT 4 b, 5 d, 6 c, 7 f;
 
with output:
b c
-----
2 3
4 6
 
i.e. matching column names are filtered and are only output from the
whole set operation clause.
 
If we introduce a BY(...) clause, then column names are further
intersected with that BY clause:
 
SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 b, 5 d, 6 c, 7 f;
 
with output:
 
b
--
2
4
 My design is
 In parsing stage
1. Check at least one common column name appear in queries  
2. If corresponding column list is not specified, then make corresponding list from common column name in queries target lists in the order 
    that those column names appear in first query
3. If corresponding column list is specified, then check that every column name 
    in The corresponding column list appears in column names of both queries
    In planner
1. Take projection columen name from corresponding list
2. Reorder left and right queries target lists according to corresponding list
3. For each query, project columens as many as number of corresponding columen.
Continue with normal set operation process

Re: [HACKERS] New CORRESPONDING clause design

From
Merlin Moncure
Date:
On Tue, Jan 17, 2017 at 12:37 AM, Surafel Temsgen <surafel3000@gmail.com> wrote:
> I am new here and I really want to contribute, I have read same resource
> that help understanding database system and postgresql. I would like to
> start implementing sql syntax corresponding by clause because I believe
> implementing sql syntax gives an opportunity to familiarize  many part of
> postgresql source code. Previous implementation is here and have an issue on
> explain query and break cases on unlabeled NULLs
> To repeat what a corresponding by clause means
> Corresponding clause either contains a BY(...) clause or not. If it
> doesn't have a BY(...) clause the usage is as follows.

This is great stuff. Does the syntax only apply to UNION?  I would
imagine it would also apply to INTERSECT/EXCEPT?  What about UNION
ALL?

merlin



Re: [HACKERS] New CORRESPONDING clause design

From
David Fetter
Date:
On Tue, Jan 17, 2017 at 08:20:25AM -0600, Merlin Moncure wrote:
> On Tue, Jan 17, 2017 at 12:37 AM, Surafel Temsgen <surafel3000@gmail.com> wrote:
> > I am new here and I really want to contribute, I have read same resource
> > that help understanding database system and postgresql. I would like to
> > start implementing sql syntax corresponding by clause because I believe
> > implementing sql syntax gives an opportunity to familiarize  many part of
> > postgresql source code. Previous implementation is here and have an issue on
> > explain query and break cases on unlabeled NULLs
> > To repeat what a corresponding by clause means
> > Corresponding clause either contains a BY(...) clause or not. If it
> > doesn't have a BY(...) clause the usage is as follows.
> 
> This is great stuff. Does the syntax only apply to UNION?  I would
> imagine it would also apply to INTERSECT/EXCEPT?  What about UNION
> ALL?

My draft working standard from 2011 says in 7IWD-02-Foundation section 7.13 <query expression>:

a) If CORRESPONDING is specified, then:
i) Within the columns of T1, equivalent <column name>s shall not be specified more than once
and within the columns of T2, equivalent <column name>s shall not be specified more than
once.
ii) At least one column of T1 shall have a <column name> that is the <column name> of some
column of T2.
iii) Case:
1) If <corresponding column list> is not specified, then let SL be a <select list> of those
<column name>s that are <column name>s of both T1 and T2 in the order that those
<column name>s appear in T1.
2) If <corresponding column list> is specified, then let SL be a <select list> of those <column
name>s explicitly appearing in the <corresponding column list> in the order that these
<column name>s appear in the <corresponding column list>. Every <column name> in
the <corresponding column list> shall be a <column name> of both T1 and T2.
iv) The <query term> or <query expression body> is equivalent to:
( SELECT SL FROM TN1 ) OP ( SELECT SL FROM TN2 )

Earlier, it defines ( UNION | EXCEPT ) [ ALL | DISTINCT ] to have a
meaning in this context, to wit:

<query expression body> ::=
<query term>
| <query expression body> UNION [ ALL | DISTINCT ]
[ <corresponding spec> ] <query term>
| <query expression body> EXCEPT [ ALL | DISTINCT ]
[ <corresponding spec> ] <query term>

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] New CORRESPONDING clause design

From
Tom Lane
Date:
Surafel Temsgen <surafel3000@gmail.com> writes:
>  My design is
>  *In parsing stage*
> 1. Check at least one common column name appear in queries
> 2. If corresponding column list is not specified, then make corresponding
> list from common column name in queries target lists in the order
>     that those column names appear in first query
> 3. If corresponding column list is specified, then check that every column
> name
>     in The corresponding column list appears in column names of both queries
>     *In planner*
> 1. Take projection columen name from corresponding list
> 2. Reorder left and right queries target lists according to corresponding
> list
> 3. For each query, project columens as many as number of corresponding
> columen.

FWIW, I think you need to do most of that work in the parser, not the
planner.  The parser certainly has to determine the actual output column
names and types of the setop construct, and we usually consider that the
parsing stage is expected to fully determine the semantics of the query.
So I'd envision the parser as emitting some explicit representation of
the column mapping for each side, rather than expecting the planner to
determine for itself what maps to what.

It probably does make sense for the actual implementation to involve
inserting projection nodes below the setop.  I'm pretty sure the executor
code for setops expects to just pass input tuples through without
projection.  You could imagine changing that, but it would add a penalty
for non-CORRESPONDING setops, which IMO would be the wrong tradeoff.
        regards, tom lane



Re: [HACKERS] New CORRESPONDING clause design

From
Surafel Temsgen
Date:
Here is the implementation of the clause with the slight change, instead of doing column mapping for each side of leaf Queries in planner I make the projection nodes output to corresponding column lists only.

This patch compiles and tests successfully with master branch on ubuntu-15.10-desktop-amd64.It also includes documentation and new regression tests in union.sql.

Regards

Surafel Temesgen

 


On Tue, Jan 17, 2017 at 8:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Surafel Temsgen <surafel3000@gmail.com> writes:
>  My design is
>  *In parsing stage*
> 1. Check at least one common column name appear in queries
> 2. If corresponding column list is not specified, then make corresponding
> list from common column name in queries target lists in the order
>     that those column names appear in first query
> 3. If corresponding column list is specified, then check that every column
> name
>     in The corresponding column list appears in column names of both queries
>     *In planner*
> 1. Take projection columen name from corresponding list
> 2. Reorder left and right queries target lists according to corresponding
> list
> 3. For each query, project columens as many as number of corresponding
> columen.

FWIW, I think you need to do most of that work in the parser, not the
planner.  The parser certainly has to determine the actual output column
names and types of the setop construct, and we usually consider that the
parsing stage is expected to fully determine the semantics of the query.
So I'd envision the parser as emitting some explicit representation of
the column mapping for each side, rather than expecting the planner to
determine for itself what maps to what.

It probably does make sense for the actual implementation to involve
inserting projection nodes below the setop.  I'm pretty sure the executor
code for setops expects to just pass input tuples through without
projection.  You could imagine changing that, but it would add a penalty
for non-CORRESPONDING setops, which IMO would be the wrong tradeoff.

                        regards, tom lane

Attachment

Re: [HACKERS] New CORRESPONDING clause design

From
Robert Haas
Date:
On Thu, Feb 16, 2017 at 8:28 PM, Surafel Temsgen <surafel3000@gmail.com> wrote:
> Here is the implementation of the clause with the slight change, instead of
> doing column mapping for each side of leaf Queries in planner I make the
> projection nodes output to corresponding column lists only.
>
> This patch compiles and tests successfully with master branch on
> ubuntu-15.10-desktop-amd64.It also includes documentation and new regression
> tests in union.sql.

You should probably add this to https://commitfest.postgresql.org/ so
that it doesn't get forgotten about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

I am sending a review of this interesting feature.

I found following issues, questions:

1. unclosed tags <optional> in documentation 
2. bad name "changeTargetEntry" - should be makeTargetEntry?
3. Why you removed lot of asserts in prepunion.c? These asserts should be valid still 
4. make_coresponding_target has wrong formatting
5. error "%s queries with a CORRESPONDING clause must have at least one column with the same name" has wrong formatting, you can show position
6. previous issue is repeated - look on formatting ereport function, please, you can use DETAIL and HINT fields
7. corresponding clause should to contain column list (I am looking to ANSI/SQL 99)  - you are using expr_list, what has not sense and probably it has impact on all implementation.
8. typo orderCorrespondingLsit(List *targetlist)
9. I miss more tests for CORRESPONDING BY
10. if I understand to this feature, this query should to work

postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...

postgres=# create table t1(a int, b int, c int);
CREATE TABLE
Time: 63,260 ms
postgres=# create table t2(a int, b int);
CREATE TABLE
Time: 57,120 ms
postgres=# select * from t1 union corresponding select * from t2;
ERROR:  each UNION query must have the same number of columns
LINE 1: select * from t1 union corresponding select * from t2;
If it is your first patch to Postgres, then it is perfect work!

The @7 is probably most significant - I dislike a expression list there. name_list should be better there.

Regards

Pavel

Re: [HACKERS] New CORRESPONDING clause design

From
Surafel Temesgen
Date:

Hi ,

Here is a patch corrected as your feedback except missed tests case because corresponding by clause is implemented on the top of set operation and you can’t do that to set operation without corresponding by clause too

 

Eg


postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

ERROR:  each UNION query must have the same number of columns

LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

                                          ^

postgres=# create table t1(a int, b int, c int);

CREATE TABLE

postgres=# create table t2(a int, b int);

CREATE TABLE

 

postgres=# select * from t1 union select * from t2;

ERROR:  each UNION query must have the same number of columns

LINE 1: select * from t1 union select * from t2;

 

Thanks

Surafel 


On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

I am sending a review of this interesting feature.

I found following issues, questions:

1. unclosed tags <optional> in documentation 
2. bad name "changeTargetEntry" - should be makeTargetEntry?
3. Why you removed lot of asserts in prepunion.c? These asserts should be valid still 
4. make_coresponding_target has wrong formatting
5. error "%s queries with a CORRESPONDING clause must have at least one column with the same name" has wrong formatting, you can show position
6. previous issue is repeated - look on formatting ereport function, please, you can use DETAIL and HINT fields
7. corresponding clause should to contain column list (I am looking to ANSI/SQL 99)  - you are using expr_list, what has not sense and probably it has impact on all implementation.
8. typo orderCorrespondingLsit(List *targetlist)
9. I miss more tests for CORRESPONDING BY
10. if I understand to this feature, this query should to work

postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...

postgres=# create table t1(a int, b int, c int);
CREATE TABLE
Time: 63,260 ms
postgres=# create table t2(a int, b int);
CREATE TABLE
Time: 57,120 ms
postgres=# select * from t1 union corresponding select * from t2;
ERROR:  each UNION query must have the same number of columns
LINE 1: select * from t1 union corresponding select * from t2;
If it is your first patch to Postgres, then it is perfect work!

The @7 is probably most significant - I dislike a expression list there. name_list should be better there.

Regards

Pavel


Attachment

Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-09 13:18 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Hi ,

Here is a patch corrected as your feedback except missed tests case because corresponding by clause is implemented on the top of set operation and you can’t do that to set operation without corresponding by clause too


I don't understand. 

The following statement should to work

postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30 as b, 40 as c;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...                            
Corresponding clause should to work like projection filter. 
Regards
Pavel

 

Eg


postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

ERROR:  each UNION query must have the same number of columns

LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

                                          ^

postgres=# create table t1(a int, b int, c int);

CREATE TABLE

postgres=# create table t2(a int, b int);

CREATE TABLE

 

postgres=# select * from t1 union select * from t2;

ERROR:  each UNION query must have the same number of columns

LINE 1: select * from t1 union select * from t2;

 

Thanks

Surafel 


On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

I am sending a review of this interesting feature.

I found following issues, questions:

1. unclosed tags <optional> in documentation 
2. bad name "changeTargetEntry" - should be makeTargetEntry?
3. Why you removed lot of asserts in prepunion.c? These asserts should be valid still 
4. make_coresponding_target has wrong formatting
5. error "%s queries with a CORRESPONDING clause must have at least one column with the same name" has wrong formatting, you can show position
6. previous issue is repeated - look on formatting ereport function, please, you can use DETAIL and HINT fields
7. corresponding clause should to contain column list (I am looking to ANSI/SQL 99)  - you are using expr_list, what has not sense and probably it has impact on all implementation.
8. typo orderCorrespondingLsit(List *targetlist)
9. I miss more tests for CORRESPONDING BY
10. if I understand to this feature, this query should to work

postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...

postgres=# create table t1(a int, b int, c int);
CREATE TABLE
Time: 63,260 ms
postgres=# create table t2(a int, b int);
CREATE TABLE
Time: 57,120 ms
postgres=# select * from t1 union corresponding select * from t2;
ERROR:  each UNION query must have the same number of columns
LINE 1: select * from t1 union corresponding select * from t2;
If it is your first patch to Postgres, then it is perfect work!

The @7 is probably most significant - I dislike a expression list there. name_list should be better there.

Regards

Pavel



Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:
hi

2017-03-09 17:19 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-09 13:18 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Hi ,

Here is a patch corrected as your feedback except missed tests case because corresponding by clause is implemented on the top of set operation and you can’t do that to set operation without corresponding by clause too


I don't understand. 

The following statement should to work

postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30 as b, 40 as c;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...                            
Corresponding clause should to work like projection filter. 

I found a link to postgresql mailing list related to some older try to this feature implementation

Regards
Pavel

 

Eg


postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

ERROR:  each UNION query must have the same number of columns

LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

                                          ^

postgres=# create table t1(a int, b int, c int);

CREATE TABLE

postgres=# create table t2(a int, b int);

CREATE TABLE

 

postgres=# select * from t1 union select * from t2;

ERROR:  each UNION query must have the same number of columns

LINE 1: select * from t1 union select * from t2;

 

Thanks

Surafel 


On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

I am sending a review of this interesting feature.

I found following issues, questions:

1. unclosed tags <optional> in documentation 
2. bad name "changeTargetEntry" - should be makeTargetEntry?
3. Why you removed lot of asserts in prepunion.c? These asserts should be valid still 
4. make_coresponding_target has wrong formatting
5. error "%s queries with a CORRESPONDING clause must have at least one column with the same name" has wrong formatting, you can show position
6. previous issue is repeated - look on formatting ereport function, please, you can use DETAIL and HINT fields
7. corresponding clause should to contain column list (I am looking to ANSI/SQL 99)  - you are using expr_list, what has not sense and probably it has impact on all implementation.
8. typo orderCorrespondingLsit(List *targetlist)
9. I miss more tests for CORRESPONDING BY
10. if I understand to this feature, this query should to work

postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...

postgres=# create table t1(a int, b int, c int);
CREATE TABLE
Time: 63,260 ms
postgres=# create table t2(a int, b int);
CREATE TABLE
Time: 57,120 ms
postgres=# select * from t1 union corresponding select * from t2;
ERROR:  each UNION query must have the same number of columns
LINE 1: select * from t1 union corresponding select * from t2;
If it is your first patch to Postgres, then it is perfect work!

The @7 is probably most significant - I dislike a expression list there. name_list should be better there.

Regards

Pavel




Re: [HACKERS] New CORRESPONDING clause design

From
Surafel Temesgen
Date:

Yes, you are correct it should to work on CORRESPONDING clause case. SQL 20nn standard draft only said each query to be of the same degree in a case of set operation without corresponding clause. The attached patch is corrected as such .I add those new test case to regression test too


Regards


Surafel  


On Thu, Mar 9, 2017 at 9:49 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
hi

2017-03-09 17:19 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-09 13:18 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Hi ,

Here is a patch corrected as your feedback except missed tests case because corresponding by clause is implemented on the top of set operation and you can’t do that to set operation without corresponding by clause too


I don't understand. 

The following statement should to work

postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30 as b, 40 as c;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...                            
Corresponding clause should to work like projection filter. 

I found a link to postgresql mailing list related to some older try to this feature implementation

Regards
Pavel

 

Eg


postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

ERROR:  each UNION query must have the same number of columns

LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

                                          ^

postgres=# create table t1(a int, b int, c int);

CREATE TABLE

postgres=# create table t2(a int, b int);

CREATE TABLE

 

postgres=# select * from t1 union select * from t2;

ERROR:  each UNION query must have the same number of columns

LINE 1: select * from t1 union select * from t2;

 

Thanks

Surafel 


On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

I am sending a review of this interesting feature.

I found following issues, questions:

1. unclosed tags <optional> in documentation 
2. bad name "changeTargetEntry" - should be makeTargetEntry?
3. Why you removed lot of asserts in prepunion.c? These asserts should be valid still 
4. make_coresponding_target has wrong formatting
5. error "%s queries with a CORRESPONDING clause must have at least one column with the same name" has wrong formatting, you can show position
6. previous issue is repeated - look on formatting ereport function, please, you can use DETAIL and HINT fields
7. corresponding clause should to contain column list (I am looking to ANSI/SQL 99)  - you are using expr_list, what has not sense and probably it has impact on all implementation.
8. typo orderCorrespondingLsit(List *targetlist)
9. I miss more tests for CORRESPONDING BY
10. if I understand to this feature, this query should to work

postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...

postgres=# create table t1(a int, b int, c int);
CREATE TABLE
Time: 63,260 ms
postgres=# create table t2(a int, b int);
CREATE TABLE
Time: 57,120 ms
postgres=# select * from t1 union corresponding select * from t2;
ERROR:  each UNION query must have the same number of columns
LINE 1: select * from t1 union corresponding select * from t2;
If it is your first patch to Postgres, then it is perfect work!

The @7 is probably most significant - I dislike a expression list there. name_list should be better there.

Regards

Pavel





Attachment

Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-10 10:13 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Yes, you are correct it should to work on CORRESPONDING clause case. SQL 20nn standard draft only said each query to be of the same degree in a case of set operation without corresponding clause. The attached patch is corrected as such .I add those new test case to regression test too


Thank you - I will recheck it.

Regards

Pavel
 


Regards


Surafel  


On Thu, Mar 9, 2017 at 9:49 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
hi

2017-03-09 17:19 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-09 13:18 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Hi ,

Here is a patch corrected as your feedback except missed tests case because corresponding by clause is implemented on the top of set operation and you can’t do that to set operation without corresponding by clause too


I don't understand. 

The following statement should to work

postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30 as b, 40 as c;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...                            
Corresponding clause should to work like projection filter. 

I found a link to postgresql mailing list related to some older try to this feature implementation

Regards
Pavel

 

Eg


postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

ERROR:  each UNION query must have the same number of columns

LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

                                          ^

postgres=# create table t1(a int, b int, c int);

CREATE TABLE

postgres=# create table t2(a int, b int);

CREATE TABLE

 

postgres=# select * from t1 union select * from t2;

ERROR:  each UNION query must have the same number of columns

LINE 1: select * from t1 union select * from t2;

 

Thanks

Surafel 


On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

I am sending a review of this interesting feature.

I found following issues, questions:

1. unclosed tags <optional> in documentation 
2. bad name "changeTargetEntry" - should be makeTargetEntry?
3. Why you removed lot of asserts in prepunion.c? These asserts should be valid still 
4. make_coresponding_target has wrong formatting
5. error "%s queries with a CORRESPONDING clause must have at least one column with the same name" has wrong formatting, you can show position
6. previous issue is repeated - look on formatting ereport function, please, you can use DETAIL and HINT fields
7. corresponding clause should to contain column list (I am looking to ANSI/SQL 99)  - you are using expr_list, what has not sense and probably it has impact on all implementation.
8. typo orderCorrespondingLsit(List *targetlist)
9. I miss more tests for CORRESPONDING BY
10. if I understand to this feature, this query should to work

postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
ERROR:  each UNION query must have the same number of columns
LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...

postgres=# create table t1(a int, b int, c int);
CREATE TABLE
Time: 63,260 ms
postgres=# create table t2(a int, b int);
CREATE TABLE
Time: 57,120 ms
postgres=# select * from t1 union corresponding select * from t2;
ERROR:  each UNION query must have the same number of columns
LINE 1: select * from t1 union corresponding select * from t2;
If it is your first patch to Postgres, then it is perfect work!

The @7 is probably most significant - I dislike a expression list there. name_list should be better there.

Regards

Pavel






Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

2017-03-10 12:55 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-10 10:13 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Yes, you are correct it should to work on CORRESPONDING clause case. SQL 20nn standard draft only said each query to be of the same degree in a case of set operation without corresponding clause. The attached patch is corrected as such .I add those new test case to regression test too


Thank you - I will recheck it.

 Fast check - it looks well

Regards

Pavel

Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

2017-03-10 13:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2017-03-10 12:55 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-10 10:13 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

Yes, you are correct it should to work on CORRESPONDING clause case. SQL 20nn standard draft only said each query to be of the same degree in a case of set operation without corresponding clause. The attached patch is corrected as such .I add those new test case to regression test too


Thank you - I will recheck it.

 Fast check - it looks well

I am sending minor update - cleaning formatting and white spaces, error messages + few more tests

It is working very well.

Maybe correspondingClause needs own node type with attached location. Then context can be much better positioned.

Regards

Pavel
 

Regards

Pavel

Attachment

Re: [HACKERS] New CORRESPONDING clause design

From
Surafel Temesgen
Date:


On Sat, Mar 11, 2017 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


I am sending minor update - cleaning formatting and white spaces, error messages + few more tests
 
Thank you very much for your help
  

Maybe correspondingClause needs own node type with attached location. Then context can be much better positioned.
 

I think we can solve it by using your option or using expr_list for corresponding column and check the syntax manually.

In my opinion, the last option eliminate the introduction of new node for only the sake of error position.


What did you think about the second option?


Regards 

Surafel



Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-13 14:13 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:


On Sat, Mar 11, 2017 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


I am sending minor update - cleaning formatting and white spaces, error messages + few more tests
 
Thank you very much for your help
  

Maybe correspondingClause needs own node type with attached location. Then context can be much better positioned.
 

I think we can solve it by using your option or using expr_list for corresponding column and check the syntax manually.

In my opinion, the last option eliminate the introduction of new node for only the sake of error position.


What did you think about the second option?


I don't like it too much - using expr only for location is too misuse. 

Some errors are related to just CORRESPONDING without any columns. So using expr doesn't help here. So parse node CORRESPONDING can solve both issues.  

Regards

Pavel 


Regards 

Surafel




Re: [HACKERS] New CORRESPONDING clause design

From
Surafel Temesgen
Date:

hi

Some errors are related to just CORRESPONDING without any columns. So using expr doesn't help here. So parse node CORRESPONDING can solve both issues.   
 In current implementation pointing to a node means pointing to a node’s first element so I don’t think we can be able to point to CORRESPONDING without any columns

I find out that there is already a node prepare for the case called A_Const.
The attached patch use that node 

Regards 
Surafel
Attachment

Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

2017-03-14 16:33 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:

hi

Some errors are related to just CORRESPONDING without any columns. So using expr doesn't help here. So parse node CORRESPONDING can solve both issues.   
 In current implementation pointing to a node means pointing to a node’s first element so I don’t think we can be able to point to CORRESPONDING without any columns

I find out that there is already a node prepare for the case called A_Const.
The attached patch use that node 

It looks better

I fixed format of comments and some too long lines. 

all regress tests passed

I have not any objection - I'll mark this patch as ready for commiter

Regards

Pavel

 

Regards 
Surafel

Attachment

Re: [HACKERS] New CORRESPONDING clause design

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have not any objection - I'll mark this patch as ready for commiter

I took a quick look through this and noted that it fails to touch
ruleutils.c, which means that dumping of views containing CORRESPONDING
certainly doesn't work.

Also, the changes in parser/analyze.c seem rather massive and
correspondingly hard to review.  Is it possible to rearrange the
patch to reduce the size of that diff?  If you can avoid moving
or reindenting existing code, that'd help.

The code in that area seems rather confused, too.  For instance,
I'm not sure exactly what orderCorrespondingList() is good for,
but it certainly doesn't look to be doing anything that either its
name or its header comment (or the comments at the call sites) would
suggest.  Its input and output tlists are always in the same order.

I'm a little disturbed by the fact that determineMatchingColumns()
is called twice, and more disturbed by the fact that it looks to be
O(N^2).  This could be really slow with a lot of columns, couldn't it?

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says
           a) If CORRESPONDING is specified, then:             i) Within the columns of T1, equivalent <column name>s
shall               not be specified more than once and within the columns of                T2, equivalent <column
name>sshall not be specified more                than once.
 

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".

I don't think you want to be leaving behind debugging cruft like this:
+            elog(DEBUG4, "%s", ltle->resname);

I'm not impressed by using A_Const for the members of the CORRESPONDING
name list.  That's not a clever solution, that's a confusing kluge,
because it's a complete violation of the meaning of A_Const.  Elsewhere
we just use lists of String for name lists, and that seems sufficient
here.  Personally I'd just use the existing columnList production rather
than rolling your own.

The comments in parsenodes.h about the new fields seem rather confused,
in fact I think they're probably backwards.

Also, I thought the test cases were excessively uninspired and repetitive.
This, for example, seems like about two tests too many:

+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a) SELECT 4 a, 5 b, 6 c;
+ a 
+---
+ 1
+ 4
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 a, 5 b, 6 c;
+ b 
+---
+ 2
+ 5
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c) SELECT 4 a, 5 b, 6 c;
+ c 
+---
+ 3
+ 6
+(2 rows)

without even considering the fact that these are pretty duplicative of
tests around them.  And some of the added tests seem to be just testing
basic UNION functionality, which we already have tests for.  I fail to
see the point of adding any of these:

+SELECT 1 AS two UNION CORRESPONDING SELECT 2 two;
+ two 
+-----
+   1
+   2
+(2 rows)
+
+SELECT 1 AS one UNION CORRESPONDING SELECT 1 one;
+ one 
+-----
+   1
+(1 row)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 2 two;
+ two 
+-----
+   1
+   2
+(2 rows)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 1 two;
+ two 
+-----
+   1
+   1
+(2 rows)

What I think actually is needed is some targeted testing showing
non-interference with nearby features.  As an example, CORRESPONDING
can't safely be implemented by just replacing the tlists of the
input sub-selects with shortened versions, because that would break
situations such as DISTINCT in the sub-selects.  I think it'd be a
good idea to have a test along the lines of
SELECT DISTINCT x, y FROM ...UNION ALL CORRESPONDINGSELECT DISTINCT x, z FROM ...

with values chosen to show that the DISTINCT operators did operate
on x/y and x/z, not just x alone.
        regards, tom lane



Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-18 17:50 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have not any objection - I'll mark this patch as ready for commiter

I took a quick look through this and noted that it fails to touch
ruleutils.c, which means that dumping of views containing CORRESPONDING
certainly doesn't work.

it my mistake - this bug I should to see :( 

Also, the changes in parser/analyze.c seem rather massive and
correspondingly hard to review.  Is it possible to rearrange the
patch to reduce the size of that diff?  If you can avoid moving
or reindenting existing code, that'd help.

The code in that area seems rather confused, too.  For instance,
I'm not sure exactly what orderCorrespondingList() is good for,
but it certainly doesn't look to be doing anything that either its
name or its header comment (or the comments at the call sites) would
suggest.  Its input and output tlists are always in the same order.

I'm a little disturbed by the fact that determineMatchingColumns()
is called twice, and more disturbed by the fact that it looks to be
O(N^2).  This could be really slow with a lot of columns, couldn't it?

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says

            a) If CORRESPONDING is specified, then:
              i) Within the columns of T1, equivalent <column name>s shall
                 not be specified more than once and within the columns of
                 T2, equivalent <column name>s shall not be specified more
                 than once.

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".

I don't think you want to be leaving behind debugging cruft like this:
+                       elog(DEBUG4, "%s", ltle->resname);

I'm not impressed by using A_Const for the members of the CORRESPONDING
name list.  That's not a clever solution, that's a confusing kluge,
because it's a complete violation of the meaning of A_Const.  Elsewhere
we just use lists of String for name lists, and that seems sufficient
here.  Personally I'd just use the existing columnList production rather
than rolling your own.

The reason was attach a location to name for more descriptive error message. 


The comments in parsenodes.h about the new fields seem rather confused,
in fact I think they're probably backwards.

Also, I thought the test cases were excessively uninspired and repetitive.
This, for example, seems like about two tests too many:

+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(a) SELECT 4 a, 5 b, 6 c;
+ a
+---
+ 1
+ 4
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 a, 5 b, 6 c;
+ b
+---
+ 2
+ 5
+(2 rows)
+
+SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c) SELECT 4 a, 5 b, 6 c;
+ c
+---
+ 3
+ 6
+(2 rows)

without even considering the fact that these are pretty duplicative of
tests around them.  And some of the added tests seem to be just testing
basic UNION functionality, which we already have tests for.  I fail to
see the point of adding any of these:

+SELECT 1 AS two UNION CORRESPONDING SELECT 2 two;
+ two
+-----
+   1
+   2
+(2 rows)
+
+SELECT 1 AS one UNION CORRESPONDING SELECT 1 one;
+ one
+-----
+   1
+(1 row)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 2 two;
+ two
+-----
+   1
+   2
+(2 rows)
+
+SELECT 1 AS two UNION ALL CORRESPONDING SELECT 1 two;
+ two
+-----
+   1
+   1
+(2 rows)

What I think actually is needed is some targeted testing showing
non-interference with nearby features.  As an example, CORRESPONDING
can't safely be implemented by just replacing the tlists of the
input sub-selects with shortened versions, because that would break
situations such as DISTINCT in the sub-selects.  I think it'd be a
good idea to have a test along the lines of

        SELECT DISTINCT x, y FROM ...
        UNION ALL CORRESPONDING
        SELECT DISTINCT x, z FROM ...

with values chosen to show that the DISTINCT operators did operate
on x/y and x/z, not just x alone.

                        regards, tom lane

Re: [HACKERS] New CORRESPONDING clause design

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2017-03-18 17:50 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> I'm not impressed by using A_Const for the members of the CORRESPONDING
>> name list.  That's not a clever solution, that's a confusing kluge,
>> because it's a complete violation of the meaning of A_Const.  Elsewhere
>> we just use lists of String for name lists, and that seems sufficient
>> here.  Personally I'd just use the existing columnList production rather
>> than rolling your own.

> The reason was attach a location to name for more descriptive error
> message.

[ shrug... ] The patch fails to actually use the location anywhere.
If it had, you might have noticed that it's attaching the wrong location
to all elements except the first :-(.  So I'm not very excited about that.
I definitely don't see a reason for CORRESPONDING to track locations of
name list elements when no other name list productions do.  It might be
worth proposing a followon patch to change all of them (perhaps by adding
a location field to struct "Value") and then make use of the locations in
error messages more widely.
        regards, tom lane



Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-18 18:32 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2017-03-18 17:50 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> I'm not impressed by using A_Const for the members of the CORRESPONDING
>> name list.  That's not a clever solution, that's a confusing kluge,
>> because it's a complete violation of the meaning of A_Const.  Elsewhere
>> we just use lists of String for name lists, and that seems sufficient
>> here.  Personally I'd just use the existing columnList production rather
>> than rolling your own.

> The reason was attach a location to name for more descriptive error
> message.

[ shrug... ] The patch fails to actually use the location anywhere.
If it had, you might have noticed that it's attaching the wrong location
to all elements except the first :-(.  So I'm not very excited about that.
I definitely don't see a reason for CORRESPONDING to track locations of
name list elements when no other name list productions do.  It might be
worth proposing a followon patch to change all of them (perhaps by adding
a location field to struct "Value") and then make use of the locations in
error messages more widely.

I had a idea use own node for  CORRESPONDING with location - and using this location in related error messages.

What do you think about it?

Regards

Pavel

                        regards, tom lane

Re: [HACKERS] New CORRESPONDING clause design

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2017-03-18 18:32 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> I definitely don't see a reason for CORRESPONDING to track locations of
>> name list elements when no other name list productions do.  It might be
>> worth proposing a followon patch to change all of them (perhaps by adding
>> a location field to struct "Value") and then make use of the locations in
>> error messages more widely.

> I had a idea use own node for  CORRESPONDING with location - and using this
> location in related error messages.

I think using a private node type for CORRESPONDING is exactly the wrong
thing.  It's a columnList and it should be like other columnLists.  If
there's an argument for providing a location for "no such column" errors
for CORRESPONDING, then surely there's also an argument for providing
a location for "no such column" errors for FOREIGN KEY and the other
places where we have lists of column names.
        regards, tom lane



Re: [HACKERS] New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-18 19:12 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2017-03-18 18:32 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> I definitely don't see a reason for CORRESPONDING to track locations of
>> name list elements when no other name list productions do.  It might be
>> worth proposing a followon patch to change all of them (perhaps by adding
>> a location field to struct "Value") and then make use of the locations in
>> error messages more widely.

> I had a idea use own node for  CORRESPONDING with location - and using this
> location in related error messages.

I think using a private node type for CORRESPONDING is exactly the wrong
thing.  It's a columnList and it should be like other columnLists.  If
there's an argument for providing a location for "no such column" errors
for CORRESPONDING, then surely there's also an argument for providing
a location for "no such column" errors for FOREIGN KEY and the other
places where we have lists of column names.

The corresponding clause is used in UNION queries - these queries can be pretty long, so marking wrong corresponding clause can be helpful.

Probably there are not any other argument for special node,

Regards

Pavel
 

                        regards, tom lane

Re: [HACKERS] New CORRESPONDING clause design

From
Surafel Temesgen
Date:


On Sat, Mar 18, 2017 at 7:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have not any objection - I'll mark this patch as ready for commiter


I'm a little disturbed by the fact that determineMatchingColumns()
is called twice, and more disturbed by the fact that it looks to be
O(N^2).  This could be really slow with a lot of columns, couldn't it?

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says

            a) If CORRESPONDING is specified, then:
              i) Within the columns of T1, equivalent <column name>s shall
                 not be specified more than once and within the columns of
                 T2, equivalent <column name>s shall not be specified more
                 than once.

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".

determineMatchingColumns can be reduce to one call but in order
to enforce the above semantics I leave it as it is by thinking 
the performance penalty can be compromise with sql standard 
conformance. 


What I think actually is needed is some targeted testing showing
non-interference with nearby features.  As an example, CORRESPONDING
can't safely be implemented by just replacing the tlists of the
input sub-selects with shortened versions, because that would break
situations such as DISTINCT in the sub-selects.  I think it'd be a
good idea to have a test along the lines of

        SELECT DISTINCT x, y FROM ...
        UNION ALL CORRESPONDING
        SELECT DISTINCT x, z FROM ...

with values chosen to show that the DISTINCT operators did operate
on x/y and x/z, not just x alone.
 
Even if without replacing the tlists of the top-level query 
which is leftmostQuery tagretlist  with shortened versioned
corresponding target list DISTINCT operators did’t operate 
on x/z on left query because  top-level Query has a dummy 
targetlist that exists mainly to show the union'd datatype 
of each output column, and it carries any sortClause, limitClause,
etc needed for the output of the entire operation


if I don’t miss something that means in current implementation we cannot 
use sortClause limitClause with target list outside  final out put or on 
individual query and I think planning doing as a whole will be good 
 
Regrades 

Surafel

Re: New CORRESPONDING clause design

From
Surafel Temesgen
Date:



I took a quick look through this and noted that it fails to touch
ruleutils.c, which means that dumping of views containing CORRESPONDING
certainly doesn't work.
fixed  
Also, the changes in parser/analyze.c seem rather massive and
correspondingly hard to review.  Is it possible to rearrange the
patch to reduce the size of that diff?  If you can avoid moving
or reindenting existing code, that'd help.
Part of transformSetOperationTree that make union data type of 
set operation target list became makeUnionDatatype inorder to 
easy using it multiple time and avoid very long transformSetOperationTree 
function 
 
The code in that area seems rather confused, too.  For instance,
I'm not sure exactly what orderCorrespondingList() is good for,
but it certainly doesn't look to be doing anything that either its
name or its header comment (or the comments at the call sites) would
suggest.  Its input and output tlists are always in the same order.

It give corresponding target list a sequential resnos
Inorder to avoid touching generate_append_tlist I change 
the comment and function name as such 

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says

            a) If CORRESPONDING is specified, then:
              i) Within the columns of T1, equivalent <column name>s shall
                 not be specified more than once and within the columns of
                 T2, equivalent <column name>s shall not be specified more
                 than once.

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".
fixed  

I'm not impressed by using A_Const for the members of the CORRESPONDING
name list.  That's not a clever solution, that's a confusing kluge,
because it's a complete violation of the meaning of A_Const.  Elsewhere
we just use lists of String for name lists, and that seems sufficient
here.  Personally I'd just use the existing columnList production rather
than rolling your own.
fixed  


Attachment

Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

2017-03-25 13:41 GMT+01:00 Surafel Temesgen <surafel3000@gmail.com>:



I took a quick look through this and noted that it fails to touch
ruleutils.c, which means that dumping of views containing CORRESPONDING
certainly doesn't work.
fixed  
Also, the changes in parser/analyze.c seem rather massive and
correspondingly hard to review.  Is it possible to rearrange the
patch to reduce the size of that diff?  If you can avoid moving
or reindenting existing code, that'd help.
Part of transformSetOperationTree that make union data type of 
set operation target list became makeUnionDatatype inorder to 
easy using it multiple time and avoid very long transformSetOperationTree 
function 
 
The code in that area seems rather confused, too.  For instance,
I'm not sure exactly what orderCorrespondingList() is good for,
but it certainly doesn't look to be doing anything that either its
name or its header comment (or the comments at the call sites) would
suggest.  Its input and output tlists are always in the same order.

It give corresponding target list a sequential resnos
Inorder to avoid touching generate_append_tlist I change 
the comment and function name as such 

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says

            a) If CORRESPONDING is specified, then:
              i) Within the columns of T1, equivalent <column name>s shall
                 not be specified more than once and within the columns of
                 T2, equivalent <column name>s shall not be specified more
                 than once.

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".
fixed  

I'm not impressed by using A_Const for the members of the CORRESPONDING
name list.  That's not a clever solution, that's a confusing kluge,
because it's a complete violation of the meaning of A_Const.  Elsewhere
we just use lists of String for name lists, and that seems sufficient
here.  Personally I'd just use the existing columnList production rather
than rolling your own.
fixed  


I am sending updated version:

1. the corresponding columns must be unique, other not - code refactoring (the code is still O(N*M*Z) - but some slow operations (string cmp) reduced) + regress tests
2. regress tests for views
3. some cleaning (white chars)

Regards

Pavel

Attachment

Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

fresh update - I enhanced Value node by location field as Tom proposal.

Few more regress tests.

But I found significant issue, that needs bigger fix - Surafel, please, can you fix it.

It crash on 

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel 


Attachment

Re: New CORRESPONDING clause design

From
Surafel Temesgen
Date:
can you help with fixing it Pavel?

On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

fresh update - I enhanced Value node by location field as Tom proposal.

Few more regress tests.

But I found significant issue, that needs bigger fix - Surafel, please, can you fix it.

It crash on 

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel 



Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-28 13:58 GMT+02:00 Surafel Temesgen <surafel3000@gmail.com>:
can you help with fixing it Pavel?

There must be some new preanalyze stage - you have to know result columns before you are starting a analyze

Regards

Pavel

On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

fresh update - I enhanced Value node by location field as Tom proposal.

Few more regress tests.

But I found significant issue, that needs bigger fix - Surafel, please, can you fix it.

It crash on 

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel 




Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-28 14:18 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-28 13:58 GMT+02:00 Surafel Temesgen <surafel3000@gmail.com>:
can you help with fixing it Pavel?

There must be some new preanalyze stage - you have to know result columns before you are starting a analyze

maybe some recheck after analyze stage to remove invalid columns can be good enough.

Regards

Pavel
 

Regards

Pavel

On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

fresh update - I enhanced Value node by location field as Tom proposal.

Few more regress tests.

But I found significant issue, that needs bigger fix - Surafel, please, can you fix it.

It crash on 

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel 





Re: New CORRESPONDING clause design

From
Surafel Temesgen
Date:
hi

Thank you very much for your help .
here is the patch fix that issue as you suggest
 
Regards
 
Surafel 


On Tue, Mar 28, 2017 at 5:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2017-03-28 14:18 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-28 13:58 GMT+02:00 Surafel Temesgen <surafel3000@gmail.com>:
can you help with fixing it Pavel?

There must be some new preanalyze stage - you have to know result columns before you are starting a analyze

maybe some recheck after analyze stage to remove invalid columns can be good enough.

Regards

Pavel
 

Regards

Pavel

On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

fresh update - I enhanced Value node by location field as Tom proposal.

Few more regress tests.

But I found significant issue, that needs bigger fix - Surafel, please, can you fix it.

It crash on 

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel 






Attachment

Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

2017-03-30 13:11 GMT+02:00 Surafel Temesgen <surafel3000@gmail.com>:
hi

Thank you very much for your help .
here is the patch fix that issue as you suggest

The crash is fixed

I did a rebase + few more regress tests.

Is following use case defined in standard?

postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
           UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
           UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
┌───┐
│ a │
╞═══╡
│ 1 │
│ 3 │
│ 6 │
└───┘
(3 rows)

It depends on order of implementation 

if we do (T1 U T2) U T3 ---> then result is correct,
but if we do T1 U (T2 U T3) ---> than it should to fail

I am not sure, if this result is expected (correct). I expect more syntax error because corresponding by is not filled. 
 
 
Regards
 
Surafel 


On Tue, Mar 28, 2017 at 5:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2017-03-28 14:18 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-28 13:58 GMT+02:00 Surafel Temesgen <surafel3000@gmail.com>:
can you help with fixing it Pavel?

There must be some new preanalyze stage - you have to know result columns before you are starting a analyze

maybe some recheck after analyze stage to remove invalid columns can be good enough.

Regards

Pavel
 

Regards

Pavel

On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

fresh update - I enhanced Value node by location field as Tom proposal.

Few more regress tests.

But I found significant issue, that needs bigger fix - Surafel, please, can you fix it.

It crash on 

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel 







Attachment

Re: New CORRESPONDING clause design

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Is following use case defined in standard?

> postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
>            UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS a,
> 0 AS x6, -1 AS x6
>            UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
> ┌───┐
> │ a │
> ╞═══╡
> │ 1 │
> │ 3 │
> │ 6 │
> └───┘
> (3 rows)

> It depends on order of implementation

> if we do (T1 U T2) U T3 ---> then result is correct,
> but if we do T1 U (T2 U T3) ---> than it should to fail

UNION ALL should associate left-to-right, just like most other binary
operators, so this looks fine to me.  Did you check that you get an
error if you put in parens to force the other order?
        regards, tom lane



Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:


2017-03-30 21:43 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Is following use case defined in standard?

> postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
>            UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS a,
> 0 AS x6, -1 AS x6
>            UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
> ┌───┐
> │ a │
> ╞═══╡
> │ 1 │
> │ 3 │
> │ 6 │
> └───┘
> (3 rows)

> It depends on order of implementation

> if we do (T1 U T2) U T3 ---> then result is correct,
> but if we do T1 U (T2 U T3) ---> than it should to fail

UNION ALL should associate left-to-right, just like most other binary
operators, so this looks fine to me.  Did you check that you get an
error if you put in parens to force the other order?

yes - it fails

 postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 UNION ALL CORRESPONDING BY(a,b) (SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa);
ERROR:  column name "b" can not be used in CORRESPONDING BY list
LINE 1: ...b, 0 AS x3, -1 AS x3 UNION ALL CORRESPONDING BY(a,b) (SELECT...
                                                             ^
HINT:  UNION queries with a CORRESPONDING BY clause must contain column names from both tables.
Time: 1,135 ms

Regards

Pavel

                        regards, tom lane

Re: New CORRESPONDING clause design

From
Pavel Stehule
Date:
Hi

2017-03-30 21:55 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-30 21:43 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Is following use case defined in standard?

> postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
>            UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS a,
> 0 AS x6, -1 AS x6
>            UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
> ┌───┐
> │ a │
> ╞═══╡
> │ 1 │
> │ 3 │
> │ 6 │
> └───┘
> (3 rows)

> It depends on order of implementation

> if we do (T1 U T2) U T3 ---> then result is correct,
> but if we do T1 U (T2 U T3) ---> than it should to fail

UNION ALL should associate left-to-right, just like most other binary
operators, so this looks fine to me.  Did you check that you get an
error if you put in parens to force the other order?

yes - it fails

 postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3 UNION ALL CORRESPONDING BY(a,b) (SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6 UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa);
ERROR:  column name "b" can not be used in CORRESPONDING BY list
LINE 1: ...b, 0 AS x3, -1 AS x3 UNION ALL CORRESPONDING BY(a,b) (SELECT...
                                                             ^
HINT:  UNION queries with a CORRESPONDING BY clause must contain column names from both tables.
Time: 1,135 ms


I fixed wrong my comment 

I have no any other objections, I'll mark this patch as ready for commiter

Regards

Pavel
 
Regards

Pavel

                        regards, tom lane


Attachment

Re: New CORRESPONDING clause design

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ corresponding_clause_v12.patch ]

I worked on this for awhile but eventually decided that it's not very
close to being committable.  The main thing that's scaring me off is
a realization that there are a *lot* of places that assume that the
output columns of a set operation are one-for-one with the columns of
the inputs.  One such place is the subquery qual pushdown logic in
allpaths.c, which this patch hasn't touched, resulting in

regression=# create table t1 (a int, b int, c int);
CREATE TABLE
regression=# create table t2 (a int, b int, c int);
CREATE TABLE
regression=# create view vv2 as
regression-# select * from t1 union all corresponding by (b,c) select * from t2;
CREATE VIEW
regression=# select * from vv2 where c = 44;
ERROR:  wrong number of tlist entries

Another is the code that converts a UNION ALL subquery into an appendrel.
The best thing there might be to just reject CORRESPONDING in
is_simple_union_all, but this patch doesn't, which means we get the wrong
results from

regression=# create table t3 (b int, a int, c int);
CREATE TABLE
regression=# explain verbose select * from t1 union all corresponding select * from t3;
QUERYPLAN                              
--------------------------------------------------------------------Append  (cost=0.00..60.80 rows=4080 width=12)  ->
SeqScan on public.t1  (cost=0.00..30.40 rows=2040 width=12)        Output: t1.a, t1.b, t1.c  ->  Seq Scan on public.t3
(cost=0.00..30.40rows=2040 width=12)        Output: t3.b, t3.a, t3.c 
(5 rows)

Notice it's failed to rearrange the columns to match.

There are also such assumptions in ruleutils.c.  Trying to reverse-list
the above view gives

regression=# \d+ vv2                            View "public.vv2"Column |  Type   | Collation | Nullable | Default |
Storage| Description  
--------+---------+-----------+----------+---------+---------+-------------b      | integer |           |          |
    | plain   | c      | integer |           |          |         | plain   |  
View definition:SELECT t1.a AS b,   t1.b AS c,   t1.c  FROM t1
UNION ALL CORRESPONDING BY (b, c)SELECT t2.a AS b,   t2.b AS c,   t2.c  FROM t2;

which is obviously wrong.  The reason for that is that the code is
trying to ensure that the SELECT's output column names match the
view's column names, so it sticks AS clauses on the select-list
expressions.  If we go a little further:

regression=# alter table vv2 rename column c to ceetwo;
ALTER TABLE
regression=# \d+ vv2                            View "public.vv2"Column |  Type   | Collation | Nullable | Default |
Storage| Description  
--------+---------+-----------+----------+---------+---------+-------------b      | integer |           |          |
    | plain   | ceetwo | integer |           |          |         | plain   |  
View definition:SELECT t1.a AS b,   t1.b AS ceetwo,   t1.c  FROM t1
UNION ALL CORRESPONDING BY (b, c)SELECT t2.a AS b,   t2.b AS ceetwo,   t2.c  FROM t2;

Now things are a *complete* mess, because this view definition would
successfully parse during a dump-and-reload, but it means something
else than it did before; the CORRESPONDING BY list has failed to track
the change in names of the columns.

In general, there's a lot of subtle logic in ruleutils.c about what we
need to do to ensure that reverse-listed views keep the same semantic
meaning in the face of column renamings or additions in their input
tables.  CORRESPONDING is really scary in this situation because its
semantics depend on column names.  I tried to break it by adding/renaming
columns of the input tables to see if I could get ruleutils to print
something that didn't mean what it meant before.  I didn't succeed
immediately, but I am thinking it would be smart for us always to
reverse-list the construct as CORRESPONDING BY with an explicit list of
column names.  We don't ever reverse-list a NATURAL JOIN as such,
preferring to use an explicit JOIN USING list, for closely related
reasons.  (You might care to study the logic in ruleutils.c around the
usingNames list, which exists to ensure that JOIN USING doesn't get broken
by column renames.  It's entirely likely that we're going to need
something equivalently complicated for CORRESPONDING BY.  Or if we
don't, I'd want to have some comments in there that clearly explain
why we don't, because otherwise somebody will break it in future.)

I also found that WITH RECURSIVE fails immediately if you stick
CORRESPONDING into the recursive union step; eg in this lightly
modified version of a query from with.sql,

CREATE TEMPORARY TABLE tree(   id INTEGER PRIMARY KEY,   parent_id INTEGER REFERENCES tree(id)
);

WITH RECURSIVE t(id, path) AS (   select 1 as id, ARRAY[]::integer[] as path
UNION ALL CORRESPONDING   SELECT tree.id, t.path || tree.id as path   FROM tree JOIN t ON (tree.parent_id = t.id)
)
SELECT t1.*, t2.* FROM t AS t1 JOIN t AS t2 ON(t1.path[1] = t2.path[1] ANDarray_upper(t1.path,1) = 1
ANDarray_upper(t2.path,1)> 1)ORDER BY t1.id, t2.id; 

ERROR:  column t.id does not exist
LINE 5:     FROM tree JOIN t ON (tree.parent_id = t.id)                                                 ^
HINT:  Perhaps you meant to reference the column "tree.id".

I was expecting to find trouble if the CORRESPONDING had to rearrange
columns (breaking the 1-to-1 assumption), but it fails even without that.

There may well be other places in the system that are making similar
assumptions about 1-to-1 mapping between setop inputs and outputs.
This comment in plan_set_operations is pretty scary, for instance:   * Find the leftmost component Query.  We need to
useits column names for   * all generated tlists (else SELECT INTO won't work right). 
because the column names in the component query certainly won't be
1-to-1 with upper tlists if CORRESPONDING is active.

I wonder whether it wouldn't be a smarter plan to abandon this
implementation approach and rely on inserting an explicit level of
sub-selects during the parse analysis transformation.  That is,
 SELECT * FROM foo UNION CORRESPONDING BY (x, y) SELECT * FROM bar

would get converted into
 SELECT x, y FROM (SELECT * FROM foo) ss1 UNION SELECT x, y FROM (SELECT * FROM bar) ss2

This isn't very pretty, but I would have a lot fewer worries
about how long we'd be finding bugs induced by the feature.

If we do stick with trying to represent CORRESPONDING BY explicitly
in the parse tree, I'm not very satisfied with the representation
you've chosen for SetOperationStmt.correspondingColumns.  The patch
says

+    List       *correspondingColumns;    /* list of corresponding column names */

which is a flat out lie.  debug_print_parse and some study of the code
showed me that it's actually a list of TargetEntry nodes containing
copies of the left-hand input's target expression trees.  This seems
bulky, bizarre, and asymmetric; among other things it requires the planner
to re-derive the column matching that the parser had already figured out.
There certainly doesn't seem to be any value in carrying around an extra
copy of the left-hand expressions.

I'd be inclined to suggest that the best representation in
SetOperationStmt is two integer lists of the left-hand and right-hand
column numbers of the CORRESPONDING columns.  This would probably
simplify matters greatly for the planner.  It would mean that ruleutils.c
would have to do a bit more work to find out the column names to print
in CORRESPONDING BY; but as I showed above, just printing the names that
appeared at parse time is a doomed strategy anyway.

Also, just in passing --- while it may well be a good idea to add a
location field to struct Value, I'm not on board with doing so and
then using it in only one place.  As I said earlier, I think that
CORRESPONDING's column name list should act like every other column
name list in the grammar, and that includes making use of error cursor
locations where appropriate.  And changing struct Value has repercussions
way beyond that, for instance it's not clear why we'd keep A_Const as a
separate node type if Value has a location field.  I think if you want
to push that forward, it would be a good idea to submit a separate
patch that just focuses on adding error cursor location reports
everywhere that adding locations to Values would enable.

I'll set this back to Waiting on Author, but I think the chances of
getting to a committable patch before the end of the commitfest are
about nil.
        regards, tom lane



Re: New CORRESPONDING clause design

From
David Steele
Date:
On 4/1/17 1:54 PM, Tom Lane wrote:
> 
> I'll set this back to Waiting on Author, but I think the chances of
> getting to a committable patch before the end of the commitfest are
> about nil.

I think this is especially true now that another three days have passed.

This submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.

Regards,
-- 
-David
david@pgmasters.net