Re: [HACKERS] New CORRESPONDING clause design - Mailing list pgsql-hackers

From Surafel Temesgen
Subject Re: [HACKERS] New CORRESPONDING clause design
Date
Msg-id CALAY4q83=9HfTz42mY_zhaFTKmNxC_sdDSeev2BYXzUznXWPfg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] New CORRESPONDING clause design  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Sat, Mar 18, 2017 at 7:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have not any objection - I'll mark this patch as ready for commiter


I'm a little disturbed by the fact that determineMatchingColumns()
is called twice, and more disturbed by the fact that it looks to be
O(N^2).  This could be really slow with a lot of columns, couldn't it?

I also think there should be some comments about exactly what matching
semantics we're enforcing.   The SQL standard says

            a) If CORRESPONDING is specified, then:
              i) Within the columns of T1, equivalent <column name>s shall
                 not be specified more than once and within the columns of
                 T2, equivalent <column name>s shall not be specified more
                 than once.

That seems unreasonably stringent to me; it ought to be sufficient to
forbid duplicates of the names listed in CORRESPONDING, or the common
column names if there's no BY list.  But whichever restriction you prefer,
this code seems to be failing to check it --- I certainly don't see any
new error message about "column name "foo" appears more than once".

determineMatchingColumns can be reduce to one call but in order
to enforce the above semantics I leave it as it is by thinking 
the performance penalty can be compromise with sql standard 
conformance. 


What I think actually is needed is some targeted testing showing
non-interference with nearby features.  As an example, CORRESPONDING
can't safely be implemented by just replacing the tlists of the
input sub-selects with shortened versions, because that would break
situations such as DISTINCT in the sub-selects.  I think it'd be a
good idea to have a test along the lines of

        SELECT DISTINCT x, y FROM ...
        UNION ALL CORRESPONDING
        SELECT DISTINCT x, z FROM ...

with values chosen to show that the DISTINCT operators did operate
on x/y and x/z, not just x alone.
 
Even if without replacing the tlists of the top-level query 
which is leftmostQuery tagretlist  with shortened versioned
corresponding target list DISTINCT operators did’t operate 
on x/z on left query because  top-level Query has a dummy 
targetlist that exists mainly to show the union'd datatype 
of each output column, and it carries any sortClause, limitClause,
etc needed for the output of the entire operation


if I don’t miss something that means in current implementation we cannot 
use sortClause limitClause with target list outside  final out put or on 
individual query and I think planning doing as a whole will be good 
 
Regrades 

Surafel

pgsql-hackers by date:

Previous
From: Ashutosh Sharma
Date:
Subject: Re: [HACKERS] Page Scan Mode in Hash Index
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Partitioned tables and relfilenode