Thread: Some progress on INSERT/SELECT/GROUP BY bugs

Some progress on INSERT/SELECT/GROUP BY bugs

From
Tom Lane
Date:
I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.

Here is the test case I'm looking at:

CREATE TABLE si_tmpverifyaccountbalances (       type int4 NOT NULL,       memberid int4 NOT NULL,       categoriesid
int4NOT NULL,       amount numeric);
 

CREATE TABLE invoicelinedetails (       invoiceid int4,       memberid int4,       totshippinghandling numeric,
invoicelinesidint4);
 

INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;

ERROR:  INSERT has more expressions than target columns

The reason this is coming out is that the matching of GROUP BY (also
ORDER BY) items to targetlist entries is fundamentally broken in this
context.  The GROUP BY items "memberid" and "totshippinghandling" are
simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
in parse_clause.c; what findTargetlistEntry() does with them is to try
to match them against the resdom names of the existing targetlist items.
I think that's correct behavior in the plain SELECT case (but note it
means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
--- is that per spec??).  But it fails miserably in the INSERT/SELECT
case, because by the time control gets here, the targetlist items have
been given resdom names *corresponding to the column names of the target
table*.

So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.

Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong.  (Don't have a spec to check this with,
however.)

I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete.  In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does.  The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.

(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones.  The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE.  So the revised code should only
assign dest column names to non-junk TLEs.)

I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it?  I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...
        regards, tom lane


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Bruce Momjian
Date:
> (MakeTargetEntryIdent is broken anyway because it tries to associate
> a destination column with every TLE, even the resjunk ones.  The reason
> we see the quoted error message in this situation is that after
> findTargetlistEntry fails to detect that totshippinghandling is already
> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
> totshippinghandling, and then MakeTargetEntryIdent tries to find a
> target column to go with the junk TLE.  So the revised code should only
> assign dest column names to non-junk TLEs.)
> 
> I'm not really familiar enough with the parser to want to tackle this
> size of change by myself --- Thomas, do you want to do it?  I think it's
> largely a matter of moving code around, but I'm not sure where is the
> right place for it...

Yes, I clearly remember the INSERT assigning target names to columns in
the select to match up the entries.  I still am unclear which of these
are valid SQL:
select a as b from test order by aselect a as b from test order by b

Can we just defer the renaming until after we do group-by?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Bruce Momjian
Date:
Tom, was this done?


> I believe I've identified the main cause of the peculiar behavior we
> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
> parser bug.
> 
> Here is the test case I'm looking at:
> 
> CREATE TABLE si_tmpverifyaccountbalances (
>         type int4 NOT NULL,
>         memberid int4 NOT NULL,
>         categoriesid int4 NOT NULL,
>         amount numeric);
> 
> CREATE TABLE invoicelinedetails (
>         invoiceid int4,
>         memberid int4,
>         totshippinghandling numeric,
>         invoicelinesid int4);
> 
> INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
> memberid, 1, totshippinghandling FROM invoicelinedetails
> GROUP BY invoiceid+3, memberid, totshippinghandling;
> 
> ERROR:  INSERT has more expressions than target columns
> 
> The reason this is coming out is that the matching of GROUP BY (also
> ORDER BY) items to targetlist entries is fundamentally broken in this
> context.  The GROUP BY items "memberid" and "totshippinghandling" are
> simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
> in parse_clause.c; what findTargetlistEntry() does with them is to try
> to match them against the resdom names of the existing targetlist items.
> I think that's correct behavior in the plain SELECT case (but note it
> means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
> --- is that per spec??).  But it fails miserably in the INSERT/SELECT
> case, because by the time control gets here, the targetlist items have
> been given resdom names *corresponding to the column names of the target
> table*.
> 
> So, in the example at hand, "memberid" is matched to the correct column
> by pure luck (because it has the same name in the destination table),
> and then "totshippinghandling" is not recognized as one of the existing
> TLEs because it does not match any destination column name.
> 
> Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
> to mean the same thing no matter whether there is an INSERT in front of
> it or not, and thus that letting target column names affect the meaning
> of GROUP BY items is dead wrong.  (Don't have a spec to check this with,
> however.)
> 
> I believe the most reasonable fix for this is to postpone relabeling
> of the targetlist entries with destination column names until after
> analysis of the SELECT's subsidiary clauses is complete.  In particular,
> it should *not* be done instantly when each TLE is made, which is what
> MakeTargetEntryIdent currently does.  The TLEs should have the same
> resnames as in the SELECT case until after subsidiary clause processing
> is done.
> 
> (MakeTargetEntryIdent is broken anyway because it tries to associate
> a destination column with every TLE, even the resjunk ones.  The reason
> we see the quoted error message in this situation is that after
> findTargetlistEntry fails to detect that totshippinghandling is already
> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
> totshippinghandling, and then MakeTargetEntryIdent tries to find a
> target column to go with the junk TLE.  So the revised code should only
> assign dest column names to non-junk TLEs.)
> 
> I'm not really familiar enough with the parser to want to tackle this
> size of change by myself --- Thomas, do you want to do it?  I think it's
> largely a matter of moving code around, but I'm not sure where is the
> right place for it...
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, was this done?

This is not done --- I wasn't willing to try to do such a thing by
myself when we were already in 6.5 beta.  It's on my todo list for 6.6.

6.5 fails in a different way than 6.4 did, for reasons that I don't
recall offhand, but the only real fix is to restructure the analyzer.
        regards, tom lane


>> I believe I've identified the main cause of the peculiar behavior we
>> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
>> parser bug.
>> 
>> Here is the test case I'm looking at:
>> 
>> CREATE TABLE si_tmpverifyaccountbalances (
>> type int4 NOT NULL,
>> memberid int4 NOT NULL,
>> categoriesid int4 NOT NULL,
>> amount numeric);
>> 
>> CREATE TABLE invoicelinedetails (
>> invoiceid int4,
>> memberid int4,
>> totshippinghandling numeric,
>> invoicelinesid int4);
>> 
>> INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
>> memberid, 1, totshippinghandling FROM invoicelinedetails
>> GROUP BY invoiceid+3, memberid, totshippinghandling;
>> 
>> ERROR:  INSERT has more expressions than target columns
>> 
>> The reason this is coming out is that the matching of GROUP BY (also
>> ORDER BY) items to targetlist entries is fundamentally broken in this
>> context.  The GROUP BY items "memberid" and "totshippinghandling" are
>> simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
>> in parse_clause.c; what findTargetlistEntry() does with them is to try
>> to match them against the resdom names of the existing targetlist items.
>> I think that's correct behavior in the plain SELECT case (but note it
>> means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
>> --- is that per spec??).  But it fails miserably in the INSERT/SELECT
>> case, because by the time control gets here, the targetlist items have
>> been given resdom names *corresponding to the column names of the target
>> table*.
>> 
>> So, in the example at hand, "memberid" is matched to the correct column
>> by pure luck (because it has the same name in the destination table),
>> and then "totshippinghandling" is not recognized as one of the existing
>> TLEs because it does not match any destination column name.
>> 
>> Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
>> to mean the same thing no matter whether there is an INSERT in front of
>> it or not, and thus that letting target column names affect the meaning
>> of GROUP BY items is dead wrong.  (Don't have a spec to check this with,
>> however.)
>> 
>> I believe the most reasonable fix for this is to postpone relabeling
>> of the targetlist entries with destination column names until after
>> analysis of the SELECT's subsidiary clauses is complete.  In particular,
>> it should *not* be done instantly when each TLE is made, which is what
>> MakeTargetEntryIdent currently does.  The TLEs should have the same
>> resnames as in the SELECT case until after subsidiary clause processing
>> is done.
>> 
>> (MakeTargetEntryIdent is broken anyway because it tries to associate
>> a destination column with every TLE, even the resjunk ones.  The reason
>> we see the quoted error message in this situation is that after
>> findTargetlistEntry fails to detect that totshippinghandling is already
>> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
>> totshippinghandling, and then MakeTargetEntryIdent tries to find a
>> target column to go with the junk TLE.  So the revised code should only
>> assign dest column names to non-junk TLEs.)
>> 
>> I'm not really familiar enough with the parser to want to tackle this
>> size of change by myself --- Thomas, do you want to do it?  I think it's
>> largely a matter of moving code around, but I'm not sure where is the
>> right place for it...
>> 
>> regards, tom lane
>> 
>> 


> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   maillist@candle.pha.pa.us            |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Tom, was this done?
> 
> This is not done --- I wasn't willing to try to do such a thing by
> myself when we were already in 6.5 beta.  It's on my todo list for 6.6.



On your list.  Good.  I can't possibly figure out how to describe this
bug.


> 
> 6.5 fails in a different way than 6.4 did, for reasons that I don't
> recall offhand, but the only real fix is to restructure the analyzer.
> 
>             regards, tom lane
> 
> 
> >> I believe I've identified the main cause of the peculiar behavior we
> >> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
> >> parser bug.
> >> 
> >> Here is the test case I'm looking at:
> >> 
> >> CREATE TABLE si_tmpverifyaccountbalances (
> >> type int4 NOT NULL,
> >> memberid int4 NOT NULL,
> >> categoriesid int4 NOT NULL,
> >> amount numeric);
> >> 
> >> CREATE TABLE invoicelinedetails (
> >> invoiceid int4,
> >> memberid int4,
> >> totshippinghandling numeric,
> >> invoicelinesid int4);
> >> 
> >> INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
> >> memberid, 1, totshippinghandling FROM invoicelinedetails
> >> GROUP BY invoiceid+3, memberid, totshippinghandling;
> >> 
> >> ERROR:  INSERT has more expressions than target columns
> >> 
> >> The reason this is coming out is that the matching of GROUP BY (also
> >> ORDER BY) items to targetlist entries is fundamentally broken in this
> >> context.  The GROUP BY items "memberid" and "totshippinghandling" are
> >> simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
> >> in parse_clause.c; what findTargetlistEntry() does with them is to try
> >> to match them against the resdom names of the existing targetlist items.
> >> I think that's correct behavior in the plain SELECT case (but note it
> >> means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
> >> --- is that per spec??).  But it fails miserably in the INSERT/SELECT
> >> case, because by the time control gets here, the targetlist items have
> >> been given resdom names *corresponding to the column names of the target
> >> table*.
> >> 
> >> So, in the example at hand, "memberid" is matched to the correct column
> >> by pure luck (because it has the same name in the destination table),
> >> and then "totshippinghandling" is not recognized as one of the existing
> >> TLEs because it does not match any destination column name.
> >> 
> >> Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
> >> to mean the same thing no matter whether there is an INSERT in front of
> >> it or not, and thus that letting target column names affect the meaning
> >> of GROUP BY items is dead wrong.  (Don't have a spec to check this with,
> >> however.)
> >> 
> >> I believe the most reasonable fix for this is to postpone relabeling
> >> of the targetlist entries with destination column names until after
> >> analysis of the SELECT's subsidiary clauses is complete.  In particular,
> >> it should *not* be done instantly when each TLE is made, which is what
> >> MakeTargetEntryIdent currently does.  The TLEs should have the same
> >> resnames as in the SELECT case until after subsidiary clause processing
> >> is done.
> >> 
> >> (MakeTargetEntryIdent is broken anyway because it tries to associate
> >> a destination column with every TLE, even the resjunk ones.  The reason
> >> we see the quoted error message in this situation is that after
> >> findTargetlistEntry fails to detect that totshippinghandling is already
> >> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
> >> totshippinghandling, and then MakeTargetEntryIdent tries to find a
> >> target column to go with the junk TLE.  So the revised code should only
> >> assign dest column names to non-junk TLEs.)
> >> 
> >> I'm not really familiar enough with the parser to want to tackle this
> >> size of change by myself --- Thomas, do you want to do it?  I think it's
> >> largely a matter of moving code around, but I'm not sure where is the
> >> right place for it...
> >> 
> >> regards, tom lane
> >> 
> >> 
> 
> 
> > -- 
> >   Bruce Momjian                        |  http://www.op.net/~candle
> >   maillist@candle.pha.pa.us            |  (610) 853-3000
> >   +  If your life is a hard drive,     |  830 Blythe Avenue
> >   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> This is not done --- I wasn't willing to try to do such a thing by
>> myself when we were already in 6.5 beta.  It's on my todo list for 6.6.

> On your list.  Good.  I can't possibly figure out how to describe this
> bug.

If you want a TODO entry try * INSERT ... SELECT ... GROUP BY groups by target columns not source columns

There are other failure modes associated with this bug but that one will
do for the list.
        regards, tom lane


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Bruce Momjian
Date:
Tom, is this fixed?


> I believe I've identified the main cause of the peculiar behavior we
> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
> parser bug.
> 
> Here is the test case I'm looking at:
> 
> CREATE TABLE si_tmpverifyaccountbalances (
>         type int4 NOT NULL,
>         memberid int4 NOT NULL,
>         categoriesid int4 NOT NULL,
>         amount numeric);
> 
> CREATE TABLE invoicelinedetails (
>         invoiceid int4,
>         memberid int4,
>         totshippinghandling numeric,
>         invoicelinesid int4);
> 
> INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
> memberid, 1, totshippinghandling FROM invoicelinedetails
> GROUP BY invoiceid+3, memberid, totshippinghandling;
> 
> ERROR:  INSERT has more expressions than target columns
> 
> The reason this is coming out is that the matching of GROUP BY (also
> ORDER BY) items to targetlist entries is fundamentally broken in this
> context.  The GROUP BY items "memberid" and "totshippinghandling" are
> simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
> in parse_clause.c; what findTargetlistEntry() does with them is to try
> to match them against the resdom names of the existing targetlist items.
> I think that's correct behavior in the plain SELECT case (but note it
> means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
> --- is that per spec??).  But it fails miserably in the INSERT/SELECT
> case, because by the time control gets here, the targetlist items have
> been given resdom names *corresponding to the column names of the target
> table*.
> 
> So, in the example at hand, "memberid" is matched to the correct column
> by pure luck (because it has the same name in the destination table),
> and then "totshippinghandling" is not recognized as one of the existing
> TLEs because it does not match any destination column name.
> 
> Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
> to mean the same thing no matter whether there is an INSERT in front of
> it or not, and thus that letting target column names affect the meaning
> of GROUP BY items is dead wrong.  (Don't have a spec to check this with,
> however.)
> 
> I believe the most reasonable fix for this is to postpone relabeling
> of the targetlist entries with destination column names until after
> analysis of the SELECT's subsidiary clauses is complete.  In particular,
> it should *not* be done instantly when each TLE is made, which is what
> MakeTargetEntryIdent currently does.  The TLEs should have the same
> resnames as in the SELECT case until after subsidiary clause processing
> is done.
> 
> (MakeTargetEntryIdent is broken anyway because it tries to associate
> a destination column with every TLE, even the resjunk ones.  The reason
> we see the quoted error message in this situation is that after
> findTargetlistEntry fails to detect that totshippinghandling is already
> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
> totshippinghandling, and then MakeTargetEntryIdent tries to find a
> target column to go with the junk TLE.  So the revised code should only
> assign dest column names to non-junk TLEs.)
> 
> I'm not really familiar enough with the parser to want to tackle this
> size of change by myself --- Thomas, do you want to do it?  I think it's
> largely a matter of moving code around, but I'm not sure where is the
> right place for it...
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, is this fixed?

>> I believe I've identified the main cause of the peculiar behavior we
>> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
>> parser bug.

>> I believe the most reasonable fix for this is to postpone relabeling
>> of the targetlist entries with destination column names until after
>> analysis of the SELECT's subsidiary clauses is complete.

Yes, for 6.6.  There are some other INSERT ... SELECT cases that can't
be fixed until we have separate targetlists for the INSERT and the
source SELECT --- but I did take care of this particular issue.  The
column relabeling etc doesn't happen until after we've finished with
analyzing the SELECT subclause.
        regards, tom lane