Some progress on INSERT/SELECT/GROUP BY bugs - Mailing list pgsql-hackers

From Tom Lane
Subject Some progress on INSERT/SELECT/GROUP BY bugs
Date
Msg-id 13811.926640105@sss.pgh.pa.us
Whole thread Raw
Responses Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.

Here is the test case I'm looking at:

CREATE TABLE si_tmpverifyaccountbalances (       type int4 NOT NULL,       memberid int4 NOT NULL,       categoriesid
int4NOT NULL,       amount numeric);
 

CREATE TABLE invoicelinedetails (       invoiceid int4,       memberid int4,       totshippinghandling numeric,
invoicelinesidint4);
 

INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;

ERROR:  INSERT has more expressions than target columns

The reason this is coming out is that the matching of GROUP BY (also
ORDER BY) items to targetlist entries is fundamentally broken in this
context.  The GROUP BY items "memberid" and "totshippinghandling" are
simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
in parse_clause.c; what findTargetlistEntry() does with them is to try
to match them against the resdom names of the existing targetlist items.
I think that's correct behavior in the plain SELECT case (but note it
means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
--- is that per spec??).  But it fails miserably in the INSERT/SELECT
case, because by the time control gets here, the targetlist items have
been given resdom names *corresponding to the column names of the target
table*.

So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.

Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong.  (Don't have a spec to check this with,
however.)

I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete.  In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does.  The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.

(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones.  The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE.  So the revised code should only
assign dest column names to non-junk TLEs.)

I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it?  I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Thomas Lockhart
Date:
Subject: Re: Report on NetBSD/mac port of Postgres 6.4.2
Next
From: Tom Lane
Date:
Subject: Progress on char(n) default-value problem