Thread: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
"Sergey Burladyan"
Date:
The following bug has been logged online:

Bug reference:      5234
Logged by:          Sergey Burladyan
Email address:      eshkinkot@gmail.com
PostgreSQL version: 8.3.8
Operating system:   Debian GNU/Linux 5.0.3 (lenny) + testing
Description:        ALTER TABLE ... RENAME COLUMN change view definition
incorrectly
Details:

reported by Weed at http://www.sql.ru/forum/actualthread.aspx?tid=717835
(Russian)

Example:
create table a (i int, v text);
create table b (j int, v text);
create view v_using as select * from a left join b using (v);
alter table a rename v to o;
\d v_using

CREATE TABLE
CREATE TABLE
CREATE VIEW
ALTER TABLE
    View "public.v_using"
 Column |  Type   | Modifiers
--------+---------+-----------
 v      | text    |
 i      | integer |
 j      | integer |
View definition:
 SELECT a.o AS v, a.i, b.j
   FROM a
   LEFT JOIN b USING (v);

View is still working, but it text definition is incorrect:
t1=> select * from v_using ;
 v | i | j
---+---+---
(0 rows)

t1=>  SELECT a.o AS v, a.i, b.j
t1->    FROM a
t1->    LEFT JOIN b USING (v);
ERROR:  42703: column "v" specified in USING clause does not exist in left
table
LOCATION:  transformFromClauseItem, parse_clause.c:813

If you dump database in this state, when you cannot restore this dump
without manual fix:
$ pg_dump -Fc -f dump t1
$ pg_restore dump | grep -A2 VIEW
-- Name: v_using; Type: VIEW; Schema: public; Owner: seb
--

CREATE VIEW v_using AS
    SELECT a.o AS v, a.i, b.j FROM (a LEFT JOIN b USING (v));

$ LANG=C sudo -u postgres pg_restore -c -d t1 dump
. . .
pg_restore: [archiver (db)] could not execute query: ERROR:  column "v"
specified in USING clause does not exist in left table
    Command was: CREATE VIEW v_using AS
            SELECT a.o AS v, a.i, b.j FROM (a LEFT JOIN b USING (v));
. . .

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Sun, Dec 6, 2009 at 5:41 PM, Sergey Burladyan <eshkinkot@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5234
> Logged by:          Sergey Burladyan
> Email address:      eshkinkot@gmail.com
> PostgreSQL version: 8.3.8
> Operating system:   Debian GNU/Linux 5.0.3 (lenny) + testing
> Description:        ALTER TABLE ... RENAME COLUMN change view definition
> incorrectly
> Details:
>
> reported by Weed at http://www.sql.ru/forum/actualthread.aspx?tid=717835
> (Russian)
>
> Example:
> create table a (i int, v text);
> create table b (j int, v text);
> create view v_using as select * from a left join b using (v);
> alter table a rename v to o;
> \d v_using
>
> CREATE TABLE
> CREATE TABLE
> CREATE VIEW
> ALTER TABLE
>    View "public.v_using"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  v      | text    |
>  i      | integer |
>  j      | integer |
> View definition:
>  SELECT a.o AS v, a.i, b.j
>   FROM a
>   LEFT JOIN b USING (v);
>
> View is still working, but it text definition is incorrect:
> t1=> select * from v_using ;
>  v | i | j
> ---+---+---
> (0 rows)
>
> t1=>  SELECT a.o AS v, a.i, b.j
> t1->    FROM a
> t1->    LEFT JOIN b USING (v);
> ERROR:  42703: column "v" specified in USING clause does not exist in left
> table
> LOCATION:  transformFromClauseItem, parse_clause.c:813
>
> If you dump database in this state, when you cannot restore this dump
> without manual fix:
> $ pg_dump -Fc -f dump t1
> $ pg_restore dump | grep -A2 VIEW
> -- Name: v_using; Type: VIEW; Schema: public; Owner: seb
> --
>
> CREATE VIEW v_using AS
>    SELECT a.o AS v, a.i, b.j FROM (a LEFT JOIN b USING (v));
>
> $ LANG=C sudo -u postgres pg_restore -c -d t1 dump
> . . .
> pg_restore: [archiver (db)] could not execute query: ERROR:  column "v"
> specified in USING clause does not exist in left table
>    Command was: CREATE VIEW v_using AS
>            SELECT a.o AS v, a.i, b.j FROM (a LEFT JOIN b USING (v));

The problem here seems to be that get_from_clause_item() is a bit
naive about the JoinExpr representation.  usingClause is stored as a
text argument, but get_from_clause_item() is assuming that the schema
definitions have not changed since the parse tree was created.
isNatural has the same problem.

I'm not an expert on this area of the code, but can we just ignore
isNatural and usingClause when deparsing?  The attached patch fixes
the bug for me, although I couldn't swear as to whether it breaks
anything else.

...Robert

Attachment
Robert Haas <robertmhaas@gmail.com> writes:
> I'm not an expert on this area of the code, but can we just ignore
> isNatural and usingClause when deparsing?

No.  These properties are *not* ignorable because doing so changes the
set of returned columns --- you should get only one column out not two.

My reading of the spec is that USING (and therefore NATURAL) is defined
to join identically named columns.  Therefore, renaming one of the input
columns as the OP did *should* indeed *must* break the view.  The problem
is not how to make it work, it's how to give an error message that
doesn't look like an internal failure.

(This wasn't one of the SQL committee's better ideas IMO, but ours not
to reason why.)

            regards, tom lane

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm not an expert on this area of the code, but can we just ignore
>> isNatural and usingClause when deparsing?
>
> No. =A0These properties are *not* ignorable because doing so changes the
> set of returned columns --- you should get only one column out not two.
>
> My reading of the spec is that USING (and therefore NATURAL) is defined
> to join identically named columns. =A0Therefore, renaming one of the input
> columns as the OP did *should* indeed *must* break the view. =A0The probl=
em
> is not how to make it work, it's how to give an error message that
> doesn't look like an internal failure.

That seems ugly and unnecessary.  I think we might be able to define
ourselves out of this problem.  We don't guarantee (and have never
guaranteed) that selecting from a stored view will produce the same
results as re-executing the original query.  For example, * refers the
list of columns at definition-time, not execution-time, and if a
column is renamed, the view still refers to the same column; it
doesn't start crashing, nor would we want it to.  Similarly, here, the
USING is internally converted to an equality join on the two columns,
and the ambiguous output column is, I think, resolved in favor of one
of them.  I think we can just say that that conversion happens in toto
at parse-time, just as the *-to-column-list conversion and the
column-name-to-column-reference conversions do.  This seems like a
significantly more useful behavior and as a fringe benefit it
simplifies the code.

Moreover, it's basically what we're already doing for so long as the
view remains safely stored in the database; it just becomes
undumpable.  If we adopt your solution, backpatching it might break
working applications and not backpatching it will leave residual
breakage when people try to upgrade.  If we just make the dumping code
work the same way the executor already does, we don't have that
problem.

...Robert

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Andrew Gierth
Date:
>>>>> "Robert" =3D=3D Robert Haas <robertmhaas@gmail.com> writes:

 > On Thu, Dec 10, 2009 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
 >>=20
 >> My reading of the spec is that USING (and therefore NATURAL) is
 >> defined to join identically named columns. =A0Therefore, renaming
 >> one of the input columns as the OP did *should* indeed *must*
 >> break the view. =A0The problem is not how to make it work, it's how
 >> to give an error message that doesn't look like an internal
 >> failure.

 Robert> That seems ugly and unnecessary.  I think we might be able to
 Robert> define ourselves out of this problem.  We don't guarantee
 Robert> (and have never guaranteed) that selecting from a stored view
 Robert> will produce the same results as re-executing the original
 Robert> query.  For example, * refers the list of columns at
 Robert> definition-time, not execution-time, and if a column is
 Robert> renamed, the view still refers to the same column; it doesn't
 Robert> start crashing, nor would we want it to.  Similarly, here,
 Robert> the USING is internally converted to an equality join on the
 Robert> two columns, and the ambiguous output column is, I think,
 Robert> resolved in favor of one of them.  I think we can just say
 Robert> that that conversion happens in toto at parse-time, just as
 Robert> the *-to-column-list conversion and the
 Robert> column-name-to-column-reference conversions do.  This seems
 Robert> like a significantly more useful behavior and as a fringe
 Robert> benefit it simplifies the code.

There's another possible solution (albeit a somewhat nontrivial one)
which came up when a bunch of us were talking about this one on IRC;
which is to handle the problem in the view deparse: if a column used
in a USING clause has been renamed, add an alias to the query that
renames it back, e.g.
  select ... from table1 as table1(v,a) join ... using (v)

This would have to affect all the other references to that same column
in the query, so you'd need to do something like this: before deparsing,
walk the query looking for offending USING clauses, and make a list of
renamings to apply to column names.

I haven't tried actually implementing this, but I believe it is
possible.

--=20
Andrew (irc:RhodiumToad)

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 10:48 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Robert" =3D=3D Robert Haas <robertmhaas@gmail.com> writes:
>
> =A0> On Thu, Dec 10, 2009 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> =A0>>
> =A0>> My reading of the spec is that USING (and therefore NATURAL) is
> =A0>> defined to join identically named columns. =A0Therefore, renaming
> =A0>> one of the input columns as the OP did *should* indeed *must*
> =A0>> break the view. =A0The problem is not how to make it work, it's how
> =A0>> to give an error message that doesn't look like an internal
> =A0>> failure.
>
> =A0Robert> That seems ugly and unnecessary. =A0I think we might be able to
> =A0Robert> define ourselves out of this problem. =A0We don't guarantee
> =A0Robert> (and have never guaranteed) that selecting from a stored view
> =A0Robert> will produce the same results as re-executing the original
> =A0Robert> query. =A0For example, * refers the list of columns at
> =A0Robert> definition-time, not execution-time, and if a column is
> =A0Robert> renamed, the view still refers to the same column; it doesn't
> =A0Robert> start crashing, nor would we want it to. =A0Similarly, here,
> =A0Robert> the USING is internally converted to an equality join on the
> =A0Robert> two columns, and the ambiguous output column is, I think,
> =A0Robert> resolved in favor of one of them. =A0I think we can just say
> =A0Robert> that that conversion happens in toto at parse-time, just as
> =A0Robert> the *-to-column-list conversion and the
> =A0Robert> column-name-to-column-reference conversions do. =A0This seems
> =A0Robert> like a significantly more useful behavior and as a fringe
> =A0Robert> benefit it simplifies the code.
>
> There's another possible solution (albeit a somewhat nontrivial one)
> which came up when a bunch of us were talking about this one on IRC;
> which is to handle the problem in the view deparse: if a column used
> in a USING clause has been renamed, add an alias to the query that
> renames it back, e.g.
> =A0select ... from table1 as table1(v,a) join ... using (v)
>
> This would have to affect all the other references to that same column
> in the query, so you'd need to do something like this: before deparsing,
> walk the query looking for offending USING clauses, and make a list of
> renamings to apply to column names.
>
> I haven't tried actually implementing this, but I believe it is
> possible.

What advantage does this offer?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 10, 2009 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My reading of the spec is that USING (and therefore NATURAL) is defined
>> to join identically named columns.  Therefore, renaming one of the input
>> columns as the OP did *should* indeed *must* break the view.  The problem
>> is not how to make it work, it's how to give an error message that
>> doesn't look like an internal failure.

> That seems ugly and unnecessary.  I think we might be able to define
> ourselves out of this problem.  We don't guarantee (and have never
> guaranteed) that selecting from a stored view will produce the same
> results as re-executing the original query.  For example, * refers the
> list of columns at definition-time, not execution-time,

Um, aren't you contradicting yourself there?

The problem with USING is that it is not merely a join condition but
affects the set of columns emitted by the join.  It can't be converted
to a simple ON without changing the semantics, and I don't believe we
should try.

            regards, tom lane
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> There's another possible solution (albeit a somewhat nontrivial one)
> which came up when a bunch of us were talking about this one on IRC;
> which is to handle the problem in the view deparse: if a column used
> in a USING clause has been renamed, add an alias to the query that
> renames it back, e.g.
>   select ... from table1 as table1(v,a) join ... using (v)

Hmm.  Cute, but I wonder why we shouldn't just be throwing an error.
As I said last night, the only thing I see wrong with the current
behavior is that the error message isn't phrased in a way that makes
it obvious it's the user's fault.

            regards, tom lane

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 11:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> There's another possible solution (albeit a somewhat nontrivial one)
>> which came up when a bunch of us were talking about this one on IRC;
>> which is to handle the problem in the view deparse: if a column used
>> in a USING clause has been renamed, add an alias to the query that
>> renames it back, e.g.
>> =A0 select ... from table1 as table1(v,a) join ... using (v)
>
> Hmm. =A0Cute, but I wonder why we shouldn't just be throwing an error.
> As I said last night, the only thing I see wrong with the current
> behavior is that the error message isn't phrased in a way that makes
> it obvious it's the user's fault.

I think the problem we're trying to solve is that the view works but
it isn't dumpable.

...Robert

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 10, 2009 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My reading of the spec is that USING (and therefore NATURAL) is defined
>>> to join identically named columns. =A0Therefore, renaming one of the in=
put
>>> columns as the OP did *should* indeed *must* break the view. =A0The pro=
blem
>>> is not how to make it work, it's how to give an error message that
>>> doesn't look like an internal failure.
>
>> That seems ugly and unnecessary. =A0I think we might be able to define
>> ourselves out of this problem. =A0We don't guarantee (and have never
>> guaranteed) that selecting from a stored view will produce the same
>> results as re-executing the original query. =A0For example, * refers the
>> list of columns at definition-time, not execution-time,
>
> Um, aren't you contradicting yourself there?

Not that I can see.

> The problem with USING is that it is not merely a join condition but
> affects the set of columns emitted by the join. =A0It can't be converted
> to a simple ON without changing the semantics, and I don't believe we
> should try.

The OP seemed to be pretty clear on what the semantics should be, and
I'm not confused either.  Why are you?  :-)

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 10, 2009 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The problem with USING is that it is not merely a join condition but
>> affects the set of columns emitted by the join.  It can't be converted
>> to a simple ON without changing the semantics, and I don't believe we
>> should try.

> The OP seemed to be pretty clear on what the semantics should be, and
> I'm not confused either.  Why are you?  :-)

I'm not confused: it's an error condition.  You have not explained why
it isn't.

What would actually be ideal is to throw an error on the attempted column
rename, but I doubt the problem is worth the additions to system
infrastructure that would be necessary to make that happen.

            regards, tom lane

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Thu, Dec 10, 2009 at 11:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > Hmm.  Cute, but I wonder why we shouldn't just be throwing an error.
> > As I said last night, the only thing I see wrong with the current
> > behavior is that the error message isn't phrased in a way that makes
> > it obvious it's the user's fault.
>
> I think the problem we're trying to solve is that the view works but
> it isn't dumpable.

So it would also be a solution that the ALTER TABLE that renamed the
column throwed an error if there's a view that requires the name.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 1:02 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Robert Haas escribi=F3:
>> On Thu, Dec 10, 2009 at 11:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> > Hmm. =A0Cute, but I wonder why we shouldn't just be throwing an error.
>> > As I said last night, the only thing I see wrong with the current
>> > behavior is that the error message isn't phrased in a way that makes
>> > it obvious it's the user's fault.
>>
>> I think the problem we're trying to solve is that the view works but
>> it isn't dumpable.
>
> So it would also be a solution that the ALTER TABLE that renamed the
> column throwed an error if there's a view that requires the name.

Yes.  I don't think it's as good of a solution, but it is a solution.

...Robert

Re: BUG #5234: ALTER TABLE ... RENAME COLUMN change view definition incorrectly

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 12:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 10, 2009 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The problem with USING is that it is not merely a join condition but
>>> affects the set of columns emitted by the join. =A0It can't be converted
>>> to a simple ON without changing the semantics, and I don't believe we
>>> should try.
>
>> The OP seemed to be pretty clear on what the semantics should be, and
>> I'm not confused either. =A0Why are you? =A0:-)
>
> I'm not confused: it's an error condition. =A0You have not explained why
> it isn't.

Well, it's obviously not an error condition at present, because the
view works perfectly well.  As far as running SELECT statements
against the view, the QUERY structure in ev_rewrite is semantically
equivalent to one without the usingClause.  buildMergedJoinVar()
builds a target list entry that refers to one or both of the
underlying base tables, using COALESCE if necessary.  Nothing in that
join var depends IN ANY WAY on the usingClause.  The JoinExpr
similarly gets rewritten in terms of an equality construct - while
it's true that the usingClause is still present, nothing outside the
parser/deparser actually looks at it.

For that reason, your claim that we CAN'T rewrite USING to a simple ON
without changing the semantics seems to me to be false.  For purposes
of 99.44% of the code, we are already doing it.  The only problem is,
after we do it, and after using that rewritten version to execute the
query, we maintain the fiction that we haven't done it when we dump
the view back out, leading to inconsistent results.

...Robert