Thread: autogenerated column names + views are a dump hazard

autogenerated column names + views are a dump hazard

From

Andres Freund

Date:

02 March 2015, 23:58:07

Hi,

Kathy (CCed) just found a bug in BDR that turned out to actually be a
bug in postgres.

CREATE VIEW v_03 AS
SELECT * FROM (SELECT '2' ORDER BY 1) s;

postgres[l]=# \d+ v_03                  View "public.v_03"
┌──────────┬──────┬───────────┬──────────┬─────────────┐
│  Column  │ Type │ Modifiers │ Storage  │ Description │
├──────────┼──────┼───────────┼──────────┼─────────────┤
│ ?column? │ text │           │ extended │             │
└──────────┴──────┴───────────┴──────────┴─────────────┘
View definition:SELECT s."?column?"  FROM ( SELECT '2'::text         ORDER BY '2'::text) s;

Note the added cast to determine the type of the expression and the
generated column name.

If you dump/load that view definition, or even just execute the query
the fun starts:
postgres[1]=#  SELECT s."?column?"  FROM ( SELECT '2'::text            ORDER BY '2'::text) s;
ERROR:  column s.?column? does not exist
LINE 1: SELECT s."?column?

Due to the added explicit cast the generated column name now isn't
"?column?" but "text".


The easiest way to solve this would teach ruleutils.c to simply always
attach AS clauses for auto-generated columnnames. Won't look too pretty
though. Does somebody have a better idea?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: autogenerated column names + views are a dump hazard

From

Tom Lane

Date:

03 March 2015, 00:33:04

Andres Freund <andres@2ndquadrant.com> writes:
> CREATE VIEW v_03 AS
> SELECT * FROM (SELECT '2' ORDER BY 1) s;

> View definition:
>  SELECT s."?column?"
>    FROM ( SELECT '2'::text
>           ORDER BY '2'::text) s;

> Note the added cast to determine the type of the expression and the
> generated column name.

Meh.

> The easiest way to solve this would teach ruleutils.c to simply always
> attach AS clauses for auto-generated columnnames. Won't look too pretty
> though. Does somebody have a better idea?

No, it would look awful :-(.  But I think we might have enough
infrastructure in there to notice whether the column is actually
referenced or not.  So we could label the columns only if necessary,
which would at least limit the ugliness.

Another idea would be to reconsider whether we need to force display
of the cast, though I'm a bit worried about the possible side-effects.
        regards, tom lane

Re: autogenerated column names + views are a dump hazard

From

Andres Freund

Date:

03 March 2015, 00:45:38

On 2015-03-02 16:32:53 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > The easiest way to solve this would teach ruleutils.c to simply always
> > attach AS clauses for auto-generated columnnames. Won't look too pretty
> > though. Does somebody have a better idea?
> 
> No, it would look awful :-(.  But I think we might have enough
> infrastructure in there to notice whether the column is actually
> referenced or not.  So we could label the columns only if necessary,
> which would at least limit the ugliness.

Yea, that'd be better. I guess I'll try to figure out how that code
works...

> Another idea would be to reconsider whether we need to force display
> of the cast, though I'm a bit worried about the possible side-effects.

Seems like a fair bit riskier approach to me, so I'm not inclined to
follow it. Might have been the better idea originally, but it seems like
a change would be far more likely to affect people.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: autogenerated column names + views are a dump hazard

From

Andres Freund

Date:

03 March 2015, 15:34:30

On 2015-03-02 16:32:53 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > The easiest way to solve this would teach ruleutils.c to simply always
> > attach AS clauses for auto-generated columnnames. Won't look too pretty
> > though. Does somebody have a better idea?
> 
> No, it would look awful :-(.

Hm, so I looked into it, and I think the problem is actually restricted
to columns where the typename that FigureColname() assigns is different
from the one that will result after ger_rule_expr()/get_const_expr()'s
implicit cast is added.

For this case it seems easiest if we'd make get_rule_expr() (and some of
its helpers) return whether an implicit cast has been added. Whenever an
implicit cast is added in the target list we should add an alias - as
the generated column name will be different from before. That's a bit
invasive, but doesn't seem too bad.  A quick hack doing so shows that
there's no changes in the regression output when doing it for consts.

> But I think we might have enough
> infrastructure in there to notice whether the column is actually
> referenced or not.  So we could label the columns only if necessary,
> which would at least limit the ugliness.

I don't think looking for referenced columns is actually sufficient,
unless you define "referenced" pretty widely at least.

Before the dump CREATE VIEW ... AS SELECT '2' ORDER BY 1; will yield a
'?column?'  columnname. After a dump/restore it'll be 'text'. That
doesn't seem desireable.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: autogenerated column names + views are a dump hazard

From

Tom Lane

Date:

03 March 2015, 17:39:17

Andres Freund <andres@2ndquadrant.com> writes:
> Hm, so I looked into it, and I think the problem is actually restricted
> to columns where the typename that FigureColname() assigns is different
> from the one that will result after ger_rule_expr()/get_const_expr()'s
> implicit cast is added.

> For this case it seems easiest if we'd make get_rule_expr() (and some of
> its helpers) return whether an implicit cast has been added.

Aside from being pretty ugly, that doesn't seem particularly bulletproof.
It might deal all right with this specific manifestation, but now that
we've seen the problem, who's to say that there aren't other ways to get
bit?  Or that some innocent future change to FigureColname() might not
create a new one?  It's not like the behavior of that function was handed
down on stone tablets --- we do change it from time to time.

I wondered whether there was a way to directly test whether
FigureColname() gives a different result from what we have recorded
as the effective column name.  Unfortunately, it wants a raw parse tree
whereas what we have is an analyzed parse tree, so there's no easy way
to apply it and see.

Now, we do have the deparsed text of the column expression at hand,
so in principle you could run that back through the grammar to get a raw
parse tree, and then apply FigureColname() to that.  Not sure what that
would be like from a performance standpoint, but it would provide a pretty
robust fix for this class of problem.

And on the third hand ... doing that would really only be robust as long
as you assume that the output will be read by a server using exactly the
same FigureColname() logic as what we are using.  So maybe the whole idea
is a bad one, and we should just bite the bullet and print AS clauses
always.

Or we could consider printing them always if the "pretty" flag isn't
on.  IIRC, pg_dump doesn't enable that.
        regards, tom lane

Re: autogenerated column names + views are a dump hazard

From

Andres Freund

Date:

03 March 2015, 18:58:10

On 2015-03-03 09:39:03 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > For this case it seems easiest if we'd make get_rule_expr() (and some of
> > its helpers) return whether an implicit cast has been added.
>
> Aside from being pretty ugly, that doesn't seem particularly
> bulletproof.

Right.

> It might deal all right with this specific manifestation, but now that
> we've seen the problem, who's to say that there aren't other ways to get
> bit? Or that some innocent future change to FigureColname() might not
> create a new one?  It's not like the behavior of that function was handed
> down on stone tablets --- we do change it from time to time.

Sure it'd not be a guarantee, but what is?

> I wondered whether there was a way to directly test whether
> FigureColname() gives a different result from what we have recorded
> as the effective column name.  Unfortunately, it wants a raw parse tree
> whereas what we have is an analyzed parse tree, so there's no easy way
> to apply it and see.

Additionally the casts added by get_rule_expr() show up in neither tree,
so that'd not in itself help.

> Now, we do have the deparsed text of the column expression at hand,
> so in principle you could run that back through the grammar to get a raw
> parse tree, and then apply FigureColname() to that.  Not sure what that
> would be like from a performance standpoint, but it would provide a pretty
> robust fix for this class of problem.

Yea, I've wondered about that as well. Given that this is pretty much
only run during deparsing I don't think the overhead would be relevant.

> And on the third hand ... doing that would really only be robust as long
> as you assume that the output will be read by a server using exactly the
> same FigureColname() logic as what we are using.  So maybe the whole idea
> is a bad one, and we should just bite the bullet and print AS clauses
> always.

I think this is the way to go though. There's different extremes we can
go to though - the easiest is to simply remove the attname = "?column?"
assignment from get_target_list(). That way plain Var references (aside
from whole row vars/subplans and such that get_variable() deals with)
don't get a forced alias, but everything else does.  That seems like a
good compromise between readability and safety. get_rule_expr() deals
with most of the "nasty" stuff.

I did this, and the noise in the regression test output isn't too
bad. Primarily that a fair number of of EXISTS(SELECT 1 .. ) grow an
alias.

> Or we could consider printing them always if the "pretty" flag isn't
> on.  IIRC, pg_dump doesn't enable that.

Not a fan of that, I'd rather not output queries that can't be executed.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: autogenerated column names + views are a dump hazard

From

Tom Lane

Date:

03 March 2015, 19:09:39

Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-03-03 09:39:03 -0500, Tom Lane wrote:
>> And on the third hand ... doing that would really only be robust as long
>> as you assume that the output will be read by a server using exactly the
>> same FigureColname() logic as what we are using.  So maybe the whole idea
>> is a bad one, and we should just bite the bullet and print AS clauses
>> always.

> I think this is the way to go though. There's different extremes we can
> go to though - the easiest is to simply remove the attname = "?column?"
> assignment from get_target_list(). That way plain Var references (aside
> from whole row vars/subplans and such that get_variable() deals with)
> don't get a forced alias, but everything else does.  That seems like a
> good compromise between readability and safety. get_rule_expr() deals
> with most of the "nasty" stuff.

I wasn't aware that there was any room for "compromise" on the safety
aspect.  If pg_dump gets this wrong, that means pg_upgrade is broken,
for example.  That's why I was thinking that relying on the pretty
flag might be a reasonable thing.  pg_dump would be unconditionally
right, and we'd not uglify EXPLAIN or \d+ output.
        regards, tom lane

Re: autogenerated column names + views are a dump hazard

From

Andres Freund

Date:

03 March 2015, 19:28:18

On 2015-03-03 11:09:29 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2015-03-03 09:39:03 -0500, Tom Lane wrote:
> > I think this is the way to go though. There's different extremes we can
> > go to though - the easiest is to simply remove the attname = "?column?"
> > assignment from get_target_list(). That way plain Var references (aside
> > from whole row vars/subplans and such that get_variable() deals with)
> > don't get a forced alias, but everything else does.  That seems like a
> > good compromise between readability and safety. get_rule_expr() deals
> > with most of the "nasty" stuff.
> 
> I wasn't aware that there was any room for "compromise" on the safety
> aspect.

Meh.

I think that *not* printing an alias for a plain column reference is
reasonable and doesn't present a relevant risk even in the face of
future changes. "foo.bar AS bar" is just a bit too redundant imo.

> That's why I was thinking that relying on the pretty flag might be a
> reasonable thing.  pg_dump would be unconditionally right, and we'd
> not uglify EXPLAIN or \d+ output.

I don't think EXPLAIN VERBOSE currently is affected - it doesn't seem to
show aliases/generated column names at all.

I personally do occasionally copy & paste from from \d+ output, so I'd
rather have slightly more verbose output rather than wrong output.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services