Thread: autogenerated column names + views are a dump hazard
Hi, Kathy (CCed) just found a bug in BDR that turned out to actually be a bug in postgres. CREATE VIEW v_03 AS SELECT * FROM (SELECT '2' ORDER BY 1) s; postgres[l]=# \d+ v_03 View "public.v_03" ┌──────────┬──────┬───────────┬──────────┬─────────────┐ │ Column │ Type │ Modifiers │ Storage │ Description │ ├──────────┼──────┼───────────┼──────────┼─────────────┤ │ ?column? │ text │ │ extended │ │ └──────────┴──────┴───────────┴──────────┴─────────────┘ View definition:SELECT s."?column?" FROM ( SELECT '2'::text ORDER BY '2'::text) s; Note the added cast to determine the type of the expression and the generated column name. If you dump/load that view definition, or even just execute the query the fun starts: postgres[1]=# SELECT s."?column?" FROM ( SELECT '2'::text ORDER BY '2'::text) s; ERROR: column s.?column? does not exist LINE 1: SELECT s."?column? Due to the added explicit cast the generated column name now isn't "?column?" but "text". The easiest way to solve this would teach ruleutils.c to simply always attach AS clauses for auto-generated columnnames. Won't look too pretty though. Does somebody have a better idea? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > CREATE VIEW v_03 AS > SELECT * FROM (SELECT '2' ORDER BY 1) s; > View definition: > SELECT s."?column?" > FROM ( SELECT '2'::text > ORDER BY '2'::text) s; > Note the added cast to determine the type of the expression and the > generated column name. Meh. > The easiest way to solve this would teach ruleutils.c to simply always > attach AS clauses for auto-generated columnnames. Won't look too pretty > though. Does somebody have a better idea? No, it would look awful :-(. But I think we might have enough infrastructure in there to notice whether the column is actually referenced or not. So we could label the columns only if necessary, which would at least limit the ugliness. Another idea would be to reconsider whether we need to force display of the cast, though I'm a bit worried about the possible side-effects. regards, tom lane
On 2015-03-02 16:32:53 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > The easiest way to solve this would teach ruleutils.c to simply always > > attach AS clauses for auto-generated columnnames. Won't look too pretty > > though. Does somebody have a better idea? > > No, it would look awful :-(. But I think we might have enough > infrastructure in there to notice whether the column is actually > referenced or not. So we could label the columns only if necessary, > which would at least limit the ugliness. Yea, that'd be better. I guess I'll try to figure out how that code works... > Another idea would be to reconsider whether we need to force display > of the cast, though I'm a bit worried about the possible side-effects. Seems like a fair bit riskier approach to me, so I'm not inclined to follow it. Might have been the better idea originally, but it seems like a change would be far more likely to affect people. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015-03-02 16:32:53 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > The easiest way to solve this would teach ruleutils.c to simply always > > attach AS clauses for auto-generated columnnames. Won't look too pretty > > though. Does somebody have a better idea? > > No, it would look awful :-(. Hm, so I looked into it, and I think the problem is actually restricted to columns where the typename that FigureColname() assigns is different from the one that will result after ger_rule_expr()/get_const_expr()'s implicit cast is added. For this case it seems easiest if we'd make get_rule_expr() (and some of its helpers) return whether an implicit cast has been added. Whenever an implicit cast is added in the target list we should add an alias - as the generated column name will be different from before. That's a bit invasive, but doesn't seem too bad. A quick hack doing so shows that there's no changes in the regression output when doing it for consts. > But I think we might have enough > infrastructure in there to notice whether the column is actually > referenced or not. So we could label the columns only if necessary, > which would at least limit the ugliness. I don't think looking for referenced columns is actually sufficient, unless you define "referenced" pretty widely at least. Before the dump CREATE VIEW ... AS SELECT '2' ORDER BY 1; will yield a '?column?' columnname. After a dump/restore it'll be 'text'. That doesn't seem desireable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > Hm, so I looked into it, and I think the problem is actually restricted > to columns where the typename that FigureColname() assigns is different > from the one that will result after ger_rule_expr()/get_const_expr()'s > implicit cast is added. > For this case it seems easiest if we'd make get_rule_expr() (and some of > its helpers) return whether an implicit cast has been added. Aside from being pretty ugly, that doesn't seem particularly bulletproof. It might deal all right with this specific manifestation, but now that we've seen the problem, who's to say that there aren't other ways to get bit? Or that some innocent future change to FigureColname() might not create a new one? It's not like the behavior of that function was handed down on stone tablets --- we do change it from time to time. I wondered whether there was a way to directly test whether FigureColname() gives a different result from what we have recorded as the effective column name. Unfortunately, it wants a raw parse tree whereas what we have is an analyzed parse tree, so there's no easy way to apply it and see. Now, we do have the deparsed text of the column expression at hand, so in principle you could run that back through the grammar to get a raw parse tree, and then apply FigureColname() to that. Not sure what that would be like from a performance standpoint, but it would provide a pretty robust fix for this class of problem. And on the third hand ... doing that would really only be robust as long as you assume that the output will be read by a server using exactly the same FigureColname() logic as what we are using. So maybe the whole idea is a bad one, and we should just bite the bullet and print AS clauses always. Or we could consider printing them always if the "pretty" flag isn't on. IIRC, pg_dump doesn't enable that. regards, tom lane
On 2015-03-03 09:39:03 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > For this case it seems easiest if we'd make get_rule_expr() (and some of > > its helpers) return whether an implicit cast has been added. > > Aside from being pretty ugly, that doesn't seem particularly > bulletproof. Right. > It might deal all right with this specific manifestation, but now that > we've seen the problem, who's to say that there aren't other ways to get > bit? Or that some innocent future change to FigureColname() might not > create a new one? It's not like the behavior of that function was handed > down on stone tablets --- we do change it from time to time. Sure it'd not be a guarantee, but what is? > I wondered whether there was a way to directly test whether > FigureColname() gives a different result from what we have recorded > as the effective column name. Unfortunately, it wants a raw parse tree > whereas what we have is an analyzed parse tree, so there's no easy way > to apply it and see. Additionally the casts added by get_rule_expr() show up in neither tree, so that'd not in itself help. > Now, we do have the deparsed text of the column expression at hand, > so in principle you could run that back through the grammar to get a raw > parse tree, and then apply FigureColname() to that. Not sure what that > would be like from a performance standpoint, but it would provide a pretty > robust fix for this class of problem. Yea, I've wondered about that as well. Given that this is pretty much only run during deparsing I don't think the overhead would be relevant. > And on the third hand ... doing that would really only be robust as long > as you assume that the output will be read by a server using exactly the > same FigureColname() logic as what we are using. So maybe the whole idea > is a bad one, and we should just bite the bullet and print AS clauses > always. I think this is the way to go though. There's different extremes we can go to though - the easiest is to simply remove the attname = "?column?" assignment from get_target_list(). That way plain Var references (aside from whole row vars/subplans and such that get_variable() deals with) don't get a forced alias, but everything else does. That seems like a good compromise between readability and safety. get_rule_expr() deals with most of the "nasty" stuff. I did this, and the noise in the regression test output isn't too bad. Primarily that a fair number of of EXISTS(SELECT 1 .. ) grow an alias. > Or we could consider printing them always if the "pretty" flag isn't > on. IIRC, pg_dump doesn't enable that. Not a fan of that, I'd rather not output queries that can't be executed. Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2015-03-03 09:39:03 -0500, Tom Lane wrote: >> And on the third hand ... doing that would really only be robust as long >> as you assume that the output will be read by a server using exactly the >> same FigureColname() logic as what we are using. So maybe the whole idea >> is a bad one, and we should just bite the bullet and print AS clauses >> always. > I think this is the way to go though. There's different extremes we can > go to though - the easiest is to simply remove the attname = "?column?" > assignment from get_target_list(). That way plain Var references (aside > from whole row vars/subplans and such that get_variable() deals with) > don't get a forced alias, but everything else does. That seems like a > good compromise between readability and safety. get_rule_expr() deals > with most of the "nasty" stuff. I wasn't aware that there was any room for "compromise" on the safety aspect. If pg_dump gets this wrong, that means pg_upgrade is broken, for example. That's why I was thinking that relying on the pretty flag might be a reasonable thing. pg_dump would be unconditionally right, and we'd not uglify EXPLAIN or \d+ output. regards, tom lane
On 2015-03-03 11:09:29 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2015-03-03 09:39:03 -0500, Tom Lane wrote: > > I think this is the way to go though. There's different extremes we can > > go to though - the easiest is to simply remove the attname = "?column?" > > assignment from get_target_list(). That way plain Var references (aside > > from whole row vars/subplans and such that get_variable() deals with) > > don't get a forced alias, but everything else does. That seems like a > > good compromise between readability and safety. get_rule_expr() deals > > with most of the "nasty" stuff. > > I wasn't aware that there was any room for "compromise" on the safety > aspect. Meh. I think that *not* printing an alias for a plain column reference is reasonable and doesn't present a relevant risk even in the face of future changes. "foo.bar AS bar" is just a bit too redundant imo. > That's why I was thinking that relying on the pretty flag might be a > reasonable thing. pg_dump would be unconditionally right, and we'd > not uglify EXPLAIN or \d+ output. I don't think EXPLAIN VERBOSE currently is affected - it doesn't seem to show aliases/generated column names at all. I personally do occasionally copy & paste from from \d+ output, so I'd rather have slightly more verbose output rather than wrong output. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services