Re: UNION and pg_restore - Mailing list pgsql-general

From Tom Lane
Subject Re: UNION and pg_restore
Date
Msg-id 10880.1356129420@sss.pgh.pa.us
Whole thread Raw
In response to UNION and pg_restore  (Bryan Lee Nuse <nuse@uga.edu>)
Responses Re: UNION and pg_restore
List pgsql-general
Bryan Lee Nuse <nuse@uga.edu> writes:
> My question is, then, how is it that the query embodied in "view_1" below executes fine, but cannot seem to be
restored? Is this telling me my query is dumb?  If so, any advice on how to easily derive "view_1" from "tab_1" and
"tab_2"below, without baffling pg_restore, would be welcome. 

The core issue is that you've got

    SELECT * FROM (SELECT (-999), (-999)) b;

If you execute this on its own, you'll get

 ?column? | ?column?
----------+----------
     -999 |     -999
(1 row)

that is, both columns have been assigned the arbitrary name "?column?".
That's okay in itself, because the "*" is expanded positionally and
doesn't care whether the columns have distinct names or not.  However,
when the view definition is dumped out, you have

> View definition:
>          SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2
>            FROM tab_1
> UNION
>          SELECT a.id_1, a.id_2, a.id_3, b."?column?" AS data_1, b."?column?" AS data_2
>            FROM ( SELECT tab_2.id_1, tab_2.id_2, tab_2.id_3
>                    FROM tab_2) a
>      CROSS JOIN ( SELECT (-999), (-999)) b;

and now there are ambiguous references to b."?column?" in there.

The short answer to this is that you ought to take some care that the
columns of the sub-select have distinct names.  You could do it in
at least two ways:

    (SELECT (-999), (-999)) b(x,y)
    (SELECT (-999) AS x, (-999) AS y) b

Now you're doubtless wondering why Postgres doesn't dodge this ambiguity
for you.  One way would be to dump out the view using the "*" notation,
but it turns out that that is contrary to SQL standard: "*" must be
expanded at parse time.  (The place in the standard that mandates this
is a little note under ALTER TABLE ADD COLUMN that says that adding a
column doesn't affect the meaning of any previously defined views.
Hence "SELECT * FROM foo" has to be interpreted as expanding the "*"
immediately, lest its meaning change due to subsequent addition of a
column to foo.)

The other thing we might do is to generate distinct names for unnamed
columns.  That would probably be a good idea and maybe we'll do it
sooner or later, but it's not required by (recent versions of) the SQL
standard.  It would in any case not eliminate the risk entirely, because
this is perfectly legal SQL:

    SELECT * FROM (SELECT (-999) AS x, (-999) AS x) b

but once the "*" has been expanded there is actually no spec-compliant
representation of this query.  So there's always going to be some "don't
do that" aspect here.

There are some other fun aspects of trying to ensure that dumped views
can be reloaded.  In particular, the possibility that columns of
underlying tables can be renamed introduces all sorts of hazards :-(.
There's been some discussion of how we might make that safer, but a
bulletproof solution seems to require introducing nonstandard syntax
for things like JOIN USING.  People haven't been too excited about that.

            regards, tom lane


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Coalesce bug ?
Next
From: Chris Angelico
Date:
Subject: Re: Coalesce bug ?