Thread: ambiguous column names in subqueries

ambiguous column names in subqueries

From
Greg Stark
Date:
I have a query where I want to override one of the output column names. The
problem is that the columns are coming from a subquery. So I have do something
like:

select *, coalesce(a,b) as a from <subquery>

The problem is that * still includes column a. And replacing * with a complete
list of every column coming from the subquery is a non-starter. That would
make maintaining the query a total nightmare. Every change to the subquery
would require editing multiple levels of these explicit lists.

I thought Postgres already allowed for this by taking only the last column by
a given name. At least that's what I had observed in practice. It turns out it
wasn't postgres it was the driver that was doing it. 

Drivers obviously have no way to disambiguate either so apparently they just
return the last column by the specified name. The problem is this doesn't help
me when I want to use the column in an ORDER BY clause or elsewhere in the
query.

Is there any help in the SQL spec on this? Some syntax for disambiguating
references or removing columns from the output list?

Otherwise, I think Postgres should behave differently in this case:

For example:

slo=> select * from (select 1 as foo, 2 as foo);
ERROR:  subquery in FROM must have an alias
HINT:  For example, FROM (SELECT ...) [AS] foo.
slo=> select * from (select 1 as foo, 2 as foo) as x;foo | foo 
-----+-----  1 |   2
(1 row)


What purpose is there to returning both columns to the outer query? The
columns become effectively inaccessible. There's no syntax for disambiguating
any reference.

I think postgres should treat the second alias as hiding the first. Currently
there's no way to selectively override a single output column. The only way to
do is to put your query in a subquery and list every single output column
again except the one you want to override.

Note that I'm not saying Postgres should remove ambiguous columns from
different tables for the inner query. Only for subsequent layers where they
have no way to access them anyways.

-- 
greg



Re: ambiguous column names in subqueries

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Is there any help in the SQL spec on this?

Rename the columns at the output of the subselect, eg

select * from (select 1 as foo, 2 as foo) as x(foo1, foo2);
        regards, tom lane


Re: ambiguous column names in subqueries

From
Oliver Jowett
Date:
Greg Stark wrote:

> What purpose is there to returning both columns to the outer query? The
> columns become effectively inaccessible. There's no syntax for disambiguating
> any reference.
> 
> I think postgres should treat the second alias as hiding the first. Currently
> there's no way to selectively override a single output column. The only way to
> do is to put your query in a subquery and list every single output column
> again except the one you want to override.
> 
> Note that I'm not saying Postgres should remove ambiguous columns from
> different tables for the inner query. Only for subsequent layers where they
> have no way to access them anyways.

Please don't. JDBC (for example) has no problem with ambiguous columns, 
you just access them by index, and you have resultset metadata available 
if you want to implement your own rules for finding those indexes. It 
sounds like your problem really lies in the API you are using to access 
the results.

-O


Re: ambiguous column names in subqueries

From
Greg Stark
Date:
Oliver Jowett <oliver@opencloud.com> writes:

> Please don't. JDBC (for example) has no problem with ambiguous columns, you
> just access them by index, and you have resultset metadata available if you
> want to implement your own rules for finding those indexes. It sounds like your
> problem really lies in the API you are using to access the results.

The API with the problem is the SQL language. It's there that the columns
become inaccessible. It seems silly for the language to let you create
variables that you can't reference. 

Yes it's true that you could access them from the result set but that's even
worse. You have variables that you can't access from within the language but
that can escape to the outside world to cause real effects.

-- 
greg



Re: ambiguous column names in subqueries

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> select * from (select 1 as foo, 2 as foo) as x(foo1, foo2);

How is this different than simply listing all the columns instead of the "*"?

I still have the maintenance problem of having to edit the outer query every
time the list of columns from the inner query changes.

-- 
greg



Re: ambiguous column names in subqueries

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> select * from (select 1 as foo, 2 as foo) as x(foo1, foo2);

> I still have the maintenance problem of having to edit the outer query every
> time the list of columns from the inner query changes.

Yeah, but at least you only have to do it in one place.

AFAICS removing columns from the inner query because they have duplicate
names would violate the SQL spec, so it's not going to happen.
        regards, tom lane


Re: ambiguous column names in subqueries

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> AFAICS removing columns from the inner query because they have duplicate
> names would violate the SQL spec, so it's not going to happen.

That's really what I was asking I guess. Does the spec require the current
behaviour.

An alternative would be some way to explicitly remove columns from a result
set.

Something like

SELECT a+1 AS a, b+1 AS b, * EXCEPT (a,b) FROM (SELECT 1 AS a, 2 as b, 3 as c)

-- 
greg