Thread: Schemas vs. PostQUEL: resolving qualified identifiers

Schemas vs. PostQUEL: resolving qualified identifiers

From

Tom Lane

Date:

18 January 2002, 19:40:26

I'm starting to think about how to do SQL-compatible schemas in Postgres.

One of the first issues that comes up is that Postgres has interpretations
of qualified (dotted) names that conflict with what the standard says to
do.  This stuff is a hangover from Berkeley days and PostQUEL.  Briefly,
since there are no schemas in PG, anytime we have a dotted name we know
that the first component *must* be a table name.  Successive components are
then resolved as either column or function names applied to whatever we
have so far.  For example,
a.b.c    is equivalent to c(a.b) if c is a function

The equivalence works the other way too: you can write colname(tabname)
and it will be taken as tabname.colname (though this seems to work only
if tabname already has a rangetable entry).

This is not going to fly together with SQL92, which wants a.b.c to mean
schema a, table b, column c.

I believe we can resolve the conflict without breaking any cases that
are likely to be used much in practice.  Here's my proposal:

First, I'd like to allow the notation "table.*" to be used as a function
argument, representing passing a whole-row value to a function (the
function's argument would be declared as the table rowtype).  Presently
this is done by writing just the undecorated table name, but that is
ambiguous if the same name is also used as a column name in the query.
Also, we need this notation for use with qualified table names.

Having done that, we can resolve names appearing in expressions thus:

foo        First try to resolve as unqualified column name;    if not found, try to resolve as unqualified table name
(in which case this is a whole-row value, equivalent    to foo.*).
 

foo.bar        foo is an unqualified table name.  Try first to resolve    bar as a column name of foo; if not found,
tryto resolve    bar as a function taking the rowtype of foo.
 

foo.bar.baz    This refers to table bar in schema foo.  baz is either    a column or function name, as above.

foo.bar.baz.quux    Refers to table baz in schema bar in catalog foo.    quux is either a column or function name, as
above.

foo.*        foo is an unqualified table name; means whole-row value.

foo.bar.*    Whole row of table bar in schema foo.

foo.bar.baz.*    Whole row of table baz in schema bar in catalog foo.

The first two of these rules are the same as current behavior; the next
two are additions mandated by SQL92; the last three are a proposed
extension to allow unambiguous reference to whole-row values.

With these rules, we do not lose any functionality that we have now,
but some PostQUEL-ish constructs will need to be rewritten into an
equivalent form.

PostQUEL-isms that still work:

table.func    equivalent to func(table.*)

func(table)    equivalent to func(table.*), as long as table name    doesn't match any column name of the query

PostQUEL-isms that no longer work:

table.col.func        Must rewrite as func(table.col)

table.func1.func2    Must rewrite as, eg, func2(func1(table.*))

While there are a couple of examples of the above two constructs in the
regression tests, I doubt they are used much in the real world.

Another PostQUEL-ism that I would like to remove is col(tab) to mean
tab.col; I'd prefer to restrict the function syntax to functions only.
We could continue to support this for unqualified table names, but
given the above rules it could not work for a qualified table name,
eg, col(schema.tab) would be misinterpreted.  Again, it seems unlikely
that many people are using this, so we may as well try to regularize
the syntax as much as we can.

You may wonder why I'm still allowing the construction tab.func ---
why not get rid of that too?  Well, mainly because I'd like to support
the Oracle-ish syntax for nextval: seqname.nextval.  Together with the
desire not to break existing code unnecessarily, that seems a good enough
reason to leave it in.

Comments?
        regards, tom lane

Re: Schemas vs. PostQUEL: resolving qualified identifiers

From

"Zeugswetter Andreas SB SD"

Date:

23 January 2002, 04:33:06

> PostQUEL-isms that no longer work:
> 
> table.col.func        Must rewrite as func(table.col)
> 
> table.func1.func2    Must rewrite as, eg, func2(func1(table.*))
> 
> While there are a couple of examples of the above two constructs in the
> regression tests, I doubt they are used much in the real world.

Imho OK. I only need table.func1 .

> Another PostQUEL-ism that I would like to remove is col(tab) to mean
> tab.col; I'd prefer to restrict the function syntax to functions only.
> We could continue to support this for unqualified table names, but
> given the above rules it could not work for a qualified table name,
> eg, col(schema.tab) would be misinterpreted.  Again, it seems unlikely
> that many people are using this, so we may as well try to regularize
> the syntax as much as we can.

Sounds ok too. Imho very hard to grok syntax anyway.

What I do not like at all is the notion that "table" == "table".* .
(IIRC there has already been some discussion where I objected to that.)

"table" as function parameter imho passes an object of type "table"
to the function. This involves type checking, and that the function only 
has one argument.

"table".* to the contrary is not an object, but one object (one parameter)
per table column. This is imho easier to understand, since select table.* 
also does it like that. Thus calling func(table.*) should imho rather be 
mapped to func (table.col1, table.col2 ...).

Also remember that the standard now has room for nested types.
create table atab (
x int,
y int );

( We use create table, standard iirc uses "create type".)

create table btab (
position atab,
info text );

How would you map btab.* here ? (x, y, info) ? Imho no.
Imho it should map to (atab, info) thus you need the 
notion of passing an object anyway.

Andreas

Re: Schemas vs. PostQUEL: resolving qualified identifiers

From

Tom Lane

Date:

23 January 2002, 10:13:08

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> What I do not like at all is the notion that "table" == "table".* .
> (IIRC there has already been some discussion where I objected to that.)

> "table" as function parameter imho passes an object of type "table"
> to the function. This involves type checking, and that the function only 
> has one argument.

> "table".* to the contrary is not an object, but one object (one parameter)
> per table column. This is imho easier to understand, since select table.* 
> also does it like that. Thus calling func(table.*) should imho rather be 
> mapped to func (table.col1, table.col2 ...).

Okay, but then how will you refer unambiguously to the rowtype object?
If you write func(schema.tab) the system will misinterpret it as
func(tab.col) --- which, in the worst case, might fail to fail because
there actually is such a column.  We have to make some compromises here.
I'm not all that thrilled with foo.* for rowtype either, but you haven't
offered a workable alternative.
        regards, tom lane

Re: Schemas vs. PostQUEL: resolving qualified identifiers

From

Fernando Nasser

Date:

23 January 2002, 13:35:38

Tom Lane wrote:
> 
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > What I do not like at all is the notion that "table" == "table".* .
> > (IIRC there has already been some discussion where I objected to that.)
> 
> > "table" as function parameter imho passes an object of type "table"
> > to the function. This involves type checking, and that the function only
> > has one argument.
> 
> > "table".* to the contrary is not an object, but one object (one parameter)
> > per table column. This is imho easier to understand, since select table.*
> > also does it like that. Thus calling func(table.*) should imho rather be
> > mapped to func (table.col1, table.col2 ...).
> 
> Okay, but then how will you refer unambiguously to the rowtype object?
> If you write func(schema.tab) the system will misinterpret it as
> func(tab.col) --- which, in the worst case, might fail to fail because
> there actually is such a column.  We have to make some compromises here.
> I'm not all that thrilled with foo.* for rowtype either, but you haven't
> offered a workable alternative.
> 

What about casting with the keyord ROW?

func(ROW table) 

always refers to the row-type of table "table" even if there is
a column called "table".


-- 
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9

Re: Schemas vs. PostQUEL: resolving qualified identifiers

From

Tom Lane

Date:

23 January 2002, 14:13:05

Fernando Nasser <fnasser@redhat.com> writes:
> Tom Lane wrote:
>> Okay, but then how will you refer unambiguously to the rowtype object?

> What about casting with the keyord ROW?
> func(ROW table) 
> always refers to the row-type of table "table" even if there is
> a column called "table".

Strikes me as gratuituously different from the way everything else is
done.  We have .* and %ROWTYPE and so forth, and they're all suffixes.
The closest analogy to your ROW syntax is CAST, but it doesn't alter the
initial interpretation of its argument.

I was toying with the notion of inventing some new notation liketable.**
I don't like double-asterisk much, but maybe there's some other symbol
we could use here?
        regards, tom lane

Re: Schemas vs. PostQUEL: resolving qualified identifiers

From

Fernando Nasser

Date:

23 January 2002, 17:13:00

Tom Lane wrote:
> 
> Fernando Nasser <fnasser@redhat.com> writes:
> > Tom Lane wrote:
> >> Okay, but then how will you refer unambiguously to the rowtype object?
> 
> > What about casting with the keyord ROW?
> > func(ROW table)
> > always refers to the row-type of table "table" even if there is
> > a column called "table".
> 
> Strikes me as gratuituously different from the way everything else is
> done.  We have .* and %ROWTYPE and so forth, and they're all suffixes.
> The closest analogy to your ROW syntax is CAST, but it doesn't alter the
> initial interpretation of its argument.
> 

I didn't mean literally that way, I just wanted to add a keyword for
solving ambiguity (when there is one).

You are right, it should be:

func(table%ROWTYPE)  




-- 
Fernando Nasser
Red Hat - Toronto                       E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9