resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name) - Mailing list pgsql-hackers

From Darren Duncan
Subject resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Date
Msg-id 4D8D627D.9010001@darrenduncan.net
Whole thread Raw
In response to Re: WIP: Allow SQL-language functions to reference parameters by parameter name  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
List pgsql-hackers
Robert Haas wrote:
> On Mar 25, 2011, at 9:22 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>> Tom,
>>
>>> Personally I'd vote for *not* having any such dangerous semantics as
>>> that. We should have learned better by now from plpgsql experience.
>>> I think the best idea is to throw error for ambiguous references,
>>> period. 
>> As a likely heavy user of this feature, I agree with Tom here.  I really don't want the column being silently
preferredin SQL functions, when PL/pgSQL functions are throwing an error.  I'd end up spending hours debugging this.
 
>>
>> Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work.
> 
> Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a
parameterwith the same name.
 

I mention 2 possible solutions here, both which involve syntax alterations, each 
between the ---------- lines.  I personally like the second/lower option more.

------------

Might it be reasonable, perhaps as a 9.2 feature, to add top-level-namespaces so 
that one could always explicitly qualify what they are referring to?

For example, you could have the 3 "sch", "lex", "attr" (I may have missed some 
useful ones).

The "sch" TLN would unambiguously refer directly to a schema object, such as a 
database table.

The "lex" TLN would unambiguously refer directly to a lexical, either a 
parameter of the current routine or to a lexical variable.

The "attr" TLN would unambiguously refer to a table/etc column/attribute in the 
manner typical for SQL.

Use them like:
  sch.foo - the table/etc foo  lex.foo - the lexical variable foo  attr.foo - the column foo

Use of these TLN are optional where there is no ambiguity.

The TLN are not reserved words, but if one has an entity named the same, then 
references to it must be TLN-qualified; eg:
  lex.sch  lex.lex  lex.attr

Now these are just examples.  You may find a different set works better.

--------------

There are also alternate solutions.

For example, it could be mandated that lexical-scope aliases for any 
data/var-like schema object are required in routines, where the aliases are 
distinct from all lexical vars/params/etc, and then all SQL/code in the routines 
may only refer to the schema objects by the aliases.

Effectively this makes it so that routines can no longer see non-lexical vars 
but for those from parameters, and this aliasing is defining a parameter whose 
argument is supplied by the DBMS automatically rather than as an explicit 
routine caller argument.

That way, inside a routine body there are only lexical names for things, and so 
no namespace-qualification is ever needed by the regular SQL.

Similarly, if you always think of table column names as referring to an 
attribute or element of a table variable, then just reference the column 
qualified by the table name (or the lexical alias thereof).  Same as you do in 
any other programming language.  Of course, sometimes you don't have to qualify 
column name references as context could make it unambiguous.  Or, a shorthand 
like a simple leading "." could unambiguously say you're referring to a column 
of the particular table in context.

With those in place, all unqualified references are straight to lexical 
variables or parameters.

And so, this is also an effective way to resolve the ambiguity and I prefer the 
latter design personally.

Here's an example in quasi-PL/PgSQL:
  create function myrtn (myparam integer, mytbl ::= mydb.myschema.mytbl) as  declare      myvar integer := 5;  $body$
begin     select (.mycol + myvar * myparam) as mynewcol from mytbl;  end;  $body$
 

Note that I've already thought through this last example as these methods of 
avoiding ambiguity are loosely-speaking how my language Muldis D avoids the 
problem faced by many SQL procedures.

The ".mycol" syntax specifically was inspired originally for me by Perl 6 where 
the lack of something just before the "." means that the implicit topic variable 
is referred to, like if you said "$_.mycol".

A Perl 6 analogy being something like:
  $mytbl.map:{ .mycol + $myvar * $myparam }

aka:
  $mytbl.map:{ $_.mycol + $myvar * $myparam }

--------------

-- Darren Duncan


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: WIP: Allow SQL-language functions to reference parameters by parameter name
Next
From: Tom Lane
Date:
Subject: Open issues for collations