Thread: Parser translations and schemas

Parser translations and schemas

From
Tom Lane
Date:
Here's today's tidbit for those who like to argue about nitty little
details of behavior ...

Presently, the parser (in particular gram.y) has quite a few special
transformations for certain type and function names.  For example,

You write            You get

char(N)                bpchar
trim(BOTH foo)            btrim(foo)

The question for the day is: should these transformations be applied to
schema-qualified names?  And should the parser force the transformed
names to be looked up in the system schema (pg_catalog), or should it
allow them to be searched for using the regular namespace search path?

I want to make the following proposal:

1. Transformations are applied only to unqualified names.  If you
write a qualified name then it is treated as a plain-vanilla identifier
and looked up in the catalogs without transformation, even if the name
component happens to match a name that would be transformed standing
alone.

2. If a transformation is applied then the resulting name will always
be forced to be looked up in the system schema; ie, the output will
effectively be "pg_catalog.something" not just "something".

Some examples:

You write            You get

char(N)                pg_catalog.bpchar
pg_catalog.char            pg_catalog.char (not bpchar)
real                pg_catalog.float4
myschema.real            myschema.real (not float4)
trim(BOTH foo)            pg_catalog.btrim(foo)
pg_catalog.trim(BOTH foo)    an error (since the special production            allowing BOTH won't be used)

I have a number of reasons for thinking that this is a reasonable way to
go.  Point one: transforming qualified names seems to violate the
"principle of least surprise".  If I write myschema.real I would not
expect that to be converted to myschema.float4, especially if I weren't
aware that Postgres internally calls REAL "float4".  Point two: I don't
believe that we need to do it to meet the letter of the SQL spec.
AFAICT the spec treats all the names of built-in types and functions as
keywords, not as names belonging to a system schema.  So special
behavior is required for TRIM(foo) but not for DEFINITION_SCHEMA.TRIM(foo).
Point three: if we do transform a name, then we are expecting a
particular system type or function to be selected, and we ought to
ensure that that happens; thus explicitly qualifying the output name
seems proper.  Again, this seems less surprising than other alternatives.
If I have a datatype myschema.float4 that I've put into the search path
in front of pg_catalog, I think I'd be surprised to have it get picked
when I write REAL.

Another reason for doing it this way is that I think it's necessary for
reversibility.  For example, consider what format_type should put out
when it's trying to write a special-cased type name.  If it needs to
emit numeric(10,4) then it *cannot* stick "pg_catalog." on the front of
that --- the result wouldn't parse.  (At least not unless we uglify the
grammar a whole lot more to allow pg_catalog.FOO everywhere that just
FOO currently has a special production.)  So we need to make parsing
rules that guarantee that numeric(10,4) will be interpreted as
pg_catalog.numeric and not something else, regardless of the active
search path.  On the other hand, a plain user datatype that happens
to be named "real" should be accessible as myschema.real without
interference from the real->float4 transformation.

A corner case that maybe requires more discussion is what about type and
function names that are reserved per spec, but which we do not need any
special transformation for?  For example, the spec thinks that
OCTET_LENGTH() is a keyword, but our implementation treats it as an
ordinary function name.  I feel that the grammar should not prefix
"pg_catalog." to any name that it hasn't transformed or treated
specially in any way, even if that name is reserved per spec.  Note that
this will not actually lead to any non-spec-compliant behavior as long
as one allows the system to search pg_catalog before any user-provided
schemas --- which is in fact the default behavior, as it's currently set
up.

Another point is that I believe that REAL should be transformed to
pg_catalog.float4, but the quoted identifier "real" should not be.
This would require a bit of surgery --- presently xlateSqlType is
applied to pretty much everything whether it's a quoted identifier
or not.  But if we allow xlateSqlType to continue to work that way,
then user-schema names that happen to match one of its target names
are going to behave strangely.  I think we will need to get rid of
xlateSqlType/xlateSqlFunc and instead implement all the name
transformations we want as special productions, so that the target
names are shown as keywords in keywords.c.  Without this, ruleutils.c
will not have a clue that the user type name "real" needs to be quoted
to keep it from being transformed.

BTW: as the code stands today, gram.y is prefixing "pg_catalog." to
system function names but not to type names, which is inconsistent;
I hadn't thought carefully enough about these issues when I was hacking
the grammar for schematized datatypes.  So in any case I have some work
to do...

Comments, concerns, better ideas?
        regards, tom lane


Re: Parser translations and schemas

From
Peter Eisentraut
Date:
Tom Lane writes:

> You write            You get
>
> char(N)                pg_catalog.bpchar
> pg_catalog.char            pg_catalog.char (not bpchar)
> real                pg_catalog.float4
> myschema.real            myschema.real (not float4)
> trim(BOTH foo)            pg_catalog.btrim(foo)
> pg_catalog.trim(BOTH foo)    an error (since the special production
>                 allowing BOTH won't be used)

Exactly my thoughts.

> A corner case that maybe requires more discussion is what about type and
> function names that are reserved per spec, but which we do not need any
> special transformation for?  For example, the spec thinks that
> OCTET_LENGTH() is a keyword, but our implementation treats it as an
> ordinary function name.  I feel that the grammar should not prefix
> "pg_catalog." to any name that it hasn't transformed or treated
> specially in any way, even if that name is reserved per spec.

I agree.

-- 
Peter Eisentraut   peter_e@gmx.net