Thread: Simplifying OID lookups in the presence of namespaces

Simplifying OID lookups in the presence of namespaces

From
Tom Lane
Date:
We are about to need to fix a fair number of places in client code
(eg, psql and pg_dump) that presently do things like

SELECT * FROM pg_attribute
WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'foo');

This does not work reliably anymore because there could be multiple
relations named 'foo' in different namespaces.  The sub-select to
get the relation OID will fail because it'll return multiple results.

The brute-force answer is

SELECT * FROM pg_attribute
WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'foo'                 AND relnamespace = (SELECT oid FROM
pg_namespace                                    WHERE nspname = 'bar'));
 

But aside from being really ugly, this requires that the client code
know exactly which namespace contains the relation it's after.  If
the client is relying on namespace search then it may not know that;
in fact, the client code very possibly isn't even aware of the exact
namespace search path it's using.  I am planning to introduce an
informational function CURRENT_SCHEMAS() (or some such name) that
returns the current effective search path, probably as a NAME[] array.
But it looks really, really messy to write an SQL query that makes
use of such a function to look up the first occurrence of 'foo' in
the search path.  We need to encapsulate the lookup procedure somehow
so that we don't have lots of clients reinventing this wheel.

We already have some functions that accept a text string and do a
suitable lookup of a relation; an example is nextval(), for which
you can presently write
nextval('foo')        --- searches namespace path for foonextval('foo.bar')    --- looks only in namespace
foonextval('"Foo".bar')   --- quoting works for mixed-case names
 

Seems like what we want to do is make the lookup part of this available
separately, as a function that takes such a string and returns an OID.
We'd need such functions for each of the namespace-ified object kinds:
relations, datatypes, functions, and operators.

A variant of the idea of inventing functions is to extend the existing
datatype 'regproc' to do this, and invent also 'regclass', 'regtype',
'regoperator' datatypes to do the lookups for the other object kinds.
I proposed this in a different context last year,http://archives.postgresql.org/pgsql-hackers/2001-08/msg00589.php
but it seemed too late to do anything with the idea for 7.2.

If we went with the datatype approach then we'd be able to write
queries like

SELECT * FROM pg_attribute WHERE attrelid = 'foo'::regclass;

or

SELECT * FROM pg_attribute WHERE attrelid = 'foo.bar'::regclass;

or for that matter you could do

SELECT * FROM pg_attribute WHERE attrelid = regclass('foo');

which'd be syntactically indistinguishable from using a function.

The datatype approach seems a little bit odder at first glance, but it
has some interesting possibilities with respect to implicit casting
(see above-referenced thread).  So I'm inclined to go that route unless
someone's got an objection.

With a datatype, we also have outbound conversion to think of: so there
must be a function that takes an OID and produces a string.  What I am
inclined to do on that side is emit an unqualified name if the OID
refers to a relation/type/etc that would be found first in the current
namespace search path.  Otherwise, a qualified name (foo.bar) would be
emitted.  This will have usefulness for applications like pg_dump, which
will have exactly this requirement (per discussion a few days ago that
pg_dump should not qualify names unnecessarily).

One question is what to do with invalid input.  For example, if table
foo doesn't exist then what should 'foo'::regclass do?  The existing
regproc datatype throws an error, but I wonder whether it wouldn't be
more useful to return NULL.  Any thoughts on that?

Also, for functions and operators the name alone is not sufficient to
uniquely identify the object.  Type regproc currently throws an error
if asked to convert a nonunique function name; that severely limits its
usefulness.  I'm toying with allowing datatypes in the input string,
eg'sum(bigint)'::regproc
but I wonder if this will create compatibility problems.  In particular,
should the regproc and regoperator output converters include datatype
indicators in the output string?  (Always, never, only if not unique?)
Doing so would be a non-backwards-compatible change for regproc.
We might avoid that complaint by leaving regproc as-is and instead
inventing a parallel datatype (say regfunction) that supports datatype
indications.  But I'm not sure whether regproc is used enough to make
this an important concern.

Comments?
        regards, tom lane


Re: Simplifying OID lookups in the presence of namespaces

From
Joe Conway
Date:
Tom Lane wrote:
> A variant of the idea of inventing functions is to extend the existing
> datatype 'regproc' to do this, and invent also 'regclass', 'regtype',
> 'regoperator' datatypes to do the lookups for the other object kinds.
> I proposed this in a different context last year,
>     http://archives.postgresql.org/pgsql-hackers/2001-08/msg00589.php
> but it seemed too late to do anything with the idea for 7.2.
> 

Interesting thread. It seems like the same basic facility could also 
support an enum datatype that people migrating from mysql are always 
looking for.



> One question is what to do with invalid input.  For example, if table
> foo doesn't exist then what should 'foo'::regclass do?  The existing
> regproc datatype throws an error, but I wonder whether it wouldn't be
> more useful to return NULL.  Any thoughts on that?

NULL makes sense.

> 
> Also, for functions and operators the name alone is not sufficient to
> uniquely identify the object.  Type regproc currently throws an error
> if asked to convert a nonunique function name; that severely limits its
> usefulness.  I'm toying with allowing datatypes in the input string,
> eg
>     'sum(bigint)'::regproc
> but I wonder if this will create compatibility problems.  In particular,
> should the regproc and regoperator output converters include datatype
> indicators in the output string?  (Always, never, only if not unique?)

I'd be inclined to include datatype always. If you don't, how can you 
use this for pg_dump, etc?


Joe



Re: Simplifying OID lookups in the presence of namespaces

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> Also, for functions and operators the name alone is not sufficient to
>> uniquely identify the object.  Type regproc currently throws an error
>> if asked to convert a nonunique function name; that severely limits its
>> usefulness.  I'm toying with allowing datatypes in the input string,
>> eg
>> 'sum(bigint)'::regproc
>> but I wonder if this will create compatibility problems.  In particular,
>> should the regproc and regoperator output converters include datatype
>> indicators in the output string?  (Always, never, only if not unique?)

> I'd be inclined to include datatype always. If you don't, how can you 
> use this for pg_dump, etc?

pg_dump would probably actually prefer not having type info in the
output string; it'll just have to strip it off in most places.  But
I don't have a good feeling for the needs of other applications,
so I was asking what other people thought.

If we supported both ways via two datatypes, we'd have all the bases
covered; I'm just wondering if it's worth the trouble.
        regards, tom lane

PS: interesting thought about enum ...