Thread: Functions on tables

Functions on tables

From

"Brendan Jurd"

Date:

16 December 2006, 16:52:24

In the object-relational context, the definition of a "relation" is
much the same as the idea of a "class"; the columns in a table are
analogous to the attributes of a class.  The names of Postgres' system
catalogs reflect this correlation (pg_class, pg_attribute).

Likewise, each tuple within a relation is like an instance of the class.

So I was thinking, why is it we have such a direct representation of
class attributes (in the form of columns), but no equivalent
representation of class methods?

Say you have the following table:

CREATE TABLE person (
 id serial PRIMARY KEY,
 firstname text NOT NULL,
 lastname text NOT NULL
);

Then you define a function:

CREATE FUNCTION person_name(firstname text, lastname text) RETURNS text AS $$
 SELECT $1 || ' ' || $2;
$$ LANGUAGE SQL IMMUTABLE;

So now you can do

SELECT id, person_name(firstname, lastname) AS name FROM person ORDER BY name;

That works fine, but wouldn't it be far more elegant if you could do
this instead:

CREATE TABLE person (
 id SERIAL PRIMARY KEY,
 firstname TEXT NOT NULL,
 lastname TEXT NOT NULL,
 FUNCTION name() RETURNS text AS $$ SELECT firstname || ' ' ||
lastname; $$ LANGUAGE SQL IMMUTABLE
);

Now the function name() belongs to the "person" table: it is, in
effect, a method of the "person" class.  Which means we can do this:

SELECT id, name() FROM person ORDER BY name();

Just as with methods in an OO programming language, when you call
name() on a tuple of the "person" relation, it has access to the
attributes of that tuple (here firstname and lastname).  There is no
need to pass arguments to the function, nor any need to actually know
which attributes of "person" go into making up the return value of
name().  You could later decide to add an attribute for a person's
preferred name, or middle initial, and then factor that into the logic
of name() without the query author needing to know anything about it.

Of course there would be implementation challenges, and admittedly I
haven't considered those, but on the surface this feels like a good
idea.  It taps into some of the real usefulness of OOP, and it uses a
feature we already have: user-defined functions.

I look forward to your comments.

Regards,
BJ

Re: Functions on tables

From

Tom Lane

Date:

16 December 2006, 17:11:41

"Brendan Jurd" <direvus@gmail.com> writes:
> That works fine, but wouldn't it be far more elegant if you could do
> this instead:

> CREATE TABLE person (
>  id SERIAL PRIMARY KEY,
>  firstname TEXT NOT NULL,
>  lastname TEXT NOT NULL,
>  FUNCTION name() RETURNS text AS $$ SELECT firstname || ' ' ||
> lastname; $$ LANGUAGE SQL IMMUTABLE
> );

90% of the value this would have is already available with views,
I think, without going outside bog-standard SQL:

    create view ...
        firstname || ' ' || lastname as name,
        ...

Also, there's already a Berkeley-era syntax hack in PG that gets much of
the rest: if x is of composite type, the notations x.y and y(x) are
interchangeable.  Thus:

regression=# create function name(person) returns text as $$
regression$# select $1.firstname || ' ' || $1.lastname
regression$# $$ language sql immutable;
CREATE FUNCTION
regression=# select person.name from person;
   name
----------
 joe blow
(1 row)

> Now the function name() belongs to the "person" table: it is, in
> effect, a method of the "person" class.  Which means we can do this:
> SELECT id, name() FROM person ORDER BY name();

[ itch... ]  That seems to risk breaking a whole lot of existing code by
introducing name collisions --- the entire namespace of ordinary
functions is at risk as soon as you have any of these per-table
functions, if they can be called like that.

But having said all that, I think there are bits of SQL2003 that do some
of what you're after.  I don't think anyone has looked hard at what
would be involved in merging those new SQL features with historical
Postgres behaviors.

            regards, tom lane

Re: Functions on tables

From

"Brendan Jurd"

Date:

16 December 2006, 18:26:05

On 12/17/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 90% of the value this would have is already available with views,
> I think, without going outside bog-standard SQL:

Views also work fine, but one of the big advantages of having table
methods is that all the things your table can do are contained within
the table definition.  Your table definition becomes like the API of a
class.  When you use separate views and/or functions to provide
methods for a table, your schema becomes pretty scattered (I know mine
is).  It can be non-trivial to find, say, all functions that relate to
a person tuple.

> Also, there's already a Berkeley-era syntax hack in PG that gets much of
> the rest: if x is of composite type, the notations x.y and y(x) are
> interchangeable.  Thus:

I didn't know about that, and you're right, this actually delivers a
lot of the notational convenience that I'm looking for.  But again,
you don't win the ability to look at your table definition and
immediately understand what methods the table exposes.

It looks like you can't use this hack to conveniently handle methods
that have arguments.  For example, say you had a method to return a
person's birthday in a given year, defined as "birthday(person, int)
returns date".  You wouldn't be able to write person.birthday(2007) in
a query and get the expected result.

> [ itch... ]  That seems to risk breaking a whole lot of existing code by
> introducing name collisions --- the entire namespace of ordinary
> functions is at risk as soon as you have any of these per-table
> functions, if they can be called like that.
>

What if we used scope resolution?  That is, Postgres first looks for
functions which are local to the table, and if it doesn't find a
match, then looks for functions in the normal namespace, perhaps
raising a notice to warn the user of the ambiguity.

Say there was a function name() in the public namespace.  You can then
easily resolve the ambiguity by specifying either person.name() or
public.name().

Most front-end queries have more than one table in them, so in
practice I think you'll almost always be using table aliases and
identifying your methods explicitly anyway, e.g.:

SELECT p.name(), a.label() as address
FROM person p INNER JOIN address a ON p.postal_address = a.id;

... so I'm not passionately attached to the idea of being able to call
the method without prefixing the table name.

Re: Functions on tables

From

TJ O'Donnell

Date:

16 December 2006, 22:29:02

I wholeheartedly support the approach BJ is advocating.
The notion that methods (functions) and variables (tables)
can be defined together is a very useful OO approach.
I too find it difficult to recall which functions "belong"
to which tables.  Of course, some of my functions are very
generic and wouldn't appropriately "belong" to any one table,
but many are meant solely to operate on data in one type of
object (table).

I've taken to using schemas to collect together functions and tables
that "belong" together.  This requires the use of the schema name,
as you say BJ,
> ... so I'm not passionately attached to the idea of being able to call
> the method without prefixing the table name.

In my approach, the schema name becomes the object name and the
functions "belong" to the schema.  Most OO approaches only allow one
definition of variables (tables), and I can easily allow each schema
to have only one table.  But I can also use multiple tables.
The extra tables can be instances, much like BJ's use of rows as
instances.  Using separate tables allows me to have groups of
instances that are grouped together for some reason.  I can also
have tables that are sub-classes of the original table.

TJ
http://www.gnova.com/

Re: Functions on tables

From

"Brendan Jurd"

Date:

17 December 2006, 12:15:51

On 12/17/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But having said all that, I think there are bits of SQL2003 that do some
> of what you're after.  I don't think anyone has looked hard at what
> would be involved in merging those new SQL features with historical
> Postgres behaviors.

I've been looking into SQL2003, and there are indeed some features
there that correspond to what I want.

ISO/IEC 9075-2:2003 - Foundation (SQL/Foundation) talks about
"structured types" (like a user-defined composite type in Postgres,
and somewhat like a "struct" in C).  It applies many OO concepts to
these structured types: inheritance, encapsulation, overloading.  It
goes so far as to say that every structured type has an implied
constructor method, and for every attribute within the type, one
implied "observer" method and one implied "mutator" method.

The standard provides for adding user-defined methods to these types,
which have an implied first parameter "SELF", which is exactly the
sort of thing I am looking for.  The standard doesn't specifically
mention doing this with tables, or row types.  However, the conceptual
distinction between a user-defined composite type and a table is not
vast, and AIUI Postgres already implies a composite type for every
table, with each tuple in the table being an object of that composite
type.

Frankly I don't have much experience reading SQL standards, and the
language they use is a bit abstruse.  But as far as I can tell, my
suggestion is quite nicely compliant with the behaviour the standard
recommends ... indeed the standard takes the OO idea much further than
I initially hoped to.

Alternatively, the standard also specifies "generated columns" within
a table, which would allow you to achieve a similar effect to my
person.name() method like so:

CREATE TABLE person (
 id serial PRIMARY KEY,
 firstname text NOT NULL,
 lastname text NOT NULL,
 name GENERATED ALWAYS AS (firstname || ' ' || lastname)
);

I think the generated column idea has some usefulness, but isn't quite
what I'm after.  You can't use it with additional parameters (c.f. my
earlier birthday(int) example), and I like the idea of keeping
attributes and methods totally separate.  A generated column is made
to appear like an attribute, and it is referenced like an attribute,
but in actual fact it is not an attribute, it is a derived value.

Re: Functions on tables

From

elein

Date:

23 December 2006, 00:48:57

On Mon, Dec 18, 2006 at 12:15:34AM +1100, Brendan Jurd wrote:
> On 12/17/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >But having said all that, I think there are bits of SQL2003 that do some
> >of what you're after.  I don't think anyone has looked hard at what
> >would be involved in merging those new SQL features with historical
> >Postgres behaviors.
>
> I've been looking into SQL2003, and there are indeed some features
> there that correspond to what I want.
>
> ISO/IEC 9075-2:2003 - Foundation (SQL/Foundation) talks about
> "structured types" (like a user-defined composite type in Postgres,
> and somewhat like a "struct" in C).  It applies many OO concepts to
> these structured types: inheritance, encapsulation, overloading.  It
> goes so far as to say that every structured type has an implied
> constructor method, and for every attribute within the type, one
> implied "observer" method and one implied "mutator" method.
>
> The standard provides for adding user-defined methods to these types,
> which have an implied first parameter "SELF", which is exactly the
> sort of thing I am looking for.  The standard doesn't specifically
> mention doing this with tables, or row types.  However, the conceptual
> distinction between a user-defined composite type and a table is not
> vast, and AIUI Postgres already implies a composite type for every
> table, with each tuple in the table being an object of that composite
> type.
>
> Frankly I don't have much experience reading SQL standards, and the
> language they use is a bit abstruse.  But as far as I can tell, my
> suggestion is quite nicely compliant with the behaviour the standard
> recommends ... indeed the standard takes the OO idea much further than
> I initially hoped to.
>
> Alternatively, the standard also specifies "generated columns" within
> a table, which would allow you to achieve a similar effect to my
> person.name() method like so:
>
> CREATE TABLE person (
> id serial PRIMARY KEY,
> firstname text NOT NULL,
> lastname text NOT NULL,
> name GENERATED ALWAYS AS (firstname || ' ' || lastname)
> );

Illustra had the equivalent of columns defined as expressions.
I don't remember the syntax, but it was simpler than "GENERATED".
But better go with the spec.

Late on the thread,

--elein


>
> I think the generated column idea has some usefulness, but isn't quite
> what I'm after.  You can't use it with additional parameters (c.f. my
> earlier birthday(int) example), and I like the idea of keeping
> attributes and methods totally separate.  A generated column is made
> to appear like an attribute, and it is referenced like an attribute,
> but in actual fact it is not an attribute, it is a derived value.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>