Thread: Views versus user-defined functions: formatting, comments, performance, etc.

Views versus user-defined functions: formatting, comments, performance, etc.

From
Adam Mackler
Date:
Hi:

I notice when I save a view, I lose all the formatting and comments.
As I was writing a complicated view, wanting to retain the format and
comments, I thought I could just save it as a function that returns a
table value.  A function would evaluate to the same value as a view,
but changing it later might be less confusing.

However, I'm guessing (since I don't know anything about the
internals) that the loss of formatting and comments is a result of the
view being processed and stored in a more computer-friendly format,
while functions are simply stored as the text that I type.  That gives
me reason to suspect there may be performance or other differences
between the same SQL statement stored either as a view or a
user-defined function.

So that's my question: as someone who doesn't have a problem with
putting a pair of empty parentheses at the end of a table variable
name, what factors should I be thinking of while deciding whether to
store my self-composed, multi-hundred-line long SQL statement as a
view or a function?

--
Adam Mackler


Adam Mackler <adammackler@gmail.com> writes:
> I notice when I save a view, I lose all the formatting and comments.
> As I was writing a complicated view, wanting to retain the format and
> comments, I thought I could just save it as a function that returns a
> table value.  A function would evaluate to the same value as a view,
> but changing it later might be less confusing.

A lot of people choose to save the source text in some sort of SCM
(eg git), and then just import via CREATE OR REPLACE VIEW when they
change it.  This tends to soon scale up to large scripts that define
collections of related objects.

> However, I'm guessing (since I don't know anything about the
> internals) that the loss of formatting and comments is a result of the
> view being processed and stored in a more computer-friendly format,
> while functions are simply stored as the text that I type.

Correct.  The reparse time per se is generally not a big deal, but the
execution penalty associated with a function can be.  If you go this way
you'll want to make sure that your function can be "inlined" --- use
EXPLAIN to make sure you get a plan matching the bare view, and not just
something that says "Function Scan".

            regards, tom lane


Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Merlin Moncure
Date:
On Fri, Aug 17, 2012 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Adam Mackler <adammackler@gmail.com> writes:
>> I notice when I save a view, I lose all the formatting and comments.
>> As I was writing a complicated view, wanting to retain the format and
>> comments, I thought I could just save it as a function that returns a
>> table value.  A function would evaluate to the same value as a view,
>> but changing it later might be less confusing.
>
> A lot of people choose to save the source text in some sort of SCM
> (eg git), and then just import via CREATE OR REPLACE VIEW when they
> change it.  This tends to soon scale up to large scripts that define
> collections of related objects.
>
>> However, I'm guessing (since I don't know anything about the
>> internals) that the loss of formatting and comments is a result of the
>> view being processed and stored in a more computer-friendly format,
>> while functions are simply stored as the text that I type.
>
> Correct.  The reparse time per se is generally not a big deal, but the
> execution penalty associated with a function can be.  If you go this way
> you'll want to make sure that your function can be "inlined" --- use
> EXPLAIN to make sure you get a plan matching the bare view, and not just
> something that says "Function Scan".

For various reasons, this often goes the wrong way.  Views are often
the right way to go.  +1 on your comment above -- the right way to do
views (and SQL in general) is to organize scripts and to try and avoid
managing everything through GUI tools.  It works.

merlin



> Hi:
>
> I notice when I save a view, I lose all the formatting and comments.
> As I was writing a complicated view, wanting to retain the format and
> comments, I thought I could just save it as a function that returns a
> table value.  A function would evaluate to the same value as a view,
> but changing it later might be less confusing.
>
> However, I'm guessing (since I don't know anything about the
> internals) that the loss of formatting and comments is a result of the
> view being processed and stored in a more computer-friendly format,
> while functions are simply stored as the text that I type.  That gives
> me reason to suspect there may be performance or other differences
> between the same SQL statement stored either as a view or a
> user-defined function.
>
> So that's my question: as someone who doesn't have a problem with
> putting a pair of empty parentheses at the end of a table variable
> name, what factors should I be thinking of while deciding whether to
> store my self-composed, multi-hundred-line long SQL statement as a
> view or a function?

To put it simple: views are basically stored SQL queries. Functions are just
that - functions. The later require parsing most of the time, the former are
simply executed like any other query (I think views are actually kind of
prepared, so less overhead than executing the same query several times, but I
may be wrong and there's far more qualified people on this list to answer that)

I run a website with a few million pages a month and every page is assembled
out of database records (around 200 tables). I use mostly views for pretty
much everything, since a normalized database almost always requires joins over
several tables to get the result you want.  The framework I use requires
SQLAlchemy (a python object oriented SQL mapper) - and I'm just not the person
coding SQL in python when I have a perfectly good database much better at that
task. So I use views and only use the mapper to map the result from SQL to
python.
I learned by experience that functions/procedures are slower and in my eyes
more cumbersome to maintain and debug.  I maintain all the database source in
a SCM and I don't use any graphical tools for the SQL - just a good old emacs
does it for me nicely.  I also have SQL scripts that allow me to update views.
My views often depend on each other, so replacing one ususally cascades to
others breaking the whole scheme. A script applying the views in correct order
helps a lot on that one.

So basically, use views for performance , maintenance and sometimes
programming reasons (views behave like tables and often your application layer
can't tell the difference, which helps) and use functions where you need the
extra functionality which a view simply can't provide (i.e. you need to update
a record when someone views a different record.) Also think about triggers,
they can be quite useful for i.e. my example about needing to update a record.
Triggers don't require you to explicitly call the function - the database will
do that for you (which kind of obscures that there is something happening...)

Uwe




Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
"David Johnston"
Date:
Included below:

1) Question regarding the ability to inline set-returning functions
2) A comment that not keeping the content between the "CREATE VIEW ... AS"
and the trailing ";|EOF" is losing good information to have inside the
database.


> Correct.  The reparse time per se is generally not a big deal, but the
> execution penalty associated with a function can be.  If you go this
> way you'll want to make sure that your function can be "inlined" ---
> use EXPLAIN to make sure you get a plan matching the bare view, and
> not just something that says "Function Scan".

>>>>>>   Is this even possible???  <<<<<<

Related Question:  If an inlined function is called and not all of its
output columns referenced does the planner avoid pulling the data for the
not-referenced columns?

E.G.:   SELECT col1, col2 FROM function_with_three_cols();  Does the planner
ignore whatever "col3" would resolve to or are the contents of "col3"
outputted in the "sub-query" and then simply ignored further up the tree?
If "col3" contains a significant amount of text then its inclusion or
exclusion could significantly impact performance.  I am guessing that it
would have to process and return "col3".

Trying to answer the previous question this one presented itself:  I just
tried a couple of very simple queries and couldn't get them give me a plan
that wasn't a "Function Scan".  Is it possible that only "scalar" functions
can be inlined?

CREATE OR REPLACE FUNCTION three_col_func()
RETURNS TABLE (col1 varchar, col2 varchar, col3 varchar)
AS $$

   SELECT '1'::varchar, '2'::varchar, '3'::varchar;

$$
LANGUAGE sql
VOLATILE
ROWS 1
;

EXPLAIN SELECT * FROM three_col_func()

QUERY PLAN
Function Scan on three_col_func  (cost=0.25..0.26 rows=1 width=96)

This is on 9.0.X


> For various reasons, this often goes the wrong way.  Views are often the
> right way to go.  +1 on your comment above -- the right way to do views
(and
> SQL in general) is to organize scripts and to try and avoid managing
> everything through GUI tools.  It works.
>
> merlin


While this is generally sound advice having a read-only version of the
comments available on the server has merit.  Not everyone who uses the
database needs or wants to have access to the source scripts in order to
learn how/why a particular object works or what it is intended for.

COMMENT ON xxx IS 'text';

For much of the "public API" commenting that is desired the above command
works OK but it is not a valid substitute in many situations.

Not really looking to get into a deep discussion on this topic at the moment
but the point is that not maintaining the entire text between the "AS" and
the final ";" causes a loss of information that has value being stored in
the database and thus becomes accessible to - admittedly PostgreSQL specific
- meta-data tools.

For the OP: as Tom indicated in-lining is key; and you cannot make use of
parameters.  You also cannot add a trigger to a function like you can a VIEW
- functions are read-only.

My $0.02

David J.




"David Johnston" <polobo@yahoo.com> writes:
> Trying to answer the previous question this one presented itself:  I just
> tried a couple of very simple queries and couldn't get them give me a plan
> that wasn't a "Function Scan".  Is it possible that only "scalar" functions
> can be inlined?

> CREATE OR REPLACE FUNCTION three_col_func()
> RETURNS TABLE (col1 varchar, col2 varchar, col3 varchar)
> AS $$
>    SELECT '1'::varchar, '2'::varchar, '3'::varchar;
> $$
> LANGUAGE sql
> VOLATILE
> ROWS 1
> ;

No, the problem with that one is that it's marked VOLATILE, which has a
whole set of implications that wouldn't be replicated by an inline
subquery.  Try marking it STABLE instead.

There are a pile of other poorly-documented restrictions as well...
one that I notice in a quick look at inline_set_returning_function
is that the function can't be marked STRICT.

            regards, tom lane




On Fri, Aug 17, 2012 at 2:35 PM, Adam Mackler <adammackler@gmail.com> wrote:
Hi:

I notice when I save a view, I lose all the formatting and comments.
As I was writing a complicated view, wanting to retain the format and
comments, I thought I could just save it as a function that returns a
table value.  A function would evaluate to the same value as a view,
but changing it later might be less confusing.

However, I'm guessing (since I don't know anything about the
internals) that the loss of formatting and comments is a result of the
view being processed and stored in a more computer-friendly format,
while functions are simply stored as the text that I type.  That gives
me reason to suspect there may be performance or other differences
between the same SQL statement stored either as a view or a
user-defined function.

So that's my question: as someone who doesn't have a problem with
putting a pair of empty parentheses at the end of a table variable
name, what factors should I be thinking of while deciding whether to
store my self-composed, multi-hundred-line long SQL statement as a
view or a function?

As someone who does a lot of both, here's my take.

Functions are often relatively opaque planner-wise.  Some SQL-language functions can be inlined.  Not all can be.  I usually assume that a function which is hitting a table cannot be inlined.  This is a useful assumption even though it is wrong in some cases.  Also views and functions create interfaces for your data, and both can be unit-tested (which is something that should not be estimated in importance).

In general if I want a re-usable quasi-relation, I create a view.  If I want a data interface, I create a function.  If I want a data transformation interface, then a function is the right tool.

Because of the work I do I create a lot more functions than views, but both have their uses,.  Also I would second Tom's suggestion regarding use of an external source code management solution.  It's not that hard to create shell scripts that reload a set of db scripts, and this way you can use your scm versions as authoritative.  However with these be aware of what happens when you make certain changes.  For example it isn't too hard to unintentionally end up with overloaded functions, or have a view refuse to be rebuilt because of some changes made.  These require some knowledge of what's safe to in your specific application.

Best Wishes,
Chris Travers
On 08/18/2012 06:49 AM, Merlin Moncure wrote:
> For various reasons, this often goes the wrong way.  Views are often
> the right way to go.

Indeed. I've had queries speed up *hundreds* of times when I convert a
function the planner didn't seem to want to inline into a view it can
push conditions down into.

The key thing to remember with views is that - unlike CTE "WITH"
expressions - they generally aren't fully evaluated to get all their
rows if most of them aren't needed. The query optimiser can typically
push filters (like "where customer_id = 4" or whatever) down into the
index- and table-scans used by the view, reducing the amount of data
that has to be processed.

That's not always the case, so use of EXPLAIN ANALYZE and some tweaking
of a view or query that uses a view is sometimes necessary. Mostly it
"just works" though.

--
Craig Ringer


Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Dmitriy Igrishin
Date:


2012/8/18 Merlin Moncure <mmoncure@gmail.com>
On Fri, Aug 17, 2012 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Adam Mackler <adammackler@gmail.com> writes:
>> I notice when I save a view, I lose all the formatting and comments.
>> As I was writing a complicated view, wanting to retain the format and
>> comments, I thought I could just save it as a function that returns a
>> table value.  A function would evaluate to the same value as a view,
>> but changing it later might be less confusing.
>
> A lot of people choose to save the source text in some sort of SCM
> (eg git), and then just import via CREATE OR REPLACE VIEW when they
> change it.  This tends to soon scale up to large scripts that define
> collections of related objects.
>
>> However, I'm guessing (since I don't know anything about the
>> internals) that the loss of formatting and comments is a result of the
>> view being processed and stored in a more computer-friendly format,
>> while functions are simply stored as the text that I type.
>
> Correct.  The reparse time per se is generally not a big deal, but the
> execution penalty associated with a function can be.  If you go this way
> you'll want to make sure that your function can be "inlined" --- use
> EXPLAIN to make sure you get a plan matching the bare view, and not just
> something that says "Function Scan".

For various reasons, this often goes the wrong way.  Views are often
the right way to go.  +1 on your comment above -- the right way to do
views (and SQL in general) is to organize scripts and to try and avoid
managing everything through GUI tools.  It works.
The drawback of this approach is that in some cases we need a
factory function(s) (in terms of the OOP) which returns one or a
set of objects (i.e. the function returns the view type). But since
the views are not in the dump we are forced to abandon this solution
and go with workarounds (such as creating extra composite types
to use as returning values or use the tables).

PS. I'm tried to found a SQL formatter for format views definitions
stored in the database, but unsuccessful.

--
// Dmitriy.


Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Merlin Moncure
Date:
On Sun, Aug 19, 2012 at 8:14 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
>> For various reasons, this often goes the wrong way.  Views are often
>> the right way to go.  +1 on your comment above -- the right way to do
>> views (and SQL in general) is to organize scripts and to try and avoid
>> managing everything through GUI tools.  It works.
>
> The drawback of this approach is that in some cases we need a
> factory function(s) (in terms of the OOP) which returns one or a
> set of objects (i.e. the function returns the view type). But since
> the views are not in the dump we are forced to abandon this solution
> and go with workarounds (such as creating extra composite types
> to use as returning values or use the tables).

Could you elaborate on this?

> PS. I'm tried to found a SQL formatter for format views definitions
> stored in the database, but unsuccessful.

Even if you could find one, I wouldn't use it: the database changes
the structure of you query.  Suppose you had:
CREATE VIEW v AS SELECT *, a*b AS product FROM foo;

The database converts that to:
CREATE VIEW v AS SELECT a,b, a*b AS product FROM foo;

That means that if you add columns after the fact, the view definition
in the database will diverge from what the source would create.
Besides that, there are a number of other things that the database
does like add unnecessary casts, column aliases and parentheses that
make a purely textual solution impossible.

merlin


Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Dmitriy Igrishin
Date:


2012/8/20 Merlin Moncure <mmoncure@gmail.com>
On Sun, Aug 19, 2012 at 8:14 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
>> For various reasons, this often goes the wrong way.  Views are often
>> the right way to go.  +1 on your comment above -- the right way to do
>> views (and SQL in general) is to organize scripts and to try and avoid
>> managing everything through GUI tools.  It works.
>
> The drawback of this approach is that in some cases we need a
> factory function(s) (in terms of the OOP) which returns one or a
> set of objects (i.e. the function returns the view type). But since
> the views are not in the dump we are forced to abandon this solution
> and go with workarounds (such as creating extra composite types
> to use as returning values or use the tables).

Could you elaborate on this?
 Suppose we've designed a simple class hierarchy (I'll use C++ notation):
class User { ... };
class Real_user : public User { ... };
class Pseudo_user : public User { ... };

Suppose we've decided that objects of these classes will be stored
in one database table:
CREATE TYPE user_type AS ENUM ('real', 'pseudo');
CREATE TABLE user (id serial NOT NULL,
                                 tp user_type NOT NULL,
                                 user_property1 text NOT NULL,
                                 user_property2 text NOT NULL,
                                 real_user_property1 text NULL,
                                 real_user_property2 text NULL,
                                 pseudo_user_property1 text NULL);

For simple mapping we've creating the (updatable, with rules) views:
CREATE VIEW real_user_view
  AS SELECT * FROM user WHERE tp = 'real';

CREATE VIEW pseudo_user_view
  AS SELECT * FROM user WHERE tp = 'pseudo';

CREATE VIEW user_view
  AS SELECT * FROM real_user_view
        UNION ALL SELECT * FROM pseudo_user_view;

The C++ classes above will operate on these views.
Finally, suppose we need a function which gets a Real_user's
instance by known identifier (or a key):
The C++ function may be defined as:
Real_user* real_user(int id);

At the same time this function can call PL/pgSQL's function:
CREATE FUNCTION real_user(id integer)
RETURNS real_user_view ...

So, the factory function real_user() is depends on the view. And
when the views are not in the dump (stored in the separate place)
this is an annoying limitation and we must use some of the
workarounds. (Use the table "user" as a return value or create
an extra composite type with the same structure as for the real_user_view).


> PS. I'm tried to found a SQL formatter for format views definitions
> stored in the database, but unsuccessful.

Even if you could find one, I wouldn't use it: the database changes
the structure of you query.  Suppose you had:
CREATE VIEW v AS SELECT *, a*b AS product FROM foo;

The database converts that to:
CREATE VIEW v AS SELECT a,b, a*b AS product FROM foo;

That means that if you add columns after the fact, the view definition
in the database will diverge from what the source would create.
Besides that, there are a number of other things that the database
does like add unnecessary casts, column aliases and parentheses that
make a purely textual solution impossible.

merlin



--
// Dmitriy.


Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Merlin Moncure
Date:
On Tue, Aug 28, 2012 at 3:25 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
> 2012/8/20 Merlin Moncure <mmoncure@gmail.com>
>>
>> On Sun, Aug 19, 2012 at 8:14 AM, Dmitriy Igrishin <dmitigr@gmail.com>
>> wrote:
>> >> For various reasons, this often goes the wrong way.  Views are often
>> >> the right way to go.  +1 on your comment above -- the right way to do
>> >> views (and SQL in general) is to organize scripts and to try and avoid
>> >> managing everything through GUI tools.  It works.
>> >
>> > The drawback of this approach is that in some cases we need a
>> > factory function(s) (in terms of the OOP) which returns one or a
>> > set of objects (i.e. the function returns the view type). But since
>> > the views are not in the dump we are forced to abandon this solution
>> > and go with workarounds (such as creating extra composite types
>> > to use as returning values or use the tables).
>>
>> Could you elaborate on this?
>
>  Suppose we've designed a simple class hierarchy (I'll use C++ notation):
> class User { ... };
> class Real_user : public User { ... };
> class Pseudo_user : public User { ... };
>
> Suppose we've decided that objects of these classes will be stored
> in one database table:
> CREATE TYPE user_type AS ENUM ('real', 'pseudo');
> CREATE TABLE user (id serial NOT NULL,
>                                  tp user_type NOT NULL,
>                                  user_property1 text NOT NULL,
>                                  user_property2 text NOT NULL,
>                                  real_user_property1 text NULL,
>                                  real_user_property2 text NULL,
>                                  pseudo_user_property1 text NULL);
>
> For simple mapping we've creating the (updatable, with rules) views:
> CREATE VIEW real_user_view
>   AS SELECT * FROM user WHERE tp = 'real';
>
> CREATE VIEW pseudo_user_view
>   AS SELECT * FROM user WHERE tp = 'pseudo';
>
> CREATE VIEW user_view
>   AS SELECT * FROM real_user_view
>         UNION ALL SELECT * FROM pseudo_user_view;
>
> The C++ classes above will operate on these views.
> Finally, suppose we need a function which gets a Real_user's
> instance by known identifier (or a key):
> The C++ function may be defined as:
> Real_user* real_user(int id);
>
> At the same time this function can call PL/pgSQL's function:
> CREATE FUNCTION real_user(id integer)
> RETURNS real_user_view ...
>
> So, the factory function real_user() is depends on the view. And
> when the views are not in the dump (stored in the separate place)
> this is an annoying limitation and we must use some of the
> workarounds. (Use the table "user" as a return value or create
> an extra composite type with the same structure as for the real_user_view).

Hm, couple points (and yes, this is a common problem):
*) how come you don't have your function depend on the table instead
of the view?  this has the neat property of having the function
automatically track added columns to the table.

*) if that's still a headache from dependency point of view, maybe you
can use composite-type implemented table:
postgres=# create type foo as  (a int, b int);
CREATE TYPE
postgres=# create table bar of foo;
CREATE TABLE
postgres=# create view baz as select * from bar;
CREATE VIEW
postgres=# alter type foo add attribute c int cascade;
ALTER TYPE
postgres=# \d bar
      Table "public.bar"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |
Typed table of type: foo

*) do you really need a factory function to create 'user'  -- why not
allow regular inserts?


*) I usually do some variant of this:

create table fruit
(
  fruit_id int primary key,
  type text,
  freshness numeric
);

create table apple
(
  fruit_id int primary key references fruit on delete cascade
    deferrable initially deferred,
  cyanide_content numeric
);

create table orange
(
  fruit_id int primary key references fruit on delete cascade
    deferrable initially deferred,
  vitamin_c_content numeric
);

create or replace function hs(r anyelement) returns hstore as
$$
  select hstore($1);
$$ language sql immutable strict;

create or replace view fruit_ext as
  select f.*,
    coalesce(hs(a), hs(o)) as properties
  from fruit f
  left join apple a using(fruit_id)
  left join orange o using(fruit_id);

insert into fruit values(1, 'apple', 2.0);
insert into fruit values(2, 'orange', 3.5);

insert into apple values(1, 0.00003);
insert into orange values(2, 0.012);

This seems to work well especially if you have a lot of
specializations of the 'base type' and you can season deletions to
taste with appropriate RI triggers if you want.  An alternate way to
do it is to include fruit.type in the primary key, forcing the
dependent fruit back to the proper record though.  My main gripe about
it is that it there's no way to make sure that a 'fruit' points at the
proper dependent table based on type with a pure constraint.

Yet another way of doing this is to simple hstore the extended
properties into the base table so that everything is stuffed in one
table -- that discards all type safety though.  I'm curious about what
others have come up with in terms of solving this problem.

Aside: a better way of doing this is the problem that table
inheritance was trying to solve (and didn't).

merlin




On Wed, Aug 29, 2012 at 6:15 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

Hm, couple points (and yes, this is a common problem):
*) how come you don't have your function depend on the table instead
of the view?  this has the neat property of having the function
automatically track added columns to the table.

Plus there are lots of really cool things about function/table dependencies. 

*) if that's still a headache from dependency point of view, maybe you
can use composite-type implemented table:
postgres=# create type foo as  (a int, b int);
CREATE TYPE
postgres=# create table bar of foo;
CREATE TABLE

That's a useful syntax there.
 

*) do you really need a factory function to create 'user'  -- why not
allow regular inserts?


I don't know about his case but with LedgerSMB our new code all maps inserts into stored procedures.  These have a useful property of discoverability although I suppose an insert would too.    However if you are going the insert route, it may be good to separate physical from logical storage, which may be where the views and functions come in.

Also showing before where table inheritance can make this a bit better and how to solve your key issue (which again inheritance provides a solution for, properly used).
 

*) I usually do some variant of this:

create table fruit
(
  fruit_id int primary key,
  type text,
  freshness numeric
);

I would suggest adding fruit.type to the primary key.  From here there are all sorts of things you can do and depending on number of types, inheritance can be a net win (see my upcoming post tomorrow on this). 

Also create another table:

CREATE TABLE fruit_ref (
   fruit_id int,
   fruit_type text
);

This is then useful for creating inherited interfaces.  More on this below.

create table apple
(
  fruit_id int primary key references fruit on delete cascade
    deferrable initially deferred,
  cyanide_content numeric
);

Change this to:

CREATE TABLE apple (
   cyanide_content numeric,
   primary key (fruit_id, fruit_type)
   check (fruit_type = 'apple'),
   foreign key (fruit_id, fruit_type) references fruit (fruit_id, type) 
       deferrable initially immediate,
) inherits (fruit_ref); 

Make similar changes to other tables below....
 

create table orange
(
  fruit_id int primary key references fruit on delete cascade
    deferrable initially deferred,
  vitamin_c_content numeric
);

Now, you can also do as follows:

CREATE OR REPLACE FUNCTION fruit(fruit_ref) returns fruit
language sql
as $$
select * from fruit where fruit_id = $1.fruit_id; $$;

This way you technically can do something like:

select (a.fruit).* from apple a; 

although that will essentially force a nested loop join, and probably even worse than that so keep  those to the minimum.

But now we can decide on how to enforce the type constraint.  The simplest way if you don't have too many subtypes is probably to add the following columns to fruit:

ALTER TABLE fruit ADD apple_id int;
ALTER TABLE fruit ADD FOREIGN KEY (apple_id, type) 
           REFERENCES apple (fruit_id, type)
           DEFERRABLE INITIALLY DEFERRED;

And then do the same for orange etc.  you can then:

ALTER TABLE fruit ADD CHECK ((type = 'apple' and apple_id IS NOT NULL) OR (type = 'orange' AND orange_id IS NOT NULL)
etc....
;

This way you only get a fairly complicated set of type constraints, and the interface to fruit is in fact guaranteed to be unique and enforced.

The problem though is that if you have a very large number of subtypes, this becomes sufficiently complex that constraint triggers against the fruit_ref inheritance tree may become a net win over individual foreign keys.  Inheritance also simplifies adding sub-types because you inherit the subtype interface and then work from there for joins and doesn't have to be used to enforce keys to get there however.
 

This seems to work well especially if you have a lot of
specializations of the 'base type' and you can season deletions to
taste with appropriate RI triggers if you want.  An alternate way to
do it is to include fruit.type in the primary key, forcing the
dependent fruit back to the proper record though.  My main gripe about
it is that it there's no way to make sure that a 'fruit' points at the
proper dependent table based on type with a pure constraint.

deferred foreign key constraints can do that if you add one key to fruit for each base table. 

Yet another way of doing this is to simple hstore the extended
properties into the base table so that everything is stuffed in one
table -- that discards all type safety though.  I'm curious about what
others have come up with in terms of solving this problem.

Aside: a better way of doing this is the problem that table
inheritance was trying to solve (and didn't).

Inheritance certainly hasn't become hassle-free in terms of this sort of modelling and in fact is usually on the balance not a net gain.  It may be helpful in defining interfaces however, and as we get more and better tools (excited about NO INHERIT constraints in 9.2) it may yet develop into a more generally useful tool of this sort.  What it can do here however is define an interface for a join.  You could probably inherit fruit from the join class to but if you do that, if you ever have to do a constraint trigger, against the tree, you will run into problems.  It's better to have the two sides of the foreign key not in the same inheritance subtree.

Best Wishes,
Chris Travers

Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Dmitriy Igrishin
Date:


2012/8/29 Merlin Moncure <mmoncure@gmail.com>
On Tue, Aug 28, 2012 at 3:25 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
> 2012/8/20 Merlin Moncure <mmoncure@gmail.com>
>>
>> On Sun, Aug 19, 2012 at 8:14 AM, Dmitriy Igrishin <dmitigr@gmail.com>
>> wrote:
>> >> For various reasons, this often goes the wrong way.  Views are often
>> >> the right way to go.  +1 on your comment above -- the right way to do
>> >> views (and SQL in general) is to organize scripts and to try and avoid
>> >> managing everything through GUI tools.  It works.
>> >
>> > The drawback of this approach is that in some cases we need a
>> > factory function(s) (in terms of the OOP) which returns one or a
>> > set of objects (i.e. the function returns the view type). But since
>> > the views are not in the dump we are forced to abandon this solution
>> > and go with workarounds (such as creating extra composite types
>> > to use as returning values or use the tables).
>>
>> Could you elaborate on this?
>
>  Suppose we've designed a simple class hierarchy (I'll use C++ notation):
> class User { ... };
> class Real_user : public User { ... };
> class Pseudo_user : public User { ... };
>
> Suppose we've decided that objects of these classes will be stored
> in one database table:
> CREATE TYPE user_type AS ENUM ('real', 'pseudo');
> CREATE TABLE user (id serial NOT NULL,
>                                  tp user_type NOT NULL,
>                                  user_property1 text NOT NULL,
>                                  user_property2 text NOT NULL,
>                                  real_user_property1 text NULL,
>                                  real_user_property2 text NULL,
>                                  pseudo_user_property1 text NULL);
>
> For simple mapping we've creating the (updatable, with rules) views:
> CREATE VIEW real_user_view
>   AS SELECT * FROM user WHERE tp = 'real';
>
> CREATE VIEW pseudo_user_view
>   AS SELECT * FROM user WHERE tp = 'pseudo';
>
> CREATE VIEW user_view
>   AS SELECT * FROM real_user_view
>         UNION ALL SELECT * FROM pseudo_user_view;
>
> The C++ classes above will operate on these views.
> Finally, suppose we need a function which gets a Real_user's
> instance by known identifier (or a key):
> The C++ function may be defined as:
> Real_user* real_user(int id);
>
> At the same time this function can call PL/pgSQL's function:
> CREATE FUNCTION real_user(id integer)
> RETURNS real_user_view ...
>
> So, the factory function real_user() is depends on the view. And
> when the views are not in the dump (stored in the separate place)
> this is an annoying limitation and we must use some of the
> workarounds. (Use the table "user" as a return value or create
> an extra composite type with the same structure as for the real_user_view).

Hm, couple points (and yes, this is a common problem):
*) how come you don't have your function depend on the table instead
of the view?  this has the neat property of having the function
I always do emphasis on the code style and on the easiness of
maintenance. And I looks at the views as on the classes (aka abstractions).
In many cases I don't want to care how (and where) the data is actually
stored -- in the one table, or in the many tables, or whatever.
AFAIK, the main goal of the views to provide such abstraction.
automatically track added columns to the table.
Agreed, this is a nice feature.

*) if that's still a headache from dependency point of view, maybe you
can use composite-type implemented table:
postgres=# create type foo as  (a int, b int);
CREATE TYPE
postgres=# create table bar of foo;
CREATE TABLE
postgres=# create view baz as select * from bar;
CREATE VIEW
postgres=# alter type foo add attribute c int cascade;
ALTER TYPE
postgres=# \d bar
      Table "public.bar"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |
Typed table of type: foo
Thanks for the solution! But it seems like a workaround here.

*) do you really need a factory function to create 'user'  -- why not
allow regular inserts?
By "factory function" I mean the function which creates an instance
for the client -- i.e. selecting object from the data source :-)


*) I usually do some variant of this:

create table fruit
(
  fruit_id int primary key,
  type text,
  freshness numeric
);

create table apple
(
  fruit_id int primary key references fruit on delete cascade
    deferrable initially deferred,
  cyanide_content numeric
);

create table orange
(
  fruit_id int primary key references fruit on delete cascade
    deferrable initially deferred,
  vitamin_c_content numeric
);

create or replace function hs(r anyelement) returns hstore as
$$
  select hstore($1);
$$ language sql immutable strict;

create or replace view fruit_ext as
  select f.*,
    coalesce(hs(a), hs(o)) as properties
  from fruit f
  left join apple a using(fruit_id)
  left join orange o using(fruit_id);

insert into fruit values(1, 'apple', 2.0);
insert into fruit values(2, 'orange', 3.5);

insert into apple values(1, 0.00003);
insert into orange values(2, 0.012);

This seems to work well especially if you have a lot of
specializations of the 'base type' and you can season deletions to
taste with appropriate RI triggers if you want.  An alternate way to
do it is to include fruit.type in the primary key, forcing the
dependent fruit back to the proper record though.  My main gripe about
it is that it there's no way to make sure that a 'fruit' points at the
proper dependent table based on type with a pure constraint.
Yes, it seems to work in very simple cases. But I would not have to deal with it
because it's seems to hard to maintaince. And for the project with hundred classes
it seems to be a nightmare! :-)

Yet another way of doing this is to simple hstore the extended
properties into the base table so that everything is stuffed in one
table -- that discards all type safety though.  I'm curious about what
others have come up with in terms of solving this problem.
Agree, hstore is useful for this cases. But again, it's just an implementation
detail *how* to store object properties.

Aside: a better way of doing this is the problem that table
inheritance was trying to solve (and didn't).
Btw, is there are some ideas to implement virtual functions in Postgres? :-)
Without these functions iheritance are useless from the point of the OOP.

--
// Dmitriy.


Re: Views versus user-defined functions: formatting, comments, performance, etc.

From
Merlin Moncure
Date:
On Wed, Aug 29, 2012 at 8:52 AM, Chris Travers <chris.travers@gmail.com> wrote:
> ALTER TABLE fruit ADD apple_id int;
> ALTER TABLE fruit ADD FOREIGN KEY (apple_id, type)
>            REFERENCES apple (fruit_id, type)
>            DEFERRABLE INITIALLY DEFERRED;
>
> And then do the same for orange etc.  you can then:
>
> ALTER TABLE fruit ADD CHECK ((type = 'apple' and apple_id IS NOT NULL) OR
> (type = 'orange' AND orange_id IS NOT NULL)
> etc....

Doing the above for me is a 'bridge too far'.  Sure, It's the only way
to make sure the base type is properly specialized but it just sucks.
If there was some way to 'C union' the value into a single column
(there isn't) i'd be all over it...

merlin