88, CREATE FUNCTION ON TABLE - Mailing list pgsql-general

From Clark Evans
Subject 88, CREATE FUNCTION ON TABLE
Date
Msg-id 367EED8A.A9620A1B@manhattanproject.com
Whole thread Raw
List pgsql-general
On several occasions I've heard my father, an old cobol hack, cry that
none of the current databases nor programming languages have an "88".
Since he's my dad I often ignored him, but finally, some 5 years later
I now know what he was talking about.   In my last contract
assignment, I was given the job of training / helping a bunch of
COBOL programmers to convert their code to PL/SQL.
Their code was "beautiful"... really.  It's unfortunate that they had
to convert to Oracle PL/SQL, which is very much inferior.  Anyway,
in this e-mail I describe exactly what an COBOL level 88 is, and
suggest an improvement to PostgreSQL.

When defining a data structure, much like a table description,
a COBOl programmer describes the record by listing its members
and providing a data type for each.  Here is psuto code:

01   customer
     10  customer_id          pic9(10)
     10  customer_name        picX(30)
     10  customer_status      picX(01)
         88 active-client       value "A" "a".
         88 historical-client   value "H" "h".
         88 invalid-client      value "I" "i".
         88 potential-client    value "P" "p".
         88 current-client      value "A" "a" "P" "p"
     10  sales_rep            pic9(10)

000-process
    Display "processing based on 88"
    Evaluate
       When active-client
          ... do stuff ...
       When historica-client
          ... do other stuff ...
       When current-client
          ... do even more stuff ...

First, I'm sure this is not valid COBOL, I'm only trying to
demonstrate what an 88 is.  In logical terms, it is a
predicate, which takes the current fields as parameters.

To write our example above (in oracle),

create table customer
( customer_id      number(10),
  customer_name    varchar(30),
  customer_status  char(1),
  sales_rep        number(10)
);


And then to select all "current"
clients we get something like this:

select customer_id, customer_name, sales_rep
from customer
where customer_status in ('A','a','P'.'p');

Then we can progress.... right?

Well, this is fine on a small scale, but when you
have 30+ tables with code schemes like this (as I've
seen in many-a-companies production database)
it gets to be a horrendous nightmare.

At my assignment before the last assignment, I was
dealing with this problem, but in a rather annoying way.
For every table, I'd define a view.  The view would
do the decode logic... like:

create view customer_view as
select client_id, client_name, sales_rep,
   DECODE(client_status,'A',TRUE,'a',TRUE,FALSE) active_client,
   DECODE(client_status,'H',TRUE,'h',TRUE,FALSE) historical_client,
   DECODE(client_status,'I',TRUE,'i',TRUE,FALSE) invalid_client,
   DECODE(client_status,'P',TRUE,'p',TRUE,FALSE) potential_client,
   DECODE(client_status,'A',TRUE,'a',TRUE,'P',TRUE,'p',TRUE,FALSE)
current_client
from customer;

so that you can do:

select customer_id, customer_name, sales_rep
from customer
where current_client IS TRUE

huge improvement from a maintenance standpoint.
A few problems:
  * It invalidates the indexes *cry* (_big_ problem)
  * Having IS TRUE is unsightly and annoying.
  * Using views doubles the number of relations
  * etc.

Once you get to 30+ tables, this solution,
aside from the index problem, is very good, but
it could be better.

I was thinking......

Why couldn't we define a function (sorry I dont' have PosgreSQL
syntax down yet... so I'm reverting to Oracle PL/SQL) like so:

CREATE FUNCTION is_active_client
ON client
IS
BEGIN
  RETURN ( client_staus IN ('A','a') );
END;

Or... better yet, use the "embedded" syntax like
constraints...

create table customer
( customer_id      number(10),
  customer_name    varchar(30),
  customer_status  char(1),
  sales_rep        number(10),
  --
  is_active_client customer_status in ('A','a'),
  etc.
);


Ok, the syntax needs help, but I hope the point makes sence,
Then we can have queries like:

select customer_id, customer_name, sales_rep
from customer
where is_active_client;

Now!  That is much much more readable and maintainable.
Also, the "language" for functions "on" a relation could
be limited to that which will pass through the query
optimizer, thus the above would merely be "expanded"
to the appropriate back end sql... traditional "functions"
have the problem that they typically disable indexes,
are always called, etc.  These type of boolean functions
or predicate, if you may, would only be evaluated when
if needed...

After writing this, I guess this is similar to adding
a "method" to the table, although I think that
what I'm proposing is far more limited in scope.

You could have these things to "existence" checks, etc.

aka

CREATE FUNCTION has_line_items
ON order
IS
BEGIN
   RETURN exists (select 'x' from line_items where line_items.order_no =
order.order_no);
END;


Hmm.  I guess what'm trying to do is allow the programmer
who defines the tables to define the "chunks of logic" that
make sense in a where clause for the application programmers.

In this way, shielding application programmers from database
logic changes and also providing self-documentation for the system.
It would also allow end-users with a MS Query like tool to
be much more efficient... instead of exposing 4! = 4*3*2*1 = 24 views
with
all the possible permutations of queries that an end user
might want, you could expose the table and 4 of these "predicates".
A visual tool could then help the manager build their query.

I have implemented this in a corporate system using very large
views (over 40 additional columns)...but it quickly became painful
and unmanageable.  I feel that 40 "predicates" would be a
completely different story...

Advanced version of this feature would allow the function
to be valid for more than one table, if every target table
had the required columns.  In postgreSQL, a clear strategy
would have to be developed for such a feature to work with
inherited classes, and a nice polymorphism scheme created.

But even if it does not go that far... the "basic" version
is more than wonderful.

Anyway... what do you all think?  Sorry for being so wordy.

Best,

Clark






pgsql-general by date:

Previous
From: David Hartwig
Date:
Subject: Re: [GENERAL] Slow Searches using MSAccess and ODBC to aPostGreSQL database
Next
From: Adriaan Joubert
Date:
Subject: Postgres 6.4.1 on DEC-ALPHA