88, CREATE FUNCTION ON TABLE - Mailing list pgsql-general
From | Clark Evans |
---|---|
Subject | 88, CREATE FUNCTION ON TABLE |
Date | |
Msg-id | 367EED8A.A9620A1B@manhattanproject.com Whole thread Raw |
List | pgsql-general |
On several occasions I've heard my father, an old cobol hack, cry that none of the current databases nor programming languages have an "88". Since he's my dad I often ignored him, but finally, some 5 years later I now know what he was talking about. In my last contract assignment, I was given the job of training / helping a bunch of COBOL programmers to convert their code to PL/SQL. Their code was "beautiful"... really. It's unfortunate that they had to convert to Oracle PL/SQL, which is very much inferior. Anyway, in this e-mail I describe exactly what an COBOL level 88 is, and suggest an improvement to PostgreSQL. When defining a data structure, much like a table description, a COBOl programmer describes the record by listing its members and providing a data type for each. Here is psuto code: 01 customer 10 customer_id pic9(10) 10 customer_name picX(30) 10 customer_status picX(01) 88 active-client value "A" "a". 88 historical-client value "H" "h". 88 invalid-client value "I" "i". 88 potential-client value "P" "p". 88 current-client value "A" "a" "P" "p" 10 sales_rep pic9(10) 000-process Display "processing based on 88" Evaluate When active-client ... do stuff ... When historica-client ... do other stuff ... When current-client ... do even more stuff ... First, I'm sure this is not valid COBOL, I'm only trying to demonstrate what an 88 is. In logical terms, it is a predicate, which takes the current fields as parameters. To write our example above (in oracle), create table customer ( customer_id number(10), customer_name varchar(30), customer_status char(1), sales_rep number(10) ); And then to select all "current" clients we get something like this: select customer_id, customer_name, sales_rep from customer where customer_status in ('A','a','P'.'p'); Then we can progress.... right? Well, this is fine on a small scale, but when you have 30+ tables with code schemes like this (as I've seen in many-a-companies production database) it gets to be a horrendous nightmare. At my assignment before the last assignment, I was dealing with this problem, but in a rather annoying way. For every table, I'd define a view. The view would do the decode logic... like: create view customer_view as select client_id, client_name, sales_rep, DECODE(client_status,'A',TRUE,'a',TRUE,FALSE) active_client, DECODE(client_status,'H',TRUE,'h',TRUE,FALSE) historical_client, DECODE(client_status,'I',TRUE,'i',TRUE,FALSE) invalid_client, DECODE(client_status,'P',TRUE,'p',TRUE,FALSE) potential_client, DECODE(client_status,'A',TRUE,'a',TRUE,'P',TRUE,'p',TRUE,FALSE) current_client from customer; so that you can do: select customer_id, customer_name, sales_rep from customer where current_client IS TRUE huge improvement from a maintenance standpoint. A few problems: * It invalidates the indexes *cry* (_big_ problem) * Having IS TRUE is unsightly and annoying. * Using views doubles the number of relations * etc. Once you get to 30+ tables, this solution, aside from the index problem, is very good, but it could be better. I was thinking...... Why couldn't we define a function (sorry I dont' have PosgreSQL syntax down yet... so I'm reverting to Oracle PL/SQL) like so: CREATE FUNCTION is_active_client ON client IS BEGIN RETURN ( client_staus IN ('A','a') ); END; Or... better yet, use the "embedded" syntax like constraints... create table customer ( customer_id number(10), customer_name varchar(30), customer_status char(1), sales_rep number(10), -- is_active_client customer_status in ('A','a'), etc. ); Ok, the syntax needs help, but I hope the point makes sence, Then we can have queries like: select customer_id, customer_name, sales_rep from customer where is_active_client; Now! That is much much more readable and maintainable. Also, the "language" for functions "on" a relation could be limited to that which will pass through the query optimizer, thus the above would merely be "expanded" to the appropriate back end sql... traditional "functions" have the problem that they typically disable indexes, are always called, etc. These type of boolean functions or predicate, if you may, would only be evaluated when if needed... After writing this, I guess this is similar to adding a "method" to the table, although I think that what I'm proposing is far more limited in scope. You could have these things to "existence" checks, etc. aka CREATE FUNCTION has_line_items ON order IS BEGIN RETURN exists (select 'x' from line_items where line_items.order_no = order.order_no); END; Hmm. I guess what'm trying to do is allow the programmer who defines the tables to define the "chunks of logic" that make sense in a where clause for the application programmers. In this way, shielding application programmers from database logic changes and also providing self-documentation for the system. It would also allow end-users with a MS Query like tool to be much more efficient... instead of exposing 4! = 4*3*2*1 = 24 views with all the possible permutations of queries that an end user might want, you could expose the table and 4 of these "predicates". A visual tool could then help the manager build their query. I have implemented this in a corporate system using very large views (over 40 additional columns)...but it quickly became painful and unmanageable. I feel that 40 "predicates" would be a completely different story... Advanced version of this feature would allow the function to be valid for more than one table, if every target table had the required columns. In postgreSQL, a clear strategy would have to be developed for such a feature to work with inherited classes, and a nice polymorphism scheme created. But even if it does not go that far... the "basic" version is more than wonderful. Anyway... what do you all think? Sorry for being so wordy. Best, Clark
pgsql-general by date: