Thread: Would like to contribute a section to docs for 9.3. Where to start?
Hi all;
* Table inheritance
I would like to contribute a "What is an Object Relational database?" section to the documentation for 9.3. Where is the best place to start tools and community-process-wise?
My thinking is that since people are often confused by this label, it would be worth describing what it means, and describing in brief detail object-relational features in PostgreSQL.
My thinking is to cover the following features briefly:
* Table inheritance
* Type Extensibility
* Tuples as Types, casting tuples to various other types.
I am thinking of skipping over things that may be seen as misfeatures, such as class.function syntax although this could be useful in the case of simulating calculated fields. What do people think there? Is this a feature or a misfeature?
Best Wishes,
Chris Travers
Chris Travers <chris.travers@gmail.com> writes: > I would like to contribute a "What is an Object Relational database?" > section to the documentation for 9.3. Where is the best place to start > tools and community-process-wise? > My thinking is that since people are often confused by this label, it would > be worth describing what it means, and describing in brief detail > object-relational features in PostgreSQL. I think there's a discussion that has to happen before that one, which is whether we should continue pushing that term for Postgres. It was originally applied by the Berkeley guys, well over twenty years ago, to code that didn't even speak the same language as now (PostQUEL vs SQL). So it's fair to ask whether the vision of the project is still the same as then. Simon for one thinks differently: http://database-explorer.blogspot.com/2012/08/postgresql-multi-model-database-server.html > My thinking is to cover the following features briefly: > * Table inheritance > * Type Extensibility > * Tuples as Types, casting tuples to various other types. I think PG's type extensibility features come out of the abstract-data-type culture more than than the object culture. In particular, PG data types generally don't have any notion of "IsA" subclass relationships, though the rowtypes of inherited tables do have that. (Well, I guess you could claim that a domain IsA subclass of its base type, but SQL's domain feature is so impoverished that any object hacker would laugh at you.) So really the argument for calling PG object-relational comes down to table inheritance and the IsA relationship between tuples of inherited tables. Which is something I think few people even use anymore ... it definitely doesn't seem like a key selling point. > I am thinking of skipping over things that may be seen as misfeatures, such > as class.function syntax although this could be useful in the case of > simulating calculated fields. Agreed, that's not a major feature; it's just a notational detail that people have got varying opinions about. regards, tom lane
On Mon, Aug 13, 2012 at 7:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Travers <chris.travers@gmail.com> writes:I think there's a discussion that has to happen before that one, which
> I would like to contribute a "What is an Object Relational database?"
> section to the documentation for 9.3. Where is the best place to start
> tools and community-process-wise?
> My thinking is that since people are often confused by this label, it would
> be worth describing what it means, and describing in brief detail
> object-relational features in PostgreSQL.
is whether we should continue pushing that term for Postgres. It was
originally applied by the Berkeley guys, well over twenty years ago, to
code that didn't even speak the same language as now (PostQUEL vs SQL).
So it's fair to ask whether the vision of the project is still the same
as then. Simon for one thinks differently:
http://database-explorer.blogspot.com/2012/08/postgresql-multi-model-database-server.html
Agreed, and actually I came here after discussing this on -advocacy, and I recognize that there is still some controversy but everyone seems to agree that the way the term is currently used is confusing, and PostgreSQL doesn't really resemble the wikipedia article on Object-Relational databases.
However, it is hard to have a discussion regarding how to position PostgreSQL if we don't have a bunch of good alternatives, so I think it would still be worth offering even if the community ultimately decides to move a different direction. And of course these are not mutually exclusive either so if nothing else we have the ability of community members to position the database in other ways.
I think PG's type extensibility features come out of the
> My thinking is to cover the following features briefly:
> * Table inheritance
> * Type Extensibility
> * Tuples as Types, casting tuples to various other types.
abstract-data-type culture more than than the object culture.
That's probably worth noting.
In
particular, PG data types generally don't have any notion of "IsA"
subclass relationships, though the rowtypes of inherited tables do have
that.
So I noticed. You can still do some sorts of inheritance, just like you can do object-oriented programming in C....
(Well, I guess you could claim that a domain IsA subclass of its
base type, but SQL's domain feature is so impoverished that any object
hacker would laugh at you.)
So really the argument for calling PG object-relational comes down to
table inheritance and the IsA relationship between tuples of inherited
tables. Which is something I think few people even use anymore ...
it definitely doesn't seem like a key selling point.
I was looking at it differently, namely that there are a bunch of features that you can use together to build O-R systems. The complex types may not support inheritance, but with casts you can get some limited polymorphism. Moreover the fact that relations are classes means that you can create casts of tuples to other types. For example:
create table foo (
bar text,
baz int
); -- simple union type
insert into foo (bar, baz) values ('test', '1');
create function foo_to_int (foo) returns int as
$$ select $1.baz $$ language sql;
create cast (foo as int) with function foo_to_int(foo) as implicit;
select foo + 1 as value from foo;
value
-------
2
(1 row)
This is a trivial example and I would probably include it only by description, but the point is that the combination of casts, functions, and tables as classes allows you to create some degree of polymorphism. For example we could take an employee table and add a name function that concatenates the first and last name together according to some logic. We could then index the output of that function for full text searching.
While you can't do inheritance easily with complex types, these can still be used to create abstract interfaces and the use of explicit casts might give you something like it though you'd have a fair bit of work to implement such a system.
Agreed, that's not a major feature; it's just a notational detail that
> I am thinking of skipping over things that may be seen as misfeatures, such
> as class.function syntax although this could be useful in the case of
> simulating calculated fields.
people have got varying opinions about.
Heck, I have varying opinions about it and my opinion on this feature is rather fluid at any given poitn. However, I am thinking that maybe mentioning it up front would mean fewer people get taken by surprise by it.
Best Wishes,
Chris Travers
On Mon, 2012-08-13 at 03:41 -0700, Chris Travers wrote: > Hi all; > > > I would like to contribute a "What is an Object Relational database?" > section to the documentation for 9.3. Where is the best place to > start tools and community-process-wise? > > > My thinking is that since people are often confused by this label, it > would be worth describing what it means, and describing in brief > detail object-relational features in PostgreSQL. A concrete example of the confusion caused by our current branding (with no supporting documentation) recently appeared on -novice: http://archives.postgresql.org/message-id/21709bce-0308-4e6f-9e1c-b9dc95360fe4@googlegroups.com That's exactly the kind of (potential) user we should target with this document. Regards, Jeff Davis
As a note here, I think one of the fundamental difficulties in figuring out how to position PostgreSQL (whether using Simon's multi-model idea or Object-Relational, something else entirely, or some combination) is that PostgreSQL is an extraordinarily competent and full-featured database management system. I have a very rough draft of how I'd explain it I will send here for some feedback in terms of general message and accuracy before I look at adapting it as a patch against the docs.
However, while I was going through this and asking "how would I build something utilizing object-oriented approaches in PostgreSQL?" I realized how few of the features of this sort I was currently using. I have been using PostgreSQL since 1999, and been seriously been trying to use advanced features for six, and I realized I have barely begun to scratch the surface. It's really refreshing to look at this and realize that even after 12-13 years of becoming familiar with a piece of software, a little exercise like this provides all sorts of features that would simplify your life.
The fact is that what PostgreSQL really is, inside the box, is a transactional development environment where operations occur in a relational-native way and this is largely how I am approaching it. Object-relational in terms of PostgreSQL seems to mean "relational along with a bunch of tools useful for building object interfaces." I think a lot of the multi-model features that Simon talks about can be understood in these terms as well. If I was going to coin a term to call this, I would call it a "Transactional/relational development environment." Just as you can do object-oriented programming in C, PostgreSQL lets you do this in SQL.
Also in my tests, I found that inherited relations do not inherit casts. Is this intentional? Is there a reason I should be putting into the documentation? Or is it just a gotcha that should be listed as a caveat?
Best Wishes,
Chris Travers
Chris Travers <chris.travers@gmail.com> writes: > Also in my tests, I found that inherited relations do not inherit casts. > Is this intentional? Um, in what way exactly? It's true that we don't consider that a cast from X to Y and a cast from Y to Z should automatically give you a cast from X to Z, but that's not specific to inherited relations. Or did you have something else in mind? regards, tom lane
So here is a very rough draft. I would be interested in feedback as to inaccuracies or omissions. I would like to get the technical side right before going into an editorial phase.
Best Wishes,
Any feedback on the technical side?
Best Wishes,
Chris Travers
How is PostgreSQL "Object-Relational?"
The term Object-Relational has been applied to databases which attempt to bridge the relational and object-oriented worlds with varying degrees of success. Bridging this gap is typically seen as desirable because object-oriented and relational models are very different paradigms and programmers often do not want to switch between them. There are, however, fundamental differences that make this a very hard thing to do well. The best way to think of PostgreSQL in this way is as a relational database management system with some object-oriented features.
PostgreSQL is a development platform in a box. It supports stored procedures written in entirely procedural languages like PL/PGSQL or Perl without loaded modules, and more object-oriented languages like Python or Java, often through third party modules. To be sure you can't write a graphical interface inside PostgreSQL, and it would not be a good idea to write additional network servers, such as web servers, directly inside the database. However the environment allows you to create sophisticated interfaces for managing and transforming your data. Because it is a platform in a box the various components need to be understood as different and yet interoperable. In fact the primary concerns of object-oriented programming are all supported by PostgreSQL, but this is done in a way that is almost, but not quite, entirely unlike traditional object oriented programming. For this reason the "object-relational" label tends to be a frequent source of confusion.
Data storage in PostgreSQL is entirely relational, although this can be degraded using types which are not atomic, such as arrays, XML, JSON, and hstore. Before delving into object-oriented approaches, it is important to master the relational model of databases. For the novice, this section is therefore entirely informational. For the advanced developer, however, it is hoped that it will prove inspirational.
In object-oriented terms, very relation is a class, but not every class is a relation. Operations are performed on sets of objects (an object being a row), and new row structures can be created ad-hoc. PostgreSQL is, however, a strictly typed environment and so in many cases, polymorphism requires some work.
Data Abstraction and Encapsulation in PostgreSQL
The relational model itself provides some tools for data abstraction and encapsulation, and these features are taken to quite some length in PostgreSQL. Taken together these are very powerful tools and allow for things like calculated fields to be simulated in relations and even indexed for high performance.
Views are the primary tool here. With views, you can create an API for your data which is abstracted from the physical storage. Using the rules system, you can redirect inserts, updates, and deletes from the view into underlying relations, preferably using user defined functions. Being relations, views are also classes.
A second important tool here is the ability to define what appear to be calculated fields using stored procedures. If I create a table called "employee" with three fields (first_name, middle_name, last_name) among others, and create a function called "name" which accepts a single employee argument and concatenates these together as "last_name, first_name middle_name" then if I submit a query which says:
select e.name from employee e;
it will transform this into:
select name(e) from employee e;
This gives you a way to do calculated fields in PostgreSQL without resorting to views. Note that these can be done on views as well because views are relations. These are not real fields though. Without the relation reference, it will not do the transformation (so SELECT name from employee will not have the same effect).
Messaging and Class API's in PostgreSQL
A relation is a class. The class is accessed using SQL which defines a new data structure in its output. This data structure unless defined elsewhere in a relation or a complex type cannot have methods attached to it and therefore can not be used with the class.method syntax described above. There are exceptions to this rule, of course, but they are beyond the scope of this introduction. In general it is safest to assume that the output of one query, particularly one with named output fields, cannot safely be used as the input to another.
A second messaging aparatus in PostgreSQL is the LISTEN/NOTIFY framework which can be used along with triggers to issue notifications to other processes when a transaction commits. This approach allows you to create queue tables, use triggers to move data into these tables (creating 'objects' in the process) and then issuing a notification to another process when the data commits and becomes visible. This allows for very complex and and interactive environments to be built from modular pieces.
Polymorphism in PostgreSQL
PostgreSQL is very extensible in terms of all sorts of aspects of the database. Not only can types be created and defined, but also operators can be defined or overloaded.
A more important polymorphism feature is the ability to cast one data type as another. Casts can be implicit or explicit. Implicit casts, which have largely been removed from many areas of PostgreSQL, allow for PostgreSQL to cast data types when necessary to find functions or operators that are applicable. Implicit casting can be dangerous because it can lead to unexpected behavior because minor errors can lead to unexpected results. '2012-05-31' is not 2012-05-31. The latter is an integer expression that reduces to 1976. If you create an implicit cast that turns an integer into a date being the first of the year, the lack of quoting will insert incorrect dates into your database without raising an error ('1976-01-01' instead of the intended '2012-05-31'). Implicit casts can still have some uses.
Inheritance in PostgreSQL
In PostgreSQL tables can inherit from other tables. Their methods are inherited but their castes are not, nor are their indexes. This allows you develop object inheritance hierarchies in PostgreSQL. Multiple inheritance is possible.
Table inheritance is an advanced concept and has many gotchas. Please refer to the proper sections of the manual for more on this topic. On the whole it is probably best to work with table inheritance first in areas where it is more typically used, such as table partitioning, and later look at it in terms of object-relational capabilities.
Overall the best way to look at PostgreSQL as an object-relational database is a database which provides very good relational capabilities plus some advanced features that allows one do create object-relational systems on top of it. These systems can then move freely between object-oriented and relational worldviews but are still more relational than object-oriented. At any rate they bear little resemblance to object-oriented programming environments today. With PostgreSQL this is very much a toolkit approach for object-relational databases building on a solid relational foundation. This means that these are advanced functions which are powerful in the hands of experienced architects, but may be skipped over at first.
On 8/15/12 5:33 AM, Chris Travers wrote: > So here is a very rough draft. I would be interested in feedback as to > inaccuracies or omissions. I would like to get the technical side right > before going into an editorial phase. > > Any feedback on the technical side? [citation needed] Seriously, if we are trying to justify our use of seemingly standard academic terms, we should have some references to where those are defined or at least discussed. Otherwise we are just begging the question: PostgreSQL is object-relational because we say so.
On Fri, Aug 17, 2012 at 1:03 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 8/15/12 5:33 AM, Chris Travers wrote:[citation needed]So here is a very rough draft. I would be interested in feedback as to
inaccuracies or omissions. I would like to get the technical side right
before going into an editorial phase.
Any feedback on the technical side?
Seriously, if we are trying to justify our use of seemingly standard academic terms, we should have some references to where those are defined or at least discussed. Otherwise we are just begging the question: PostgreSQL is object-relational because we say so.
Good point.
I found two interesting resources quickly which seem on point:
http://infolab.usc.edu/csci585/Spring2010/den_ar/ordb.pdf which appears to be chapter 1 of http://www.amazon.com/Object-Relational-Database-Development-Plumbers-CD-ROM/dp/0130194603
and
But this doesn't really get us beyond the "because we say so" given the connection between Informix and PostgreSQL.
It really looks to me like Postges was given the name Object-Relational by Stonebreaker as a way of saying "here's what I am trying to play around with" and the databases which describe themselves in these terms seem either inspired by or forks of Postgres ;-).
Best Wishes,
Chris Travers
On Fri, 2012-08-17 at 16:03 -0400, Peter Eisentraut wrote: > On 8/15/12 5:33 AM, Chris Travers wrote: > > So here is a very rough draft. I would be interested in feedback as to > > inaccuracies or omissions. I would like to get the technical side right > > before going into an editorial phase. > > > > Any feedback on the technical side? > > [citation needed] > > Seriously, if we are trying to justify our use of seemingly standard > academic terms, we should have some references to where those are > defined or at least discussed. Otherwise we are just begging the > question: PostgreSQL is object-relational because we say so. I feel like the bar is becoming pretty high for this document. It must: 1. Settle on an accepted criteria for ORDBMS 2. Describe how postgres meets that criteria in a way that's: a. compelling to users b. connects with OOP so the users don't feel like it's a bait-and-switch or get confused by starting with the wrong expectation I feel like making #1 compatible with 2(a) requires some creativity; and #1 might be incompatible with 2(b) entirely. Regards, Jeff Davis
On Sat, Aug 18, 2012 at 12:12 PM, Jeff Davis <pgsql@j-davis.com> wrote:
I feel like the bar is becoming pretty high for this document. It must:On Fri, 2012-08-17 at 16:03 -0400, Peter Eisentraut wrote:
> On 8/15/12 5:33 AM, Chris Travers wrote:
> > So here is a very rough draft. I would be interested in feedback as to
> > inaccuracies or omissions. I would like to get the technical side right
> > before going into an editorial phase.
> >
> > Any feedback on the technical side?
>
> [citation needed]
>
> Seriously, if we are trying to justify our use of seemingly standard
> academic terms, we should have some references to where those are
> defined or at least discussed. Otherwise we are just begging the
> question: PostgreSQL is object-relational because we say so.
1. Settle on an accepted criteria for ORDBMS
Probably Mike Stonebreaker's paper can be referred to here. Also it looks like Oracle used to have a document describing "object-relational" features in Oracle 10. Reading through other people's views, I think Oracle might actually be ahead of us here, but... The problem here is relatively complex and I am afraid if I go and re-iterate everything I will end up with another book >:-D
Not that this would be a bad thing. I did find Oracle's somewhat short book (of 200 pages) on the subject at http://docs.oracle.com/cd/B19306_01/appdev.102/b14260.pdf
However if I am doing a book by myself I am either going to publish it or release it myself. A document of that scope is a little wider-range than I would like to just hand off to the community.
I think it will be worth pointing out that Oracle is an ORDBMS as well and is really the major non-Pg-descended ORDBMS I can find on the market today.
2. Describe how postgres meets that criteria in a way that's:
a. compelling to users
b. connects with OOP so the users don't feel like it's a
bait-and-switch or get confused by starting with the
wrong expectation
I feel like making #1 compatible with 2(a) requires some creativity; and
#1 might be incompatible with 2(b) entirely.
The more I work with this and am trying to figure out how to apply these in my own work the more I am convinced that this does connect with OOP just, as I said, in a way that is almost but not entirely unlike normal OOP.
The way I would describe it in simple terms is that a standard RDBMS operates on sets of tuples. An ORDBMS operates on sets of objects. Those objects may have methods, may be polymorphic, and may be encapsulated behind interfaces. As Stonebreaker said in his paper, this is a marriage between the set-oriented relational database and the primitives of object oriented programming. Consequently the way to look at it is that you have a relational database with object oriented features which makes this sort of operation possible (and that is, as best as I can see, how Oracle actually positions their product as well).
But more to the point, what do people think would be a valuable role for this document? I was thinking initially of a *brief* description of what was meant so that people didn't get too confused. Maybe it would be better to save the brief description for later and write a longer document first that could be incorporated into the brief document by reference? Maybe a book entitled "Object-Relational Programming in PostgreSQL" since this is something I have started to delve deeply into for LedgerSMB. Maybe by that point we can figure out whether we are pushing Object-Relational features as a subset of a multi-model approach or vice versa. Indeed ontologically speaking, I am not sure what the difference between multi-model and object-relational is since Simon seems to think that object-relational is a subset of multi-model and I think multi-model is a feature of object-relational ;-). This being said, of course there may be marketing reasons to push one or the other as a primary term.
Best wishes,
Chris Travers
Chris Travers
On Sat, 2012-08-18 at 18:56 -0700, Chris Travers wrote: > I was thinking initially of a *brief* description of what was meant so > that people didn't get too confused. +1. I was imagining two contrasting examples, one using the relational mindset and one using the O-R mindset. Inheritance and composite types are easy enough to understand. Perhaps those could make for a short example on the O-R side while still looking different enough from a traditional relational approach. Regards, Jeff Davis
Given the (generally helpful) feedback here what I have decided to do for now is to run a blog series developing the use cases for O-R functionality in PostgreSQL, and return after that with a new draft, possibly distilled from that.
Best Wishes,
I will post again, perhaps, in a couple months.
Best Wishes,
Chris Travers