Thread: An introduction and a plea ...

An introduction and a plea ...

From
Emmanuel Charpentier
Date:
Dear pgsql-hackers list,

First a few words of introduction : I'm 43 and, while I have been 
introduced to computing for a long time (my first exposure was a small 
Fortran exercise I wrote in '74 (!) for a timesharing system on an 
hardcopy terminal ...), my coding abilities are somewhat rusty. I am 
mainly a user by now, no longer a coder, and my interests in computers 
is now in making my life simpler (I'm a biostatistician, among other 
things).I probably won't be contributing any code to PostgreSQL. Some 
bug reports, maybe ...

However, I've lurked on some of the PostgreSQL lists for 2 to 3 months 
(through the Web interface), and I feel that I might offer some 
advice, based on my past experience of seeing a lot of projects 
growing (or dying, due to feeping creaturism(TM) ...).

So I will shamelessly pull my first plea, related to the proposed 
change to the default behaviour of PostgreSQL in querying classes with 
subclasses.

I *strongly* suggest not to change anything in the default behaviour, 
which is what is expected from an SQL-compliant system, even if the 
database in question uses inheritance internally.

The reason for that plea is that a modification would crash any 
program not explicitly written for inheritance features : such 
features might be used by, say, the administrator and coere 
programmers of a database, who are not necessarily publish this 
internal use of inheritance to end-users. Furthermore, such a change 
would forbid evolution of a database from a pure-relational to an 
object-orien,ted one : the two representations would be incompatible.

It should also pointed out that most interface programs (such as ODBC 
or JDBC drivers) are not and will not in a foreseeable future be 
designed for use of these features. Modifying the default behaviour 
would break them.

Apart from that, I am, after 17 years of exposure to the concepts of 
object-oriented programming, still to be convinced of the value of 
this paradigm. This is *not* to suggest that these developments should 
be left over ! However, I *feel* that the real issues behind this 
concept are not yet fully understood, and that some deep theoretical 
work remains to be done (in logic, for example : while the 
well-understood relational theory directly relates to set theory, I 
think that a mathematically correct objects-and-types theory shoud 
emanate from category theory but remains to be created ...).

Your thoughs ?
                    Emmanuel Charpentier






Re: [HACKERS] An introduction and a plea ...

From
Chris Bitmead
Date:
Emmanuel Charpentier wrote:

> I *strongly* suggest not to change anything in the default behaviour,
> which is what is expected from an SQL-compliant system, even if the
> database in question uses inheritance internally.

Can I assure you that these changes have NO EFFECT on anybody who
does not use inheritance. i.e. Postgres will remain as SQL compliant
as it was before.

> The reason for that plea is that a modification would crash any
> program not explicitly written for inheritance features.

No it won't. If you don't use inheritance, you will not be effected in
any way.

> : such
> features might be used by, say, the administrator and coere
> programmers of a database, who are not necessarily publish this
> internal use of inheritance to end-users. Furthermore, such a change
> would forbid evolution of a database from a pure-relational to an
> object-orien,ted one : the two representations would be incompatible.
> 
> It should also pointed out that most interface programs (such as ODBC
> or JDBC drivers) are not and will not in a foreseeable future be
> designed for use of these features. Modifying the default behaviour
> would break them.

In my opinion, this change will give users of ODBC and such tools MORE
useful defaults. Of course if you are using a non-OO interface to an OO
database there will always be things you can't do. But IMHO, this gives
a more useful set of defaults as a trasition phase.

For example, currently if I have student and employee inheriting from
person, ODBC query of SELECT * from person will return... NOTHING! After
these changes the query will return all the persons (which happen to
be students and employees).

> Apart from that, I am, after 17 years of exposure to the concepts of
> object-oriented programming, still to be convinced of the value of
> this paradigm. This is *not* to suggest that these developments should
> be left over ! However, I *feel* that the real issues behind this
> concept are not yet fully understood, and that some deep theoretical
> work remains to be done (in logic, for example : while the
> well-understood relational theory directly relates to set theory, I
> think that a mathematically correct objects-and-types theory shoud
> emanate from category theory but remains to be created ...).

Well, the fact is people are using OO now, and it's hard for me to 
explain the development advantages of an OO database to someone who
is not coding. But if you really want to find out why an OO database
is good, head on over to versant.com or odi.com, download the database
and write a small application. Apart from anything else, some people
need the improved performance NOW, and can't wait for the academics
to give their stamp of approval. And OO database coding simplicity
is saving millions of $$$  NOW.


Re: [HACKERS] An introduction and a plea ...

From
Hannu Krosing
Date:
Emmanuel Charpentier wrote:
>
> However, I've lurked on some of the PostgreSQL lists for 2 to 3 months
> (through the Web interface), and I feel that I might offer some
> advice, based on my past experience of seeing a lot of projects
> growing (or dying, due to feeping creaturism(TM) ...).
> 
> So I will shamelessly pull my first plea, related to the proposed
> change to the default behaviour of PostgreSQL in querying classes with
> subclasses.
> 
> I *strongly* suggest not to change anything in the default behaviour,
> which is what is expected from an SQL-compliant system, even if the
> database in question uses inheritance internally.

I agree wrt the * returning different types of tuples from different 
subtypes.

I somewhata disagree about default selects/updates/deletes working on 
inherited tables by default - If we want PostgreSQL to evolve back 
to an ORDBMS. 

We should not change the defaul _yet_, but we should not exclude
the change in future. rather we should acknowledge the current state of
affairs wrt inheritance and declare it deprecated (dont use in new projects,
start fixing your old ones) 

> The reason for that plea is that a modification would crash any
> program not explicitly written for inheritance features : such
> features might be used by, say, the administrator and coere
> programmers of a database, who are not necessarily publish this
> internal use of inheritance to end-users.

I saw something similar when going from python 1.5.1 to 1.5.2 - suddenly 
some broken usage became a show-stopping bug instead of just ignering it 
with some hidden default usage. It did not byte me directly, but several 
of our developers had never read the introductory parts of docs, or had 
not understood what was said.

Currently inheritance features can be used in a very limited way - 

1. for defining a table that shares some columns with some other table(s)
this usage is actually broken, as it currently results in tables that can't 
be dumped properly after columns are added, and thus should be discouraged 
anyway until it is fixed.

2. for selecting (and not updating/deleting) from a group of said broken  tables, using a non-ansi syntax. The
performanceis also most likely  suboptimal, as indexes are not inherited.
 

Therefore I would propose the following, more radical approach - 

* officially acknowledge the current lacking OO support of PostgreSQL and  declare the current usages deprecated and
soon-to-be-removedin 7.0
 

* not remove the support for them in the backend, but instead start to investigate ways to fix the buga and add the
missingfeatures.
 

* hide the OO development behind "set ORDBMS to 'ON'", which case would behave in the new way for the current two OO
features(create .. inherits .., and select), if it is set to 'off' (the default) spit out a warning on each use but
behavecompatibly. (maybe make psql check if it is invoked as osql and send the set command  automatically)
 

* for migrating databases provide a way to dump inherited tables as standalone so that it would be easy for people to
clearup the inherits-as-macro usage
 

* The OO development should solve the following problems (independent of which syntax will be eventually used)
 1. if a table inherits another table, it has to (at least) inherit the    following by default
   1.1 columns - in a way that allows add/delete column (requires changes to       storage manager, probably
introductionof deleted/missing columns)
 
   1.2 indexes, both unique and ordinary, where unique indexes should be
unique       _over_all_tables_ involved
   1.3 constraints, including being the foreign end of foreign key constraint
 2. a way to go from OID to tuple
   The must efficient solution seems to be a file with a simple structure
that   has records of (TUPLE_OID,TABLE_OID) wher a record is added at each
insert.   As this file is ordered wrt. TULE_OID and has fixed size records, it can   be efficiently searche with binary
search.As it is append-only it is also   quite (probably most) efficient on inserts. I can't think of any solutions
usingcurrent structures which would be nearly as efficient. If we
 
sacrifice   space for lookup speed we may write all oids and never shrink that file
and    have a computed lookup whic would require at most one disk access per oid    lookup. We could use some kind of
weightedbinary search in any case.
 
   The same kind of file could be used for re_introducing time-travel in an   efficient way.   3. a way to get full
tuples(tuple type + all columns) from inherited
 
tables.
   This would require minimal changes to wire protocol, but more changes to    client API's.
 4.possibly a bit unrelated to OO, but still a must-do - Start working on a   binary cross-platform protocol, that
couldbe used for _both_   insert/update/delete and select (instead of current single-platform select   only binary
protocol)
   It would mean adding PREPARE to the backend (already exists in SPI)   as well as smarter client libraries that would
exposeit and that could    marshal binary data given to BIND over wire. Having PREPARE-d queries   can also speed up
ourperformance on standard benchmarks, as much of    prepare/optimise can be skipped.
 
 From there on it gets a bit foggy as it is really a distant future (possibly more than 1 year ;)
 5. become even more object-oriented and add methods to tables that can do   different things depending on which table
theyoperate on.
 
 6. allow writing these mathods in a platform-independent language    (java/python/tcl/perl/...) and also passed from
backendto frontend.
 

> Furthermore, such a change
> would forbid evolution of a database from a pure-relational to an
> object-orien,ted one : the two representations would be incompatible.

Do you propose the two-separate-parsers way of doing things ?
> It should also pointed out that most interface programs (such as ODBC
> or JDBC drivers) are not and will not in a foreseeable future be
> designed for use of these features. Modifying the default behaviour
> would break them.

Standard SQL queries should give standard SQL responses.

OTOH, there is an evolving API for interfacing ObjectDatabases with Java

> Apart from that, I am, after 17 years of exposure to the concepts of
> object-oriented programming, still to be convinced of the value of
> this paradigm.

My experience is exactly the opposite - after zenning the concept I'm unable 
to write anything longer than 15 lines that is not OO, (with the possible 
exclusion of SQL scripts, which do not fit nicely to that concept ;)

It does _not_ mean writing in an "OO language", but just a way of thinking 
about problems and expressing these thoughts.

> This is *not* to suggest that these developments should
> be left over ! However, I *feel* that the real issues behind this
> concept are not yet fully understood, and that some deep theoretical
> work remains to be done

There will _always_ remain theoretical work to be done, at least for any 
live concept.

> (in logic, for example : while the
> well-understood relational theory directly relates to set theory, I
> think that a mathematically correct objects-and-types theory shoud
> emanate from category theory but remains to be created ...).
> 
> Your thoughs ?

I suspect that OO programming as a whole could be complex enough that Goedels 
theorem forbids any complete"mathematically correct objects-and-types theory"

----------------
Hannu


Re: [HACKERS] An introduction and a plea ...

From
Chris Bitmead
Date:
Hannu Krosing wrote:

>   2. a way to go from OID to tuple
> 
>     The must efficient solution seems to be a file with a simple structure
> that
>     has records of (TUPLE_OID,TABLE_OID) wher a record is added at each
> insert.
>     As this file is ordered wrt. TULE_OID and has fixed size records, it can
>     be efficiently searche with binary search. As it is append-only it is also
>     quite (probably most) efficient on inserts. I can't think of any solutions
>     using current structures which would be nearly as efficient. 

If you have your suggested indexes that apply over multiple relations, I
can't
see why that can't be used for this too. It just means that if you use
ODBMS it
is recommended that you do a CREATE INDEX oid_idx ON object (oid), where
"object"
is a conceptual super-class of all other objects.

Your append-only file would grow without limit, which I think is a bit
of a
problem for some apps. Also the way ODBMS will work is an application
will 
ask for a chunk
of oids from the database, some of which may be later "wasted".(This is
how
Versant works and it is also a technique documented by Stonebraker in
his
postgres papers). This technique is so that applications don't have to
talk to the backend to create objects in the front end that need oids.
This means objects may not be created with oids in order.
So you have to store space for oids in your file that may not be used.

I think we need first more conventional style index that works well.
Then we
can experiment with more radical ideas.

>     The same kind of file could be used for re_introducing time-travel in an
>     efficient way.

How?

>   5. become even more object-oriented and add methods to tables that can do
>     different things depending on which table they operate on.

Does this definitely not work now?


Re: [HACKERS] An introduction and a plea ...

From
Hannu Krosing
Date:
Chris Bitmead wrote:
> 
> Hannu Krosing wrote:
> 
> >   2. a way to go from OID to tuple
> >
> >     The must efficient solution seems to be a file with a simple structure
> > that
> >     has records of (TUPLE_OID,TABLE_OID) wher a record is added at each
> > insert.
> >     As this file is ordered wrt. TULE_OID and has fixed size records, it can
> >     be efficiently searche with binary search. As it is append-only it is also
> >     quite (probably most) efficient on inserts. I can't think of any solutions
> >     using current structures which would be nearly as efficient.
> 
> If you have your suggested indexes that apply over multiple relations, I
> can't see why that can't be used for this too.

The insert performance would be much worse for indexes than for append-only
file.

> It just means that if you use ODBMS it is recommended that you do a 
> CREATE INDEX oid_idx ON object (oid), where "object"
> is a conceptual super-class of all other objects.
> 
> Your append-only file would grow without limit, which I think is a bit
> of a problem for some apps.

I meant vacuum to compress it (which AFAIK it does not do for indexes
currently)

> Also the way ODBMS will work is an application will ask for a chunk
> of oids from the database, some of which may be later "wasted".(This is
> how Versant works and it is also a technique documented by Stonebraker in
> his postgres papers). This technique is so that applications don't have to
> talk to the backend to create objects in the front end that need oids.
> This means objects may not be created with oids in order.
> So you have to store space for oids in your file that may not be used.

Yes, it needs some more book-keeping than I thought (keep the oid-file pages 
that could possibly be updated  in memory until the front-end which requested
the oids disconnects), or just assume all oids will be used and compress the 
unused ones below watermark out in VACUUM.

> I think we need first more conventional style index that works well.
> Then we can experiment with more radical ideas.

An index spanning multiple tables is quite radical anyway. Initially we could 
get by with multiple indexes and extra (but slow) check for uniqueness (when 
index is unique).

> 
> >     The same kind of file could be used for re_introducing time-travel in an
> >     efficient way.
> 
> How?

By writing (TID,TIMESTAMP) tuples there and using that info to retrieve tuples 
active at specified time by examinimg TIDs in "deleted" tuples.
As bot TID and TIMESTAMP should be monotonuously growing again binary search 
can be used on retrieve and inserts are append-only (meaning fast)

Both cases assume that we are oriented on fast inserts, as b-tree would
probably 
be faster than binary search on retrieves, but is much slower on inserts.

> 
> >   5. become even more object-oriented and add methods to tables that can do
> >     different things depending on which table they operate on.
> 
> Does this definitely not work now?

AFAIK functions are selected based on their arguments which can be either a
full 
tuple or several simple types, but not both.

So the first kind _may_ actually work, we must ask someone more familiar on
when 
the actual function is selected for "SELECT T.func() from TAB* T" queries.

--------------
Hannu


Re: [HACKERS] An introduction and a plea ...

From
Chris Bitmead
Date:
Hannu Krosing wrote:

> > >     The same kind of file could be used for re_introducing time-travel in an
> > >     efficient way.
> >
> > How?
> 
> By writing (TID,TIMESTAMP) tuples there and using that info to retrieve tuples
> active at specified time by examinimg TIDs in "deleted" tuples.
> As bot TID and TIMESTAMP should be monotonuously growing again binary search
> can be used on retrieve and inserts are append-only (meaning fast)

But since we are already storing all the time travel stuff already in
the
storage pages do we need this to reinstate time travel? Also if you
reinstate
time travel this way it will only work for people using this odbms
feature.
Wouldn't it be better to reinstate the old timetravel so it works for
everyone?


Re: [HACKERS] An introduction and a plea ...

From
Hannu Krosing
Date:
Chris Bitmead wrote:
> 
> Hannu Krosing wrote:
> 
> > > >     The same kind of file could be used for re_introducing time-travel in an
> > > >     efficient way.
> > >
> > > How?
> >
> > By writing (TID,TIMESTAMP) tuples there and using that info to retrieve tuples
> > active at specified time by examinimg TIDs in "deleted" tuples.
> > As bot TID and TIMESTAMP should be monotonuously growing again binary search
> > can be used on retrieve and inserts are append-only (meaning fast)
> 
> But since we are already storing all the time travel stuff already in
> the storage pages do we need this to reinstate time travel?

If we want to query for old tuples by wallclock time (which is not stored) and 
not only by transaction-id (which are) we need something to go from wc-time to
tid
and back.

> Also if you reinstate time travel this way it will only work for people using 
> this odbms feature.
> Wouldn't it be better to reinstate the old timetravel so it works for
> everyone?

It would be probably better to do it under another set, probably at dbinit 
(or createdb) time.

so maybe 

set TIME_TRAVEL to 'on';
CREATE DATABASE TIME_TRAVELLERS_DB;

would create a database that can use the time-travel features.

It could of course be included in the db create statement:

CREATE DATABASE TIME_TRAVELLERS_DB WITH TIME_TRAVEL='ON';

BTW, have you considered making OO a per-database feature or at least the 
default being settable when creating the database.

-----------------------
Hannu