Thread: A post-7.1 wish-list.

A post-7.1 wish-list.

From
Emmanuel Charpentier
Date:
Dear list,

According to this list's content, the upcoming 7.1 release appears to be
in good progress. This version will bring a *lot* on necessary features
for some database work : unions and subselects in views, and the
long-awaited outer joins. In other words, while 7.0 was a large step in
*performance* terms, 7.1 will be a huge advance in *competence*.

These advances will allow me to use PostgreSQL for some work I had to do
until now with (gasp !) MS-Access (which has poor performance but good
competence). And get rid of the damn MS-Windows envoronment for good !

This leads me to express two whishes for future PotgreSQL developments.
These ideas are inpired by my daily work and might or might not be of
great usefulness for other uses.

My daily work (biostatistics) involves managing a lot of small but
complex databases : those are mainly medical records, created for a
study's purposes, wich have a low volume (a dozen or two of tables
having some dozens to some thousands rows) but might have a deeply
nested and irregular structure (not all patients recorded need to have
records of all histories and procedures involved). As a consequence, I
am much more interested in competence than in performance, and so is my
wishlist. Keep that in mind when reading what follows.

1) Updatable views.
==================

According to the current documentation, views are read.only. This
implies some grunt work when creating update forms for the kind of
low-use applications I have to manage.

I know that computing the "updatability" of a view is not a trivial
problem. Furthermore, even when a view is indeed updatable, the
update/append algorithm is not easy to compute. These problems are even
harder in multi-user mode. And I do not have any idea of the feasibility
of such updates in an OO database, where inheritance concerns will
interfere.

However, such updatable views would greatly simplify the end-user work
for creating and maintaining these records (at least when no inheritance
is involved.

I am not able to state the usefulness of such "updatable views" in more
mainstream applications. I note, however, that most "brand-name" RDBMSes
ofer that.

Your thoughs ?

2) External database or table access.
====================================

Quite often, two or more distinct applications have to use common data.
My favourite example is again medical : two othewise unrelated
applications might have to use a common medical thesaurus.

The obvious solution (including the medical thesaurus tables in each and
every application) leads to awful consistency problems. Working this way
can be properly done only with replication, which is not yet available
in PostgreSQL. Furthermore, most applications will use only one or two
views of the thesaurus, while the thesaurus might be both large and
complex.

Another "obvious solution" (delegating the use of the thesaurus to the
client application) is also a non-solution : how do you join your data
and the thesaurus data ?

The ability to "attach" (MS-Access parlance) a table or a view from
another database is quite helpful. And I think that it has a lot of
applications outside my (quite limited) realm.

For example, two different departments of the same company might have
needs for two different management applications, while having to
use/update the same company-wide accounting records. I don't se the
"replication" solution as a good one (data duplication should be
considered harmful in any circumstances).

This could be implemented in different ways. From the easiest to the
hardest :

- Attachment of Postgres databases running on the same server :
relatively easy. The data structures (internal representation) are
known, there is a guarantee of consistency in user identification,
security information is also consistent.

- Attachment of Postgres databases running on another server. There,
while the data structures are known, the user and security informations
can be inconsistent and have to be managed "by hand".

- Attachment of other databases. Ouch : this one is hard. One have to
rely on the information made available by the other database server. And
there lies a problem : there is no universal standard for this.

... or there is ? Two bridges come to mind. Using ODBC or JDBC, provided
the "other" RDBMS has that, allows to use some standard information : at
the very minimum, table names, attribute names and type, and
updatability. In most cases, you will also be able to know whether
indices are available for such and such columns.

This minimal set of information allows you to use these external tables
in your own joins. And, provided that you have update rights, the
ability to use them as "native" tables.

Of course, the use of these bridges involve some (maybe quite serious)
performance loss. But then again, I'm less interested in performance
than in competence ...

What do you think ?
                Emmanuel Charpentier
--
Emmanuel Charpentier


Re: A post-7.1 wish-list.

From
Peter Eisentraut
Date:
Emmanuel Charpentier writes:

> 1) Updatable views.

You can make rules updateable by attaching appropriate rules to them.
The transparent implementation of updateable views would essentially do
that.  It's a planned feature but I don't know of anyone who has made it
his priority.

> 2) External database or table access.

Accessing more than one database from a connection will not happen, at
least as long has the current mindset of the developers persists.  SQL
schema support is planned for 7.2, which will address the same area,
however.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: A post-7.1 wish-list.

From
Horst Herb
Date:
On Sunday 07 January 2001 21:31, Emmanuel Charpentier wrote:

> These advances will allow me to use PostgreSQL for some work I had to do
> until now with (gasp !) MS-Access (which has poor performance but good
> competence). And get rid of the damn MS-Windows envoronment for good !

Excuse me, but where has MS Access competence? It is a pretty useless data 
lottery with an admittedly very capable & easy user interface. The odds of 
data corruption can't possibly be higher with any other system (I used to 
develop hospital information systems prototyping with Access).

> My daily work (biostatistics) involves managing a lot of small but
> complex databases : those are mainly medical records, created for a
> study's purposes, wich have a low volume (a dozen or two of tables
> having some dozens to some thousands rows) but might have a deeply
> nested and irregular structure (not all patients recorded need to have

:-) We are developing gnumed, a secure "electronic health record" system, and 
believe now that Postgres is the ideal candidate (working on it for 10 months 
now). The only thing we would like to see is more & better documentation for 
programmers (the documentation for end users is quite good).

> According to the current documentation, views are read.only. This
> implies some grunt work when creating update forms for the kind of
> low-use applications I have to manage.

You can implement it yourself through trigger functions, no big deal. But, 
typical applications nowadays interact with the user through dialogues with 
text entry fields. Your client has to process this information anyway. 
Doesn't make much of a difference then to split the information to the tables 
it belongs to instead of just putting it into your view.

> Quite often, two or more distinct applications have to use common data.
> My favourite example is again medical : two othewise unrelated
> applications might have to use a common medical thesaurus.
>
> The obvious solution (including the medical thesaurus tables in each and
> every application) leads to awful consistency problems. Working this way
> can be properly done only with replication, which is not yet available

??? why? You are working with a client-server system, and you can have any 
number of databases on a postgres server. No need for replication for this 
purpose. 

> in PostgreSQL. Furthermore, most applications will use only one or two
> views of the thesaurus, while the thesaurus might be both large and
> complex.

That does not matter for the application (client). It will be blissfully 
unaware of the size or complexity of your thesaurus.

> Another "obvious solution" (delegating the use of the thesaurus to the
> client application) is also a non-solution : how do you join your data
> and the thesaurus data ?

The usual way as in any relational data base: by referencing the information.

> The ability to "attach" (MS-Access parlance) a table or a view from
> another database is quite helpful. And I think that it has a lot of
> applications outside my (quite limited) realm.

It is quite disastrous for referential integrity.


-- 
============================================== Dr. Horst Herb, coordinator gnumed project, Australia
==============================================


Re: A post-7.1 wish-list.

From
Emmanuel Charpentier
Date:
Horst Herb wrote:
> 
> On Sunday 07 January 2001 21:31, Emmanuel Charpentier wrote:

[ ... ]

> Excuse me, but where has MS Access competence? It is a pretty useless data
> lottery with an admittedly very capable & easy user interface. The odds of
> data corruption can't possibly be higher with any other system (I used to
> develop hospital information systems prototyping with Access).

Competence : in this context, the ability to define a correct data
structure and the views used tu access and update it. The "easy to use"
interface is a big incentive to end users to define things properly,
instead of relying to the horrible "Excel sheet with forms" I used to
have to cope with.

In a heavy multi-user environment, MS Access might be a "data lottery".
I have no direct experience of this. In my low-use environment, I had no
real reason to complain about data corruption : my problems were more
bound to the lack of protection of the MS Windows environment, and were
solved my migrating data files on a proper server. Performance, however,
can be poor. I have in mind some examples involving Cartesian products
or remerging that were hell to compute :=(( ...

[ ... ]

> > According to the current documentation, views are read.only. This
> > implies some grunt work when creating update forms for the kind of
> > low-use applications I have to manage.
> 
> You can implement it yourself through trigger functions, no big deal. 

Grunt work, as I said. And we are awfully short on time for doing this.

>                                                                     But,
> typical applications nowadays interact with the user through dialogues with
> text entry fields. Your client has to process this information anyway.

Just a minimum, none if possible ... Our processing is made mostly
*post* entry.

> Doesn't make much of a difference then to split the information to the tables
> it belongs to instead of just putting it into your view.

"Just putting it into my view" is a hell of a lot faster and a hell of a
lot less work. And I (and my assistants) are paid to work on our data,
not to write programs to access them ... Data entry and maintainance are
an ancilliary task.

> > Quite often, two or more distinct applications have to use common data.
> > My favourite example is again medical : two othewise unrelated
> > applications might have to use a common medical thesaurus.
> >
> > The obvious solution (including the medical thesaurus tables in each and
> > every application) leads to awful consistency problems. Working this way
> > can be properly done only with replication, which is not yet available
> 
> ??? why? You are working with a client-server system, and you can have any
> number of databases on a postgres server.

Nope : the thesaurus data might not be ours, and live on a server we can
*read*, not *write to*, and certainly not program to our heart's
content, and definitively *not* with PostgreSQL.

Do you really think I can request our financial department to throw off
their beloved Oracle databases, end result of tens of years of work, and
switch to our preffered PostgreSQL server ? And do you think I could put
sensitive medical information on a server accessible to people not
cleared to view any medical data ? If so, you'd better think again
before proposing gnumed ...

For reasons that should be obvious, our data live in a server accessible
to a small number of microcomputers in our department. And there is a
*lot* of good and bad reasons for which they will stay here. For
example, our database authority (the french CNIL) would have a seizure
if those data were to be migrated on a non-restricted server.

Furthermore, we might have to use *several* unrelated database not
belonging to us ... 

The "one server serves all client" is totally unapplicable to our
problems ..

I'm looking for a practical solution, not for an excuse to an
administrative war, that I'd loose, btw ...

>                                           No need for replication for this
> purpose.


> > in PostgreSQL. Furthermore, most applications will use only one or two
> > views of the thesaurus, while the thesaurus might be both large and
> > complex.
> 
> That does not matter for the application (client). It will be blissfully
> unaware of the size or complexity of your thesaurus.

A microcomputer client having to manage the huge size of the original
database would be painfully aware of this ... and the users too ... That
non-solution, therefore, would imply migrating our medical data to
exactly one central server, which is, again, unacceptable.

> > Another "obvious solution" (delegating the use of the thesaurus to the
> > client application) is also a non-solution : how do you join your data
> > and the thesaurus data ?
> 
> The usual way as in any relational data base: by referencing the information.

That makes the client application a relational RDBMS with capability to
access more than one database at a time. This was MS Access for us, up
until now. And I would *love* to get rid of it ...

> > The ability to "attach" (MS-Access parlance) a table or a view from
> > another database is quite helpful. And I think that it has a lot of
> > applications outside my (quite limited) realm.
> 
> It is quite disastrous for referential integrity.

Would you please amplify ? You might have a point, but I do not (yet)
see it.

--
Emmanuel Charpentier


Re: A post-7.1 wish-list.

From
Manuel Cabido
Date:
I would like to inquire if in the next release of postgresql the database
will have to be compacted into a single file like what Interbase
database supports? I find this feature convenient because it will simplify
the updating of your database considering that you will be dealing only
with one single file.

Thanks and MORE POWER TO EVERYONE! 

--                              Manny C. Cabido                             ====================================
                    e-mail:manny@tinago.msuiit.edu.ph                                    manny@sun.msuiit.edu.ph
                    =====================================
 



Re: A post-7.1 wish-list.

From
Emmanuel Charpentier
Date:
Peter Eisentraut wrote:
> 
> Emmanuel Charpentier writes:
> 
> > 1) Updatable views.
> 
> You can make rules updateable by attaching appropriate rules to them.
> The transparent implementation of updateable views would essentially do
> that.  It's a planned feature but I don't know of anyone who has made it
> his priority.

I'd try my hand at it, given enough time ... which I'm awfully short on.
Which is one of my reasons to have that feature : not having to do grunt
work.

> > 2) External database or table access.
> 
> Accessing more than one database from a connection will not happen, at
> least as long has the current mindset of the developers persists.  SQL
> schema support is planned for 7.2, which will address the same area,
> however.

I fail to see how schema support would help me accessing data residing
on different servers (that cannot be moved, for a lot of good and bad
reasons). Could you please amplify ?

--
Emmanuel Charpentier


RE: A post-7.1 wish-list.

From
"Andrew Snow"
Date:
> I'd try my hand at it, given enough time ... which I'm awfully short on.
> Which is one of my reasons to have that feature : not having to do grunt
> work.

It should only take a few seconds to write such rules for simple views --
see the examples in the Programmer documentation.
/programmer/rules1139.htm#AEN1227


- Andrew



Re: A post-7.1 wish-list.

From
Tom Samplonius
Date:
On Mon, 8 Jan 2001, Manuel Cabido wrote:

> I would like to inquire if in the next release of postgresql the database
> will have to be compacted into a single file like what Interbase
> database supports? I find this feature convenient because it will simplify
> the updating of your database considering that you will be dealing only
> with one single file.
 I don't understand how a single file is easier to update.  What kind of
updates are you talking about?
 A single file archicture requires that the database system have a
storage manager and a pseudo filesystem.  It also becomes another thing to
configure and monitor.  Ask anyone who has delt with Oracle's tablespace
and extent management on how complicated this can be.

Tom



Re: A post-7.1 wish-list.

From
"XuYifeng"
Date:
> > > Another "obvious solution" (delegating the use of the thesaurus to the
> > > client application) is also a non-solution : how do you join your data
> > > and the thesaurus data ?
> > 
> > The usual way as in any relational data base: by referencing the information.
> 
> That makes the client application a relational RDBMS with capability to
> access more than one database at a time. This was MS Access for us, up
> until now. And I would *love* to get rid of it ...
> 
> > > The ability to "attach" (MS-Access parlance) a table or a view from
> > > another database is quite helpful. And I think that it has a lot of
> > > applications outside my (quite limited) realm.
> > 
> > It is quite disastrous for referential integrity.
> 
> Would you please amplify ? You might have a point, but I do not (yet)
> see it.
> 
> --
> Emmanuel Charpentier


definitely isolating different  databases is a non-practical idea,  how can you image things will be changed 
in future? do you believe that you'll never reference data in other databases?  it sound likes that you are 
still in 80's, in that time, PC are mostly not connected together via Network, when we need migrate data to 
another PC,  we should use floppy :(,   this time I should use the barbarism method again --- dump table to a 
file then load the file to another database even they are on same server! maybe I should write a stupid program to
do task periodically.

Regards,
XuYifeng