Thread: A post-7.1 wish-list.
Dear list, According to this list's content, the upcoming 7.1 release appears to be in good progress. This version will bring a *lot* on necessary features for some database work : unions and subselects in views, and the long-awaited outer joins. In other words, while 7.0 was a large step in *performance* terms, 7.1 will be a huge advance in *competence*. These advances will allow me to use PostgreSQL for some work I had to do until now with (gasp !) MS-Access (which has poor performance but good competence). And get rid of the damn MS-Windows envoronment for good ! This leads me to express two whishes for future PotgreSQL developments. These ideas are inpired by my daily work and might or might not be of great usefulness for other uses. My daily work (biostatistics) involves managing a lot of small but complex databases : those are mainly medical records, created for a study's purposes, wich have a low volume (a dozen or two of tables having some dozens to some thousands rows) but might have a deeply nested and irregular structure (not all patients recorded need to have records of all histories and procedures involved). As a consequence, I am much more interested in competence than in performance, and so is my wishlist. Keep that in mind when reading what follows. 1) Updatable views. ================== According to the current documentation, views are read.only. This implies some grunt work when creating update forms for the kind of low-use applications I have to manage. I know that computing the "updatability" of a view is not a trivial problem. Furthermore, even when a view is indeed updatable, the update/append algorithm is not easy to compute. These problems are even harder in multi-user mode. And I do not have any idea of the feasibility of such updates in an OO database, where inheritance concerns will interfere. However, such updatable views would greatly simplify the end-user work for creating and maintaining these records (at least when no inheritance is involved. I am not able to state the usefulness of such "updatable views" in more mainstream applications. I note, however, that most "brand-name" RDBMSes ofer that. Your thoughs ? 2) External database or table access. ==================================== Quite often, two or more distinct applications have to use common data. My favourite example is again medical : two othewise unrelated applications might have to use a common medical thesaurus. The obvious solution (including the medical thesaurus tables in each and every application) leads to awful consistency problems. Working this way can be properly done only with replication, which is not yet available in PostgreSQL. Furthermore, most applications will use only one or two views of the thesaurus, while the thesaurus might be both large and complex. Another "obvious solution" (delegating the use of the thesaurus to the client application) is also a non-solution : how do you join your data and the thesaurus data ? The ability to "attach" (MS-Access parlance) a table or a view from another database is quite helpful. And I think that it has a lot of applications outside my (quite limited) realm. For example, two different departments of the same company might have needs for two different management applications, while having to use/update the same company-wide accounting records. I don't se the "replication" solution as a good one (data duplication should be considered harmful in any circumstances). This could be implemented in different ways. From the easiest to the hardest : - Attachment of Postgres databases running on the same server : relatively easy. The data structures (internal representation) are known, there is a guarantee of consistency in user identification, security information is also consistent. - Attachment of Postgres databases running on another server. There, while the data structures are known, the user and security informations can be inconsistent and have to be managed "by hand". - Attachment of other databases. Ouch : this one is hard. One have to rely on the information made available by the other database server. And there lies a problem : there is no universal standard for this. ... or there is ? Two bridges come to mind. Using ODBC or JDBC, provided the "other" RDBMS has that, allows to use some standard information : at the very minimum, table names, attribute names and type, and updatability. In most cases, you will also be able to know whether indices are available for such and such columns. This minimal set of information allows you to use these external tables in your own joins. And, provided that you have update rights, the ability to use them as "native" tables. Of course, the use of these bridges involve some (maybe quite serious) performance loss. But then again, I'm less interested in performance than in competence ... What do you think ? Emmanuel Charpentier -- Emmanuel Charpentier
Emmanuel Charpentier writes: > 1) Updatable views. You can make rules updateable by attaching appropriate rules to them. The transparent implementation of updateable views would essentially do that. It's a planned feature but I don't know of anyone who has made it his priority. > 2) External database or table access. Accessing more than one database from a connection will not happen, at least as long has the current mindset of the developers persists. SQL schema support is planned for 7.2, which will address the same area, however. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
On Sunday 07 January 2001 21:31, Emmanuel Charpentier wrote: > These advances will allow me to use PostgreSQL for some work I had to do > until now with (gasp !) MS-Access (which has poor performance but good > competence). And get rid of the damn MS-Windows envoronment for good ! Excuse me, but where has MS Access competence? It is a pretty useless data lottery with an admittedly very capable & easy user interface. The odds of data corruption can't possibly be higher with any other system (I used to develop hospital information systems prototyping with Access). > My daily work (biostatistics) involves managing a lot of small but > complex databases : those are mainly medical records, created for a > study's purposes, wich have a low volume (a dozen or two of tables > having some dozens to some thousands rows) but might have a deeply > nested and irregular structure (not all patients recorded need to have :-) We are developing gnumed, a secure "electronic health record" system, and believe now that Postgres is the ideal candidate (working on it for 10 months now). The only thing we would like to see is more & better documentation for programmers (the documentation for end users is quite good). > According to the current documentation, views are read.only. This > implies some grunt work when creating update forms for the kind of > low-use applications I have to manage. You can implement it yourself through trigger functions, no big deal. But, typical applications nowadays interact with the user through dialogues with text entry fields. Your client has to process this information anyway. Doesn't make much of a difference then to split the information to the tables it belongs to instead of just putting it into your view. > Quite often, two or more distinct applications have to use common data. > My favourite example is again medical : two othewise unrelated > applications might have to use a common medical thesaurus. > > The obvious solution (including the medical thesaurus tables in each and > every application) leads to awful consistency problems. Working this way > can be properly done only with replication, which is not yet available ??? why? You are working with a client-server system, and you can have any number of databases on a postgres server. No need for replication for this purpose. > in PostgreSQL. Furthermore, most applications will use only one or two > views of the thesaurus, while the thesaurus might be both large and > complex. That does not matter for the application (client). It will be blissfully unaware of the size or complexity of your thesaurus. > Another "obvious solution" (delegating the use of the thesaurus to the > client application) is also a non-solution : how do you join your data > and the thesaurus data ? The usual way as in any relational data base: by referencing the information. > The ability to "attach" (MS-Access parlance) a table or a view from > another database is quite helpful. And I think that it has a lot of > applications outside my (quite limited) realm. It is quite disastrous for referential integrity. -- ============================================== Dr. Horst Herb, coordinator gnumed project, Australia ==============================================
Horst Herb wrote: > > On Sunday 07 January 2001 21:31, Emmanuel Charpentier wrote: [ ... ] > Excuse me, but where has MS Access competence? It is a pretty useless data > lottery with an admittedly very capable & easy user interface. The odds of > data corruption can't possibly be higher with any other system (I used to > develop hospital information systems prototyping with Access). Competence : in this context, the ability to define a correct data structure and the views used tu access and update it. The "easy to use" interface is a big incentive to end users to define things properly, instead of relying to the horrible "Excel sheet with forms" I used to have to cope with. In a heavy multi-user environment, MS Access might be a "data lottery". I have no direct experience of this. In my low-use environment, I had no real reason to complain about data corruption : my problems were more bound to the lack of protection of the MS Windows environment, and were solved my migrating data files on a proper server. Performance, however, can be poor. I have in mind some examples involving Cartesian products or remerging that were hell to compute :=(( ... [ ... ] > > According to the current documentation, views are read.only. This > > implies some grunt work when creating update forms for the kind of > > low-use applications I have to manage. > > You can implement it yourself through trigger functions, no big deal. Grunt work, as I said. And we are awfully short on time for doing this. > But, > typical applications nowadays interact with the user through dialogues with > text entry fields. Your client has to process this information anyway. Just a minimum, none if possible ... Our processing is made mostly *post* entry. > Doesn't make much of a difference then to split the information to the tables > it belongs to instead of just putting it into your view. "Just putting it into my view" is a hell of a lot faster and a hell of a lot less work. And I (and my assistants) are paid to work on our data, not to write programs to access them ... Data entry and maintainance are an ancilliary task. > > Quite often, two or more distinct applications have to use common data. > > My favourite example is again medical : two othewise unrelated > > applications might have to use a common medical thesaurus. > > > > The obvious solution (including the medical thesaurus tables in each and > > every application) leads to awful consistency problems. Working this way > > can be properly done only with replication, which is not yet available > > ??? why? You are working with a client-server system, and you can have any > number of databases on a postgres server. Nope : the thesaurus data might not be ours, and live on a server we can *read*, not *write to*, and certainly not program to our heart's content, and definitively *not* with PostgreSQL. Do you really think I can request our financial department to throw off their beloved Oracle databases, end result of tens of years of work, and switch to our preffered PostgreSQL server ? And do you think I could put sensitive medical information on a server accessible to people not cleared to view any medical data ? If so, you'd better think again before proposing gnumed ... For reasons that should be obvious, our data live in a server accessible to a small number of microcomputers in our department. And there is a *lot* of good and bad reasons for which they will stay here. For example, our database authority (the french CNIL) would have a seizure if those data were to be migrated on a non-restricted server. Furthermore, we might have to use *several* unrelated database not belonging to us ... The "one server serves all client" is totally unapplicable to our problems .. I'm looking for a practical solution, not for an excuse to an administrative war, that I'd loose, btw ... > No need for replication for this > purpose. > > in PostgreSQL. Furthermore, most applications will use only one or two > > views of the thesaurus, while the thesaurus might be both large and > > complex. > > That does not matter for the application (client). It will be blissfully > unaware of the size or complexity of your thesaurus. A microcomputer client having to manage the huge size of the original database would be painfully aware of this ... and the users too ... That non-solution, therefore, would imply migrating our medical data to exactly one central server, which is, again, unacceptable. > > Another "obvious solution" (delegating the use of the thesaurus to the > > client application) is also a non-solution : how do you join your data > > and the thesaurus data ? > > The usual way as in any relational data base: by referencing the information. That makes the client application a relational RDBMS with capability to access more than one database at a time. This was MS Access for us, up until now. And I would *love* to get rid of it ... > > The ability to "attach" (MS-Access parlance) a table or a view from > > another database is quite helpful. And I think that it has a lot of > > applications outside my (quite limited) realm. > > It is quite disastrous for referential integrity. Would you please amplify ? You might have a point, but I do not (yet) see it. -- Emmanuel Charpentier
I would like to inquire if in the next release of postgresql the database will have to be compacted into a single file like what Interbase database supports? I find this feature convenient because it will simplify the updating of your database considering that you will be dealing only with one single file. Thanks and MORE POWER TO EVERYONE! -- Manny C. Cabido ==================================== e-mail:manny@tinago.msuiit.edu.ph manny@sun.msuiit.edu.ph =====================================
Peter Eisentraut wrote: > > Emmanuel Charpentier writes: > > > 1) Updatable views. > > You can make rules updateable by attaching appropriate rules to them. > The transparent implementation of updateable views would essentially do > that. It's a planned feature but I don't know of anyone who has made it > his priority. I'd try my hand at it, given enough time ... which I'm awfully short on. Which is one of my reasons to have that feature : not having to do grunt work. > > 2) External database or table access. > > Accessing more than one database from a connection will not happen, at > least as long has the current mindset of the developers persists. SQL > schema support is planned for 7.2, which will address the same area, > however. I fail to see how schema support would help me accessing data residing on different servers (that cannot be moved, for a lot of good and bad reasons). Could you please amplify ? -- Emmanuel Charpentier
> I'd try my hand at it, given enough time ... which I'm awfully short on. > Which is one of my reasons to have that feature : not having to do grunt > work. It should only take a few seconds to write such rules for simple views -- see the examples in the Programmer documentation. /programmer/rules1139.htm#AEN1227 - Andrew
On Mon, 8 Jan 2001, Manuel Cabido wrote: > I would like to inquire if in the next release of postgresql the database > will have to be compacted into a single file like what Interbase > database supports? I find this feature convenient because it will simplify > the updating of your database considering that you will be dealing only > with one single file. I don't understand how a single file is easier to update. What kind of updates are you talking about? A single file archicture requires that the database system have a storage manager and a pseudo filesystem. It also becomes another thing to configure and monitor. Ask anyone who has delt with Oracle's tablespace and extent management on how complicated this can be. Tom
> > > Another "obvious solution" (delegating the use of the thesaurus to the > > > client application) is also a non-solution : how do you join your data > > > and the thesaurus data ? > > > > The usual way as in any relational data base: by referencing the information. > > That makes the client application a relational RDBMS with capability to > access more than one database at a time. This was MS Access for us, up > until now. And I would *love* to get rid of it ... > > > > The ability to "attach" (MS-Access parlance) a table or a view from > > > another database is quite helpful. And I think that it has a lot of > > > applications outside my (quite limited) realm. > > > > It is quite disastrous for referential integrity. > > Would you please amplify ? You might have a point, but I do not (yet) > see it. > > -- > Emmanuel Charpentier definitely isolating different databases is a non-practical idea, how can you image things will be changed in future? do you believe that you'll never reference data in other databases? it sound likes that you are still in 80's, in that time, PC are mostly not connected together via Network, when we need migrate data to another PC, we should use floppy :(, this time I should use the barbarism method again --- dump table to a file then load the file to another database even they are on same server! maybe I should write a stupid program to do task periodically. Regards, XuYifeng