database schema quality survey & postgresql pitfalls - Mailing list pgsql-hackers

From Fabien COELHO
Subject database schema quality survey & postgresql pitfalls
Date
Msg-id alpine.DEB.2.02.1201231152010.3160@localhost6.localdomain6
Whole thread Raw
List pgsql-hackers
Dear pgdevs,

I've just completed the final version of a survey of database schema quality in
open source software. The survey covers 512 projects which use MySQL and/or
PostgreSQL for storing their data. Automatic analyses are performed by querying
the information schema. Statistical validations are computed on the results.

Here are some findings of specific interest to the list.


About MySQL vs PostgreSQL usage:
 - MySQL is much more often used than PostgreSQL... not a surprise.
 - Even for projects which seems to support PostgreSQL, this support is often   an afterthough and not necessarily
functional.For instance, you may find   MySQL-specific syntax in the PostgreSQL-specific script, which were clearly
nevertested.
 
 - Projects which use PostgreSQL statistically have a better quality compared   to projects which use MySQL. However,
thedifference is mostly due to   issues with MySQL.
 
 - Projects which use PostgreSQL are more often maintained than projects   with MySQL.


Some features or default behavior of PostgreSQL seem especially error-prone:
 - SERIAL attributes seem to be considered automatically as a primary key,   so that the primary key declaration is
oftenforgotten.
 
   This suggests that:
   * the documentation should insist on the potential issue.
   * a WARNING should be displayed when SERIAL is used without an associated     PRIMARY KEY, or possibly UNIQUE. It
shouldbe very rare to desire a     SERIAL which is not a PK, so this is worth a warning.
 
 - when loading a schema definition from an SQL script, the default behavior of   "psql" is to ignore errors and go
on.
   This lead to projects with failing declarations to be ignored because the   ERROR is lost in the flow of WARNING and
NOTICE.If a table is missing,   the error will be detected because the project is not functional, but if a   constraint
is missing, it will just be lost. This occur in about 10% of   pg projects!
 
   In order to avoid this behavior, one must do a "\set ON_ERROR_STOP 1"   at the beginning of the script. However,
thisis never done. Moreover,   there is no simple way to trigger the safer behavior from the command   line but quite a
long"-v ON_ERROR_STOP=1".
 
   It seems to me that:
   * the documentation should suggest to use an explicit stop on error setting     in every script.
   * a "psql -C foo.sql" (check?) or equivalent short option would help?
   * The current client default verbosity is counter productive for quality.     It should be reduced to WARNING and
above,but should *not* include NOTICE     which add a lot of noise ignored by the user and which hides more
importantmessages. For instance, a PK implies an INDEX, a SERIAL implies     a SEQUENCE, fine, but what is the point of
tellingit over and over?     So I suggest to choose a default "client_min_messages = warning".
 


Finally, some issues where found and already reported some time ago about the
implementation of the information schema by PostgreSQL. For instance,
auto-generated constraint names are not unique as they should be, which
makes having a "standard" information schema a little bit pointless, as
querying it returns wrong results:-( The summary of the answer was "do not use
the information schema", or "give unique names", which does not make much sense
for me who is analysing existing projects, and as most constraint names are
generated automatically by PostgreSQL.


For those interested in more details about the survey, a preprint of the paper
is available here :
    http://www.cri.ensmp.fr/classement/doc/A-488.pdf

And the tool used for analysing the projects is available at :
    http://coelho.net/salix/

Comments are welcome.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Inline Extension
Next
From: Cédric Villemain
Date:
Subject: Re: Inline Extension