Thread: The Contrib Roundup (long)

From: Josh Berkus
Date: 07 June 2005, 14:52:00

Folks,

I had a lot of time to kill on airplanes recently, so I've gone
digging through /contrib in an effort to sort out what's in
there and to apply some consistent rules to it. Before
people read further, please understand that this is just an
initial discussion on what will and won't be in contrib for
8.1; nobody has made any decisions yet.

What Should Be In Contrib?
-------------------------------
Looking over what's in there, most of the reasonable contrib
modules fall into three groups: extra data types, extra
functions, and backend utilities. These all seem reasonable
things to put into contrib, as does other code that is being
tested for inclusion in the core. These categories also
pretty much cover things that need to be inside the PostgreSQL
source tree to build.

What Shouldn't Be In Contrib?
-------------------------------
The things I think we should exclude from contrib are more
varied. Based on the examples currently in contrib:

a) Code with major external dependencies other than a
programming language. Partly this is because such code is
useful to fewer users; more importantly, this is because the
external dependencies mean that the release cycle for these
tools is likely to be determined by the external dependency and
not by PostgreSQL's release cycle. Further, the external
dependencies make it less likely that the PostgreSQL core
programmers can maintain them in the event that the original
developer goes away. The MySQL conversion scripts are a good
example of this; I don't believe that my2pg even works with
MySQL 4.

b) Alpha-quality code and unfinished projects. Shipping
something with the PostgreSQL source code implies a certain
level of stability, completeness and quality. We shouldn't be
including scripts which took 2 hours to write and have only
been tested on one platform. This stuff can get developed on
pgFoundry and moved to contrib when it's close to mature.

c) Differently licensed code. I'm not an attorney: I won't
pretend to know which licenses it's legal to bundle in our
tarballs and which are not. But I do know that most users and
redistributors aren't going to grep contrib looking for other
licenses, and putting differently licensed stuff in there is
bad PR at best, and a legal booby trap at worst.
(In particular, there are 3 contrib modules by Massimo Dal Zotto
which are GPL licensed. According to the FSF's licensing admin,
installing any of these contrib modules will instantly make that
copy of PostgreSQL GPL.)

d) Application code and example code. Contrib is *not* a good
place for "here's how you do this in an application" kind of
code. It's not visible enough to be documentation, and such
examples aren't generally useful to the majority of users as
code.

Moving to pgFoundry is NOT "Demotion"
----------------------------------------
I know that I'm going to get a lot of resistance to the idea
of moving some projects to pgFoundry, because authors feel that
it's a "demotion" for their code not to be shipped with the
PostgreSQL source. However, being on pgFoundry increases the
visibility of your code and allows a wider array of people to
contribute to it -- and even find it. And for items of
particularly broad utility, stuff can always go from pgFoundry
into the core when mature or when utility is demonstrated.

Contrib Subdirectories?
-------------------------------------
I think it would also be helpful if we created subdirectories
to organize contrib into categories. This would help users and
packagers find what they want. These directories would be:
data_types/
functions/
utilities/
I've noted below which contrib code I think should go in those
subdirs.

Contrib Build Options?
---------------------------
I'll point out that several people (including one of our
RPM builders) spoke up in favor of adding ./configure
command-line options for individual contrib items, but the
discussion was dropped without a decision being reached. That
would work like:

    ./configure --with-perl --prefix=/usr/pgsql \
        --with-tsearch2 --with-fuzzystrmatch

Documentation
--------------------------
As previously mentioned, all contrib modules need to have
documentation in the main PostgreSQL docs, probably in their own
section, called "Optional Modules".

Contrib Item Listing
--------------------------------
What follows are my notes on individual contrib projects. Many
contain questions because I don't know enough about the item.
Please read through them and provide what feedback you can.
Especially, provide feedback on the items I'm suggesting
eliminating or moving out. I've noted the author contact info
where I'm thinking of moving modules, and will be attempting to
contact those authors if we decide to change status.

adddepend: is this still needed, or would a proper
dump-and-reload from 7.2 add the dependency information anyway?

array: placeholder for old array module; contains only a
readme. Should probably be dropped for 8.2.

btree_gist: data_types/

chkpass: data_types/

cube: README needs documentation on what the module is *for*.

dbmirror: should be on pgFoundry/GBorg with other replication
systems. Author Stephen Singer (ssinger@navtechinc.com).

dbsize: functions/

earthdistance: data_types/

findoidjoins: again, it's not clear what this module is for.
Bruce?

fulltextindex: obsolesced by Tsearch2. Also a rather
brute-force technique for FTI, possibly more useful as an
illustration of advanced trigger use than as an index. Move to
pgFoundry or techdocs? Author Maarten Boekhold
(maartenb@dutepp0.et.tudelft.nl).

fuzzystrmatch: functions/

intagg: what does this module do which is not already available
through the built-in array functions and operators? Maybe I
don't understand what it does. Unattributed in the README. Move
to pgFoundry?

intarray: data_types/

ipc_check: nice idea, possibly useful, but works only on
FreeBSD. Needs to be vastly expanded to support multiple
platforms. Work on replacing it with the "Configurator" project
at pgFoundry. Author unattributed. Recommend removal.

isbn_issn: more data types. Has anyone tested this one lately?
It appears not to have been modified since 7.2. data_types/

lo: another special data type. Is its functionality required
anymore? It appears to be a workaround to some limitations of
our large object interface which may no longer exist. Author
Peter Mount (peter@retep.org.uk). data_types/

ltree: data_types/

msql_interface: does anyone use mSQL anymore? In any case,
conversion and foreign-database-connection tools definitely
belong on pgFoundry. Author Aldrin Leal
(aldrin@americasnet.com).

mac: a special-purpose script which I doubt works on all
platforms. Belongs on pgFoundry so that maybe someone will
take an interest in expanding it.

misc_utils: I believe that all of these utils are obsolesced by
built-in system commands or easily written userspace functions
(like max(x,y)). Also, it is under the GPL (see above). Author
Massimo Dal Zotto (dz@cs.unitn.it).

mysql: these utilities have been moved to project sites (such as
GBorg), and I believe that my2pg is broken with current versions
of MySQL. Can we remove this from contrib?

noupdate: this is a cool example of a simple C trigger and would
be lovely to have in a doc somewhere. However, its
functionality is easily replicated through a simple PL/pgSQL
trigger so it seems unnecessary as a contrib module. Author
unattributed.

oid2name: a useful backend utility which is used by a number of
external tools. What would it take to make this a built-in
binary? utilities/

oracle: again, very useful and I wish to move it to pgFoundry
and take over maintenance of it. Author Gilles Darold
(gilles@darold.net).

pg_autovacuum: moving into the backend.

pg_buffercache: another useful backend utility. Seems perfect
for contrib. utilities/

pg_dumplo: is this still required for pg large objects? If
so, can't we integrate it into the core? utilities/

pg_trgm: data_types/

pg_upgrade: what's the status of this, Bruce? Does it work at
all? Shouldn't this be moved to the pgFoundry project of the
same name until it's stable?

pgbench: I see repeated complaints on -performance about how
pgbench results are misleading. Why are we shipping it with
PostgreSQL then? Shouldn't this be on pgFoundry, maybe in the
testperf project? Shouldn't all performance tests be on
pgFoundry instead of in the code, unless they're part of
regression tests?

pgcrypto: more for functions/. And a good reason to keep the
main PostgreSQL ftp servers outside the US :-b

pgstattuple: utilities/

reindexdb: now obsolete per the REINDEX {database} command.
Remove from contrib.

rtree_gist: data_types/

seg: data_types/

spi: contains TimeTravel functions. Do these actually still
work? The spi stuff is good for documentation purposes anyway,
but if the functions aren't working, it should be in the docs
and not in /contrib.

start-scripts: utilities/. Needs to be expanded and
checked against more OSes.

string: data_types/. Same problem as Massimo's other modules:
it's GPL. Also, is it really needed at this point? Author
Massimo Dal Zotto (dz@cs.unitn.it).

tablefunc: functions/

tips: this is a proto-apache-log-slurping project, in *alpha*.
As such, it really needs to be on pgFoundry. Author Terry
Mackintosh (terry@terrym.com)

tools: two of these are Emacs scripts, and would be better
on pgFoundry if not on Savannah. The find-sources shell
script is again GPL and should probably be removed, and moreover
appears to have nothing to do with PostgreSQL.

tsearch: obsolesced by tsearch2. Should be moved to pgFoundry
where it can be maintained by users needing backwards
compatibility.

userlocks: another GPL script, with the problems that entails.
Also problematic as it relies heavily on per-record OIDs,
something we tell users not to do. Overall, should be removed.
Author: Massimo.

vacuumlo: is this still required? If so, utilities/.

xml and xml2: both by John Gray (jgray@azuli.co.uk). John, why
do we have two of these? Otherwise, data_types/.
--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco