May I have som syntactic sugar, please ? - Mailing list pgsql-hackers

From Emmanuel Charpentier
Subject May I have som syntactic sugar, please ?
Date
Msg-id b2973c$1jue$1@news.hub.org
Whole thread Raw
List pgsql-hackers
The recent addition of drop column makes possible to "change" column type. 
E. g. : I have written a small utility reading Medline queries results and 
dumping them in a Postgres database for further analysis.

[ Two remarks before starting ; 1) I *know* that I'm reinventing the wheel 
and that you can get those results in a nice XML format ; but the problem 
is that I do not (yet) know XLM, and have no time writing and using an XML 
parser, while I know how to parse a simple line-oriented format ; and 2) 
no, the possible fields are not a known fixed set : Medline has alreadu 
changer it's formats thrice in the las two years, so the reading routines 
have to be dynamic. Alas ... ]

These results are a "field-by-field" enumeration. i.e eachrecord is a set 
of <tag> - <Value> pairs (well, not exactly : some fields have more than 
one line, but I'm scketchy ...). Some fields may be missing, some other may 
be repeated.

The obvious solution is to create a temporary table with a set of keys 
identifying record, tag and file source and a "value" column, which has 
better to be of "text" type.

It is almost trivial to create (programactically) the structure of the main 
table (a record identifier, one or two columns identifying the source, an a 
column for each field present at most once in the DB) and the auxilliary 
tables containing repeated fields.

But this table has initially all values "text", which is a) Horribly 
inefficient, b) hard to use from outer programs (M$ tools will choke on an 
indexed text field) and c) quite inadapted to certain queries (e. g. 
comparisons of dates.

Furthermore, this kind of modifications cannot always be made t read-time. 
It happens that only the *semantics* of the problem at hand allows to 
choose the "right* column type. In my example, the date of last Medline 
indexing is formally a timestamp ; but for all practical purposes, a date 
is more than sufficient.

"drop columns" allows to fix this dynmically, along the line of "alter 
table x add column new_y newtype; update x set new_y=y::newtype, alter 
table x drop column y;  alter table x rename column new_y to y;". This has 
two side effects :

The first one is unimportant (at least from a relational point of view) : 
the order of columns in the table is lost.

The other one is *most* important : any Postgres object using the column 
definition (I'm thinking of indexes, views and rules but there might be 
others) will have to be redefined.

What I'm thinking of would be something along the line of "alter table x 
alter column y retype to newtype", doing essentially the same as above, but 
*keeping the old column id* and *propagating* changes as needed in defined 
view and indices (maybe with a "cascade" option, deleting unupdtable 
objects as needed ?).

Am I dreaming awake ? Or is this implementable ? An sufficiently "generally 
useful" ?

Comments ?
                Emmanuel Charpentier
--
Emmanuel Charpentier



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_dump is broken by recent privileges changes
Next
From: Kevin Brown
Date:
Subject: Re: Irix 6.2, Postgres 7.3.1, some brokenness