Re: Extending PostgreSQL - Mailing list pgsql-general
From: Jean-Michel POURE
Subject: Re: Extending PostgreSQL
Msg-id: 200202190726.g1J7QcFP012404@www1.translationforge
In response to: Extending PostgreSQL ("Amir Behgooy" <behgooy@cs.uiuc.edu>)
Responses: Re: Extending PostgreSQL
List: pgsql-general
On Monday 11 February 2002 09:30, Amir Behgooy wrote:
> We are trying to design a 2 to 3 week long project for a graduate course in
> Database systems at our university. We are supposed to use PostgreSQL and
> add some functionality to it or make some appropriate modifications. Do you
> have any suggestions on what can be a useful, yet easy to finish in 2-3
> weeks project?

Dear Amir,

In my humble opinion, there is a strong need for column dropping. This could be done either with ALTER TABLE DROP COLUMN or with an enhanced CREATE TABLE AS.

1) Drop column hack

A _DROP_COLUMN_HACK__ compilation option exists but is no longer maintained. Many users are asking for this feature in 7.3. Maybe you can review the existing code and see whether it is a workable solution or not. A complete new implementation would take more than 3 weeks, but maybe you can team up with some hacker to start the job.

2) CREATE TABLE AS

Alternatively, you could extend the existing "CREATE TABLE AS" to copy the primary key and linked objects (triggers, rules, indexes, etc.). This would be a first step towards implementing a pseudo-drop feature.

3) Hackers list

If you wish to go for this feature, please contact the hackers list. There was a discussion on hackers recently that explains the _DROP_COLUMN_HACK__ situation well. Please find a copy hereafter.

Best regards,
Jean-Michel POURE

*************************************************************

[HACKERS] alter table drop column status
From: Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu>
To: pgsql-hackers@postgresql.org
Date: Tue, 12 Feb 2002 10:33:58 +0100 (CET)

Browsing the archives, I found the latest comment about dropping columns, around summer 2000, closing with Hiroshi's unapplied (?) hack. What is the current status of the implementation?
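To make suggestion 2 concrete, here is a toy sketch of what an enhanced CREATE TABLE AS would have to carry over besides the column data. This is plain Python over an invented `catalog` dictionary, nothing to do with the real system catalogs; table and object names are made up for illustration.

```python
# Toy model: a "catalog" records, per table, its columns and the
# linked objects that a plain CREATE TABLE AS ... SELECT loses today.
catalog = {
    "customers": {
        "columns": ["id", "name", "fax"],
        "primary_key": ["id"],
        "indexes": ["customers_name_idx"],
        "triggers": ["customers_audit_trg"],
    }
}

def create_table_as(catalog, src, dst, drop=()):
    """Pseudo-drop: copy src's definition to dst minus the `drop`
    columns, carrying over the primary key, indexes and triggers ---
    the part the existing CREATE TABLE AS does not do."""
    old = catalog[src]
    catalog[dst] = {
        "columns": [c for c in old["columns"] if c not in drop],
        "primary_key": [c for c in old["primary_key"] if c not in drop],
        "indexes": list(old["indexes"]),
        "triggers": list(old["triggers"]),
    }

create_table_as(catalog, "customers", "customers_new", drop={"fax"})
print(catalog["customers_new"]["columns"])      # ['id', 'name']
print(catalog["customers_new"]["primary_key"])  # ['id']
```

The real work, of course, is doing this inside the backend against pg_class, pg_attribute, pg_index and pg_trigger rather than a dictionary.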
Regards, Zoltan

*************************************************************

Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> Browsing the archives, I found the latest comment about dropping columns
> about summer 2000 closing with Hiroshi's unapplied (?) hack. What is the
> current status of the implementation?

It was applied, and it's in there with #ifdef _DROP_COLUMN_HACK__, but I believe Hiroshi has given up on that approach as unworkable. The #ifdef'd code is still there (in most places anyway) because no one has bothered to rip it out. But I doubt it would work very well if enabled --- the code mods in the last year or so have not taken any notice of _DROP_COLUMN_HACK__.

regards, tom lane

*************************************************************

No, there was an unapplied hack which uses logical/physical attribute numbers. I kept it synchronized with CVS for a year or so but have stopped now. Though it had some flaws, it solved the following TODOs:

* Add ALTER TABLE DROP COLUMN feature
* ALTER TABLE ADD COLUMN to inherited table put column in wrong place
* Prevent column dropping if column is used by foreign key

I gave up on applying the hack mainly because it may introduce a maintenance headache.

> and it's in there with #ifdef _DROP_COLUMN_HACK__,
> but I believe Hiroshi has given up on that approach as unworkable.
>
> The #ifdef'd code is still there (in most places anyway) because no
> one has bothered to rip it out. But I doubt it would work very well
> if enabled --- the code mods in the last year or so have not taken
> any notice of _DROP_COLUMN_HACK__.

The code hasn't worked for a long time now. I would remove it after the 7.3 tree is branched.

regards, Hiroshi Inoue

*************************************************************

This seems fantastic --- why can't this be committed? Surely if it's committed then the flaws will fairly quickly be ironed out?
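The logical/physical split Hiroshi mentions can be sketched as follows. This is a hypothetical toy model, not the actual _DROP_COLUMN_HACK__ code: physical slots match the on-disk tuple layout and are never removed, while clients see only the logical columns, renumbered after a drop.

```python
# Toy sketch of logical vs. physical attribute numbers.  Physical
# attnums are fixed for the life of the relation so existing stored
# tuples stay valid; a dropped column simply disappears from the
# logical numbering presented to clients.

class Relation:
    def __init__(self, columns):
        self.physical = list(columns)  # physical attnums are 1..n, forever
        self.dropped = set()           # physical attnums of dropped columns

    def drop_column(self, name):
        self.dropped.add(self.physical.index(name) + 1)

    def logical_columns(self):
        # logical attnums renumber the surviving columns in order
        return [c for i, c in enumerate(self.physical, start=1)
                if i not in self.dropped]

    def logical_to_physical(self, logical_attnum):
        # the mapping every code path must remember to apply --- the
        # source of the "which number do I use here?" confusion below
        live = [i for i in range(1, len(self.physical) + 1)
                if i not in self.dropped]
        return live[logical_attnum - 1]

rel = Relation(["id", "name", "age"])
rel.drop_column("name")
print(rel.logical_columns())       # ['id', 'age']
print(rel.logical_to_physical(2))  # 3: logical column 2 is physical slot 3
```

Note how, before any column is dropped, logical and physical numbers coincide --- which is exactly why, as Hiroshi explains next, picking the wrong one goes unnoticed in testing.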
Even if it has flaws, then if we say 'this function is not yet stable', at least people can start testing it and reporting the problems?

> I gave up on applying the hack mainly because it may introduce
> a maintenance headache.

Is it a maintenance headache just for you to keep it up to date, or how would it be a maintenance headache if it were committed?

Chris

*************************************************************

Probably (oops, I don't remember well now, sorry) the main reason why I didn't insist on applying the patch was that it wasn't as clean as I had expected. My trial implementation uses logical (for clients) and physical (for backend-internal) attribute numbers, but there were many places where I wasn't able to judge immediately which one to use. I'm pretty suspicious whether a developer could be careful about the choice while implementing an unrelated feature. (Un)fortunately the numbers mostly have the same values, so he could hardly notice the mistake even if he chose the wrong attribute numbers. I'm not sure if I myself chose the right attribute numbers everywhere in my implementation.

In addition, there were (probably) some pretty essential flaws. I intended to manage backend-internal object references without the logical attribute numbers, but I found it difficult in some cases (probably the handling of virtual tuples, i.e. tuples not existing in any real table). Sorry, it was more than a year ago when I implemented it and I can't remember well what I thought then. Though I kept my local branch up to date for about a year, it's been about half a year since I last touched the stuff.

regards, Hiroshi Inoue

*************************************************************

I think this was the thing that really scared everyone about the trial implementation: the near-certainty of bugs that might remain unnoticed for a long time. At the last OSDB conference I had an interesting discussion with Ann Harrison about how Interbase (Firebird) deals with this problem.
Essentially, they mark every tuple with an identifier for the schema that it follows. Translated to Postgres terms, it'd work like this:

1. Composite types (row types) could exist independently of tables; this is something we've wanted for a while anyway. A composite type is identified by its OID in pg_type. pg_attribute rows would have to be considered to belong to pg_type entries, not pg_class entries.

2. A relation in pg_class has a pointer to its current preferred schema (row type). This link exists already (reltype), but it would no longer be necessarily fixed for the life of the relation. To implement ADD, DROP or ALTER COLUMN, you'd construct a new row type and update pg_class.reltype to point to it. And that's all you'd do --- you'd not touch the stored data.

3. Tuples being inserted/updated would always be coerced to the current preferred schema of the relation. However, old tuples would remain with their original schema, perhaps indefinitely. (Or we could offer a special command to forcibly update all tuples to the current schema.)

4. Internally, we'd probably need to create a "row type cache" separate from the existing relcache, so that the attribute structure shown by a given tuple header could be looked up quickly, whether or not it is the current preferred schema of the relation.

5. It'd no longer be possible to identify a particular column solely by column number, since the column number might vary between schemas. Nor would identification by name be reliable (think RENAME COLUMN). I think what we'd have to do is go back to giving OIDs to individual pg_attribute entries ... they wouldn't be true OIDs in the current sense, because they'd not be unique across all pg_attribute entries, but we could generate them using the OID counter. Perhaps call them serial numbers, not OIDs.
When constructing a new schema, the serial number would be carried over from each column that is logically the same column as some pre-existing column --- but the physical column numbers might be quite different. Then, initial construction of a query plan would resolve a column name to a column serial number using the current schema of the relation, and at runtime the serial number would have to be looked up in the actual schema of each tuple. If it's not found, use the default value of the column as shown in the current schema (this supports ADD COLUMN). If it's found but does not have the same datatype that the current schema expects for the Var, perform a runtime type coercion (this supports ALTERing a column datatype).

The main thing this supports that Hiroshi's trial implementation didn't is altering a column's datatype. It'd also considerably simplify processing of inheritance-tree table scans: rather than the current kluge that translates parent to child column numbers, you'd just make sure that a child table is created with column serial numbers matching the parent for its inherited columns. Then the above-described mechanism takes care of finding the child columns for you: essentially, a child-table tuple can be treated just like a tuple that's not of the current schema in the parent table. (I'm not sure if the trial implementation could do that too.)

The weakest feature of the whole scheme is the per-tuple runtime lookups implied by points 4 and 5. We could probably avoid any noticeable slowdown in normal cases by caching the results in Var nodes of execution plans, but in cases where a relation has a wild mix of tuples of different vintages, a single-entry cache wouldn't help much. Another objection is the need to add an OID field to tuple headers; 4 more bytes per tuple adds up (and on some platforms it'd be 8 bytes due to alignment considerations).
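The per-tuple lookup described above can be sketched in a few lines. This is a toy model, not backend code: schema versions, serial numbers, Python types standing in for datatypes, and the defaults are all invented for illustration.

```python
# Each schema version maps a column's serial number to
# (position_in_stored_tuple, type, default_value).
schemas = {
    1: {10: (0, int, None), 11: (1, int, None)},  # original: id, qty
    2: {10: (0, int, None),                       # current schema:
        11: (1, float, None),                     #   qty ALTERed to float
        12: (2, str, "n/a")},                     #   note ADDed with default
}
current = 2  # what pg_class.reltype would point to

def get_column(stored, written_under, serial):
    """Fetch a column by serial number from a tuple written under an
    older schema, applying the default and coercion rules above."""
    _, want_type, default = schemas[current][serial]
    entry = schemas[written_under].get(serial)
    if entry is None:
        return default            # column added since tuple was written
    pos, have_type, _ = entry
    value = stored[pos]
    if have_type is not want_type:
        return want_type(value)   # runtime type coercion
    return value

old_tuple = (5, 3)                   # written under schema 1, never rewritten
print(get_column(old_tuple, 1, 12))  # 'n/a'  (ADD COLUMN default)
print(get_column(old_tuple, 1, 11))  # 3.0    (int coerced to float)
print(get_column(old_tuple, 1, 10))  # 5      (unchanged)
```

The dictionary lookups here are exactly the per-tuple cost Tom worries about next; caching the resolution per schema version is what the proposed "row type cache" and Var-node caching would buy.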
Another problem is that the distinction between column positions and column serial numbers has the same kind of potential for confusion as between logical and physical numbers in the trial implementation. It wouldn't be as bad, because the values would be different in most cases. This'd be a sufficiently big change that I'm not at all sure we'd want to do it that way. But I thought I'd sketch out the idea and see if anyone likes it.

Tom Lane

**************************************************************************************

Well, dropping a column doesn't seem to be a relevant feature. But unfortunately our production system requires updates/upgrades "on the fly", without stopping and dumping out/in the whole database. Currently it's only about 16 megs of data, but it's growing... I would be satisfied with a working method for dropping and recreating only one table with a short shutdown (~ a few minutes). The problem for me is that the foreign key constraints of all referencing tables must be recreated, and I want to do this automagically. It would be enough for me if I could write a script which does this reasonably fast. I wanted to know if I should wait for the solution of the full ALTER TABLE implementation or not. I'm afraid I shouldn't wait, should I? ;-)

-- Kovács, Zoltán

**************************************************************************************
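The kind of script Zoltan describes could be sketched as a SQL generator. Everything here is illustrative: the table, column and constraint names are invented, and the ALTER TABLE DROP/ADD CONSTRAINT syntax is the modern form --- on the PostgreSQL of that era, foreign keys were constraint triggers and would have to be dropped and recreated differently.

```python
# Hypothetical sketch: emit the SQL to rebuild a table without one
# column and re-attach the foreign keys of referencing tables.
# referencing_fks is a list of (referencing_table, fk_name, fk_definition).

def rebuild_without_column(table, keep_cols, drop_col, referencing_fks):
    cols = ", ".join(c for c in keep_cols if c != drop_col)
    stmts = ["BEGIN;"]
    # referencing FKs must be detached before the table can be dropped
    for ref_table, fk_name, _ in referencing_fks:
        stmts.append(f"ALTER TABLE {ref_table} DROP CONSTRAINT {fk_name};")
    stmts += [
        f"CREATE TABLE {table}_new AS SELECT {cols} FROM {table};",
        f"DROP TABLE {table};",
        f"ALTER TABLE {table}_new RENAME TO {table};",
        # the primary key, indexes, defaults and triggers would need
        # recreating here too --- exactly the gap in CREATE TABLE AS
        # noted at the top of this thread
    ]
    for ref_table, fk_name, fk_def in referencing_fks:
        stmts.append(f"ALTER TABLE {ref_table} ADD CONSTRAINT {fk_name} {fk_def};")
    stmts.append("COMMIT;")
    return "\n".join(stmts)

print(rebuild_without_column(
    "customers", ["id", "name", "fax"], "fax",
    [("orders", "orders_customer_fk",
      "FOREIGN KEY (customer_id) REFERENCES customers (id)")]))
```

A real version would read the referencing constraints out of the system catalogs instead of taking them as an argument, which is the "automagic" part of the request.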