Thread: alter table drop column status

alter table drop column status

From

Kovacs Zoltan

Date:

12 February 2002, 04:52:48

Browsing the archives, I found the latest comment about dropping columns
about summer 2000 closing with Hiroshi's unapplied (?) hack. What is the
current status of the implementation?

Regards, Zoltan

--                         Kov\'acs, Zolt\'an                        kovacsz@pc10.radnoti-szeged.sulinet.hu
          http://www.math.u-szeged.hu/~kovzol                        ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz

Re: alter table drop column status

From

Tom Lane

Date:

12 February 2002, 10:24:16

Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> Browsing the archives, I found the latest comment about dropping columns
> about summer 2000 closing with Hiroshi's unapplied (?) hack. What is the
> current status of the implementation?

It was applied, and it's in there with #ifdef _DROP_COLUMN_HACK__,
but I believe Hiroshi has given up on that approach as unworkable.

The #ifdef'd code is still there (in most places anyway) because no
one has bothered to rip it out.  But I doubt it would work very well
if enabled --- the code mods in the last year or so have not taken
any notice of _DROP_COLUMN_HACK__.
        regards, tom lane

Re: alter table drop column status

From

"Hiroshi Inoue"

Date:

12 February 2002, 12:11:58

> -----Original Message-----
> From: Tom Lane
>
> Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> > Browsing the archives, I found the latest comment about dropping columns
> > about summer 2000 closing with Hiroshi's unapplied (?) hack. What is the
> > current status of the implementation?
>
> It was applied,

No there was an unapplied hack which uses logical/physical
attribute numbers. I have synchronized it with cvs for a
year or so but stop it now. Though it had some flaws It
solved the following TODOs.

* Add ALTER TABLE DROP COLUMN feature
* ALTER TABLE ADD COLUMN to inherited table put column in wrong place
* Prevent column dropping if column is used by foreign key

I gave up to apply the hack mainly because it may introduce
the maintenance headache.

> and it's in there with #ifdef _DROP_COLUMN_HACK__,
> but I believe Hiroshi has given up on that approach as unworkable.
>
> The #ifdef'd code is still there (in most places anyway) because no
> one has bothered to rip it out.  But I doubt it would work very well
> if enabled --- the code mods in the last year or so have not taken
> any notice of _DROP_COLUMN_HACK__.

The code doesn't work since long. I would remove it after 7.3 tree
is branched.

regards,
Hiroshi Inoue

Re: alter table drop column status

From

"Christopher Kings-Lynne"

Date:

13 February 2002, 00:21:22

> No there was an unapplied hack which uses logical/physical
> attribute numbers. I have synchronized it with cvs for a
> year or so but stop it now. Though it had some flaws It
> solved the following TODOs.
>
> * Add ALTER TABLE DROP COLUMN feature
> * ALTER TABLE ADD COLUMN to inherited table put column in wrong place
> * Prevent column dropping if column is used by foreign key

This seems fantastic - why can't this be committed?  Surely if it's
committed then the flaws will fairly quickly be ironed out?  Even if it has
flaws, then if we say 'this function is not yet stable' at least people can
start testing it and reporting the problems?

> I gave up to apply the hack mainly because it may introduce
> the maintenance headache.

Is it a maintenance headache just for you to keep it up to date, or how
would it be a maintenance headache if it were committed?

Chris

Re: alter table drop column status

From

Jean-Michel POURE

Date:

13 February 2002, 03:21:37

Le Mercredi 13 Février 2002 06:14, Christopher Kings-Lynne a écrit :
> This seems fantastic - why can't this be committed?  Surely if it's
> committed then the flaws will fairly quickly be ironed out?  Even if it has
> flaws, then if we say 'this function is not yet stable' at least people can
> start testing it and reporting the problems?

+1. What are the reasons why this hack was not applied?

Re: alter table drop column status

From

Dave Page

Date:

13 February 2002, 03:42:26

> -----Original Message-----
> From: Jean-Michel POURE [mailto:jm.poure@freesurf.fr] 
> Sent: 13 February 2002 08:10
> To: Christopher Kings-Lynne; Hiroshi Inoue; Tom Lane; Kovacs Zoltan
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] alter table drop column status
> 
> 
> Le Mercredi 13 Février 2002 06:14, Christopher Kings-Lynne a écrit :
> > This seems fantastic - why can't this be committed?  Surely if it's 
> > committed then the flaws will fairly quickly be ironed out? 
>  Even if 
> > it has flaws, then if we say 'this function is not yet stable' at 
> > least people can start testing it and reporting the problems?
> 
> +1. What are the reasons why this hack was not applied?

See /doc/TODO.detail/drop in the source tree. That pretty much explains it.

Regards, Dave.

Re: alter table drop column status

From

Hiroshi Inoue

Date:

13 February 2002, 03:51:34

Christopher Kings-Lynne wrote:
> 
> > No there was an unapplied hack which uses logical/physical
> > attribute numbers. I have synchronized it with cvs for a
> > year or so but stop it now. Though it had some flaws It
> > solved the following TODOs.
> >
> > * Add ALTER TABLE DROP COLUMN feature
> > * ALTER TABLE ADD COLUMN to inherited table put column in wrong place
> > * Prevent column dropping if column is used by foreign key
> 
> This seems fantastic - why can't this be committed?  Surely if it's
> committed then the flaws will fairly quickly be ironed out?  Even if it has
> flaws, then if we say 'this function is not yet stable' at least people can
> start testing it and reporting the problems?
> 
> > I gave up to apply the hack mainly because it may introduce
> > the maintenance headache.
> 
> Is it a maintenance headache just for you to keep it up to date, or how
> would it be a maintenance headache if it were committed?

Probably(oops I don't remember well now sorry) the main
reason why I didn't insist to apply the patch was that
it wasn't so clean as I had expected.
My trial implementation uses logical(for clients) and
physical (for backend internal) attribute numbers but
there were many places where I wasn't able to judge which
to use immediately. I'm pretty suspicious if a developer
could be careful about the choise when he is implementing
an irrevant feature. (Un)fortunately the numbers have
the same values mostly and he could hardly notice the
mistake even if he chose the wrong attribute numbers.
I'm not sure if I myself chose the right attribute numbers 
everywhere in my implementation.
In addtion (probably) there were some pretty essential
flaws. I intended to manage the backend internal
object references without the logical attribute
numbers but I found it difficult in some cases
(probably the handling of virtual(not existent 
in any real table) tuples).

Sorry it was more than 1 year ago when I implemented
it and I can't remember well what I'd thougth then.
Though I'd kept my local branch up to date for
about a year, it's about half a year since I touched
the stuff last. 

regards,
Hiroshi Inoue

Re: alter table drop column status

From

Tom Lane

Date:

13 February 2002, 11:13:43

Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> My trial implementation uses logical(for clients) and
> physical (for backend internal) attribute numbers but
> there were many places where I wasn't able to judge which
> to use immediately. I'm pretty suspicious if a developer
> could be careful about the choise when he is implementing
> an irrevant feature. (Un)fortunately the numbers have
> the same values mostly and he could hardly notice the
> mistake even if he chose the wrong attribute numbers.

I think this was the thing that really scared everyone about the trial
implementation: the near-certainty of bugs that might remain unnoticed
for a long time.

At the last OSDB conference I had an interesting discussion with
Ann Harrison about how Interbase (Firebird) deals with this problem.
Essentially, they mark every tuple with an identifier for the schema
that it follows.  Translated to Postgres terms, it'd work like this:

1. Composite types (row types) could exist independently of tables;
this is something we've wanted for awhile anyway.  A composite type
is identified by its OID in pg_type.  pg_attribute rows would have
to be considered to belong to pg_type entries not pg_class entries.

2. A relation in pg_class has a pointer to its current preferred schema
(row type).  This link exists already (reltype), but it would no longer
be necessarily fixed for the life of the relation.  To implement ADD,
DROP or ALTER COLUMN, you'd construct a new row type and update
pg_class.reltype to point to it.  And that's all you'd do --- you'd not
touch the stored data.

3. Tuples being inserted/updated would always be coerced to the current
preferred schema of the relation.  However, old tuples would remain
with their original schema, perhaps indefinitely.  (Or we could offer
a special command to forcibly update all tuples to current schema.)

4. Internally, we'd probably need to create a "row type cache" separate
from the existing relcache, so that the attribute structure shown by a
given tuple header could be looked up quickly, whether or not it is the
current preferred schema of the relation.

5. It'd no longer be possible to identify a particular column solely
by column number, since the column number might vary between schemas.
Nor would identification by name be reliable (think RENAME COLUMN).
I think what we'd have to do is go back to giving OIDs to individual
pg_attribute entries ... they wouldn't be true OIDs in the current sense
because not unique across all pg_attribute entries, but we could
generate them using the OID counter.  Perhaps call them serial numbers
not OIDs.  When constructing a new schema, the serial number would be
carried over from each column that is logically the same column as some
pre-existing column --- but the physical column numbers might be quite
different.  Then, initial construction of a query plan would resolve
column name to column serial number using the current schema of the
relation, and at runtime the serial number would have to be looked up
in the actual schema of each tuple.  If it's not found, use the default
value of the column as shown in the current schema (this supports ADD
COLUMN).  If it's found but does not have the same datatype as the Var
shows that the current schema expects, perform a runtime type coercion
(this supports ALTERing a column datatype).

The main thing that this supports that Hiroshi's trial implementation
didn't is altering column datatype.

It'd also considerably simplify processing of inheritance-tree table
scans: rather than the current kluge that translates parent to child
column numbers, you'd just make sure that a child table is created with
column serial numbers matching the parent for its inherited columns.
Then the above-described mechanism takes care of finding the child
columns for you: essentially, a child-table tuple can be treated just
like a tuple that's not of the current schema in the parent table.
(I'm not sure if the trial implementation could do that too.)

The weakest feature of the whole scheme is the per-tuple runtime lookups
implied by points 4 and 5.  We could probably avoid any noticeable
slowdown in normal cases by caching the results in Var nodes of
execution plans, but in cases where a relation has a wild mix of tuples
of different vintages a single-entry cache wouldn't help much.

Another objection is the need to add an OID field to tuple headers; 4
more bytes per tuple adds up (and on some platforms it'd be 8 bytes due
to alignment considerations).

Another problem is that the distinction between column positions and
column serial numbers has the same kind of potential for confusion as
between logical and physical numbers in the trial implementation.  It
wouldn't be as bad, because the values would be different in most cases.

This'd be a sufficiently big change that I'm not at all sure we'd want
to do it that way.  But I thought I'd sketch out the idea and see if
anyone likes it.
        regards, tom lane

Re: alter table drop column status

From

Kovacs Zoltan

Date:

13 February 2002, 12:29:13

> to use immediately. I'm pretty suspicious if a developer
> could be careful about the choise when he is implementing
> an irrevant feature. (Un)fortunately the numbers have

Well, dropping a column doesn't seem to be a relevant feature. But
unfortunately our production system requires updates/upgrades "on the
fly", without stopping and dumping out/in the whole database. Currently
it's only about 16 megs of data but it's growing... I would be satisfied
with a working method for dropping and recreating only one table with a
short shutdown (~ a few minutes). The problem for me is that the foreign
key constraints of all referencing tables must be recreated and I want to
do this automagically. It would be enough for me if I could write a
script which does this reasonably fast.

I wanted to know if I should wait for the solution of the full ALTER TABLE
implementation or not. I'm afraid I shouldn't wait, should I? ;-)

--                         Kov\'acs, Zolt\'an                        kovacsz@pc10.radnoti-szeged.sulinet.hu
          http://www.math.u-szeged.hu/~kovzol                        ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz

Re: alter table drop column status

From

Jean-Michel POURE

Date:

14 February 2002, 04:22:15

Le Mercredi 13 Février 2002 18:23, Kovacs Zoltan a écrit :
> I wanted to know if I should wait for the solution of the full ALTER TABLE
> implementation or not. I'm afraid I shouldn't wait, should I? ;-)

What we could do using pgAdmin2 is :
("table_from" is the table to be modified, "table_to" is the resulting table)

1) Mark objects for deletion
* mark columns in "table_from" for deletion,
* mark primary keys in "table_from" for deletion,
* mark foreign keys in "table_from" for deletion,

2) Copy schema and data
* copy "table_to" structure out of "table_from" keeing only marked objects,
* copy data from "table_from" to "table_to",

3) Add rules and triggers, rename
* add "table_from" triggers to "table_to",
* add "table_from" rules to "table_to",
* drop table "table_from",
* rename "table_to".

The same script should also work for inherited tables.

This could be a hack until equivalent features are added natively to 
PostgreSQL. Do you think it is relevant to add this feature to pgAdmin2? Does 
Hiroshi script provide the same kind of features?

What is your opinion my dear friends? We wait for your advice.

Cheers,
jean-Michel POURE

Re: alter table drop column status

From

"Christopher Kings-Lynne"

Date:

14 February 2002, 20:22:18

That's the only way to do it at the moment - would you like to collaborate
on the actual sql script to get this done?  I wonder if it could be done
entirely with a stored procedure?  That'd be cool:

select drop_column(mytable, mycolumn);

Sweet :)

I'd like to implement this for phpPgAdmin as well.

Chris

> -----Original Message-----
> From: Jean-Michel POURE [mailto:jm.poure@freesurf.fr]
> Sent: Thursday, 14 February 2002 5:17 PM
> To: Kovacs Zoltan; Hiroshi Inoue; dpage@pgadmin.org
> Cc: Christopher Kings-Lynne; Tom Lane; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] alter table drop column status
>
>
> Le Mercredi 13 Février 2002 18:23, Kovacs Zoltan a écrit :
> > I wanted to know if I should wait for the solution of the full
> ALTER TABLE
> > implementation or not. I'm afraid I shouldn't wait, should I? ;-)
>
> What we could do using pgAdmin2 is :
> ("table_from" is the table to be modified, "table_to" is the
> resulting table)
>
> 1) Mark objects for deletion
> * mark columns in "table_from" for deletion,
> * mark primary keys in "table_from" for deletion,
> * mark foreign keys in "table_from" for deletion,
>
> 2) Copy schema and data
> * copy "table_to" structure out of "table_from" keeing only
> marked objects,
> * copy data from "table_from" to "table_to",
>
> 3) Add rules and triggers, rename
> * add "table_from" triggers to "table_to",
> * add "table_from" rules to "table_to",
> * drop table "table_from",
> * rename "table_to".
>
> The same script should also work for inherited tables.
>
> This could be a hack until equivalent features are added natively to
> PostgreSQL. Do you think it is relevant to add this feature to
> pgAdmin2? Does
> Hiroshi script provide the same kind of features?
>
> What is your opinion my dear friends? We wait for your advice.
>
> Cheers,
> jean-Michel POURE
>

Re: alter table drop column status

From

Kovacs Zoltan

Date:

15 February 2002, 01:52:18

> select drop_column(mytable, mycolumn);

IMHO first at least a LOCK should be executed on all tables which are in
any reference with "mytable". If LOCK is not enough, the entire database
should be locked (in pg_hba.conf) for all users except for the maintainer.

> > 1) Mark objects for deletion
> > * mark columns in "table_from" for deletion,
> > * mark primary keys in "table_from" for deletion,
> > * mark foreign keys in "table_from" for deletion,
* check all other tables if they have any references to the columns of "table_from" marked to be deleted; if check
fails,STOP
 
* lock all tables which appear in FOREIGN KEYS of "table_from" and all tables which have FOREIGN KEYS references to
"table_from"

> > 2) Copy schema and data
> > * copy "table_to" structure out of "table_from" keeing only
> > marked objects,
> > * copy data from "table_from" to "table_to",
> >
> > 3) Add rules and triggers, rename
> > * add "table_from" triggers to "table_to",
> > * add "table_from" rules to "table_to",
> > * drop table "table_from",
* (postgres will automatically drop referential integrity triggers from all tables referencing the the dropped table
"table_from")
> > * rename "table_to".
* recreate referential integrity triggers in all tables described above
* unlock all locked tables

I'm afraid LOCK is not available inside a PLPGSQL function (I write almost
everything in PLPGSQL). However, a shell script should do this easily, but
it's no so smart to call a shell script from a PLPGSQL function (although
I do this some time), if Cristopher would like to use it with a single
SELECT.

Regards, Zoltan
                        Kov\'acs, Zolt\'an                        kovacsz@pc10.radnoti-szeged.sulinet.hu
       http://www.math.u-szeged.hu/~kovzol                        ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz

Re: alter table drop column status

From

"Christopher Kings-Lynne"

Date:

15 February 2002, 02:52:16

> IMHO first at least a LOCK should be executed on all tables which are in
> any reference with "mytable". If LOCK is not enough, the entire database
> should be locked (in pg_hba.conf) for all users except for the maintainer.

Yep.

> I'm afraid LOCK is not available inside a PLPGSQL function (I write almost
> everything in PLPGSQL). However, a shell script should do this easily, but
> it's no so smart to call a shell script from a PLPGSQL function (although
> I do this some time), if Cristopher would like to use it with a single
> SELECT.

Hmmm - can LOCKs in PLPGSQL be added in 7.3, or are there reasons it's
difficult?

I'd love to publish a contrib of 'Chris's DDL functions' like:

alter_column_null(table, column, state)
drop_column(table, column)
drop_foreign_key(table, keyname)

etc.

So that people can use these in lieu of them being available natively in
postgres.

I guess they could be written in C - but then you may as well implement them
properly!

Chris

Re: alter table drop column status

From

Tom Lane

Date:

15 February 2002, 10:22:47

Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> I'm afraid LOCK is not available inside a PLPGSQL function

Works fine for me ...
        regards, tom lane

Re: alter table drop column status

From

"Zeugswetter Andreas SB SD"

Date:

15 February 2002, 12:22:13

> Another objection is the need to add an OID field to tuple headers; 4
> more bytes per tuple adds up (and on some platforms it'd be 8 bytes due
> to alignment considerations).

How about only allowing one version per page, this is how Informix does it.
Imho separating in memory tuple representation from on disk tuple representation
would be a good thing anyway. While you need to align certain things in memory
there is no need to align on disk stuff. This would potentially save a lot of
diskspace. I know a lot of people say disk space is cheap, but the issue is that
IO is slow. It would also open the door to features like compressing datapages 
like RDB does. We have calculated here that porting six ~750 Gb databases from 
rdb to some other db would need ~4 times the disk space.

Andreas

Re: alter table drop column status

From

Kovacs Zoltan

Date:

19 February 2002, 13:16:10

> > I'm afraid LOCK is not available inside a PLPGSQL function
> 
> Works fine for me ...

Hmm, it works for me, too. OK, I see no more rocks ahead writing a PLPGSQL
function which drops a column. It'll be slow, but it'll work. However, a C
function would be better. Unfortunately I have no experience in writing
libpq or ecpg functions. Is it a problem for you if I contribute a PLPGSQL
code?

Regards, Zoltan

Re: alter table drop column status

From

"Christopher Kings-Lynne"

Date:

19 February 2002, 21:54:18

Hi Zoltan,

I'd love to see a pl/pgsql funciton to drop a column submitted to the list.
I'll submit a set null / set not null one and maybe we can make up a little
package of functions for techdocs.postgres.org.

In fact, getting these pl/pgsql functions right will make it easier to write
C versions, which might make it easier to integrate the functionality
directly into postgres...

Chris

> -----Original Message-----
> From: Kovacs Zoltan [mailto:kovacsz@pc10.radnoti-szeged.sulinet.hu]
> Sent: Wednesday, 20 February 2002 2:07 AM
> To: Tom Lane
> Cc: Christopher Kings-Lynne; jm.poure@freesurf.fr; Hiroshi Inoue;
> dpage@pgadmin.org; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] alter table drop column status
>
>
> > > I'm afraid LOCK is not available inside a PLPGSQL function
> >
> > Works fine for me ...
>
> Hmm, it works for me, too. OK, I see no more rocks ahead writing a PLPGSQL
> function which drops a column. It'll be slow, but it'll work. However, a C
> function would be better. Unfortunately I have no experience in writing
> libpq or ecpg functions. Is it a problem for you if I contribute a PLPGSQL
> code?
>
> Regards, Zoltan
>