Thread: DROP COLUMN Proposal

DROP COLUMN Proposal

From
"Christopher Kings-Lynne"
Date:
Hi All,

I've been thinking about this DROP COLUMN business (sorry to start another
spammy, flamey thread!).  I'm taking ideas from lots of sources here.

How does this sound for a process?

1.
A new column is added to pg_attribute called 'attisdropped'.  It, of course,
defaults to false.

2.
The column expansion (*) code and the code that checks for valid column
references everywhere in the codebase is changed to also check the
attisdropped field.  Does someone have a comprehensive list of places to be
changed?

3.
The DROP COLUMN command does nothing but set the attisdropped of a column to
true, and rename the column to something like DELETED_old_col_name.  The
column renaming will help people using non-attisdropped aware admin programs
see what's what, plus it will allow people to create a new column with the
same name as the column just dropped.

Now the dropped column will be invisible.  As you update rows, etc. the
space will be reclaimed in the table as NULLs are put in where the old value
used to be.  Is this correct?

4.
A new command, something like "ALTER TABLE tab RECLAIM;" will be able to be
run on tables.  It will basically go through the entire table and rewrite
every row as is, NULLifying all dropped columns in the table.  This gives
the DBA the option of recovering his/her space if they want.

Notes
-----
a. What happens with TOASTed columns that are dropped?
b. Would it be worth implementing an 'UNDROP' command...?
c. Do we need an 'attisreclaimed' field in pg_attribute to indicate that a
field as been fully reclaimed, or do we just let people run it whenever they
want (even if it has no effect other than to waste time)?
d. Are there any other comments?

Basically, I would like to come up with a 'white paper' implementation that
we can all agree on.  Then, I will try to code some parts myself, and
solicit help from others for other parts.  Hopefully, together we can get a
DROP COLUMN implementation.  The most important step, however, is to agree
on an implementation spec.

Hopefully I can get the www person to set up a project page (like the
proposed win32 project page) to coordinate things.

Comments?

Regards,

Chris





Re: DROP COLUMN Proposal

From
"Christopher Kings-Lynne"
Date:
> 2.
> The column expansion (*) code and the code that checks for valid column
> references everywhere in the codebase is changed to also check the
> attisdropped field.  Does someone have a comprehensive list of
> places to be
> changed?

Actually - did Hiroshi(?)'s original HACK have this code - we can re-use
that.

Chris





Re: DROP COLUMN Proposal

From
Hannu Krosing
Date:
On Mon, 2002-07-01 at 09:47, Christopher Kings-Lynne wrote:
> Hi All,
> 
> I've been thinking about this DROP COLUMN business (sorry to start another
> spammy, flamey thread!).  I'm taking ideas from lots of sources here.
> 
> How does this sound for a process?
> 
> 1.
> A new column is added to pg_attribute called 'attisdropped'.  It, of course,
> defaults to false.
> 
> 2.
> The column expansion (*) code and the code that checks for valid column
> references everywhere in the codebase is changed to also check the
> attisdropped field.  Does someone have a comprehensive list of places to be
> changed?

It seems at least easy to test/debug incrementally:

i.e put in the 'attisdropped' column with default 0 and _not_ the actual
DROP COLUMN command. then test by manually setting and unsetting it
until everything works, then switch on the command.

> 3.
> The DROP COLUMN command does nothing but set the attisdropped of a column to
> true, 

This will probably require a full lock on system tables to avoid nasty
border conditions when updating caches. But we probably have something
like it for drop table already.

> and rename the column to something like DELETED_old_col_name.

With some number appended for the case when we want to drop several
columns with the same same name.

The name might be '-old_col_name' to save space ( not to overrun
MAX_IDENTIFIER_LENGTH ) or even '-ld_col_name'

> The
> column renaming will help people using non-attisdropped aware admin programs
> see what's what, plus it will allow people to create a new column with the
> same name as the column just dropped.
> 
> Now the dropped column will be invisible.  As you update rows, etc. the
> space will be reclaimed in the table as NULLs are put in where the old value
> used to be. 

You probably have to set DEFAULT for this column to NULL to achieve it.
And dropping / modifying indexes and constraints that reference the
deleted column .

> Is this correct?
> 
> 4.
> A new command, something like "ALTER TABLE tab RECLAIM;" will be able to be
> run on tables.  It will basically go through the entire table and rewrite
> every row as is, NULLifying all dropped columns in the table.  This gives
> the DBA the option of recovering his/her space if they want.

Could it not just be an oprion to "VACUUM table "?

> Notes
> -----
> a. What happens with TOASTed columns that are dropped?

What happens currently when rows with TOASTed cols are deleted/updated ?

> b. Would it be worth implementing an 'UNDROP' command...?

I don't think so. Better to resurrect some form of limited time travel
on system level, so that one can get back the data if it is really
needed.

> c. Do we need an 'attisreclaimed' field in pg_attribute to indicate that a
> field as been fully reclaimed, or do we just let people run it whenever they
> want (even if it has no effect other than to waste time)?
> d. Are there any other comments?
> 
> Basically, I would like to come up with a 'white paper' implementation that
> we can all agree on.  Then, I will try to code some parts myself, and
> solicit help from others for other parts.  Hopefully, together we can get a
> DROP COLUMN implementation.  The most important step, however, is to agree
> on an implementation spec.

Ironically, often the most important step in reaching agreement is
showing clean working  code ;)

> Hopefully I can get the www person to set up a project page (like the
> proposed win32 project page) to coordinate things.

---------------
Hannu






Re: DROP COLUMN Proposal

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
>> The DROP COLUMN command does nothing but set the attisdropped of a column to
>> true, 

> This will probably require a full lock on system tables to avoid nasty
> border conditions when updating caches.

AFAICS it's no different from any other ALTER TABLE command: exclusive
lock on the table being modified is necessary and sufficient.

>> Now the dropped column will be invisible.  As you update rows, etc. the
>> space will be reclaimed in the table as NULLs are put in where the old value
>> used to be. 

> You probably have to set DEFAULT for this column to NULL to achieve it.

Right, get rid of any default.

> And dropping / modifying indexes and constraints that reference the
> deleted column .

This part should fall out of Rod Taylor's pg_depend stuff pretty easily.
We still need to debate about the behavior, though.  If for example there
is a unique index on column B, do you need "DROP B CASCADE" to get rid
of it, or is "DROP B RESTRICT" good enough?  Does your answer change if
the unique index is on two columns (A,B)?  I'm not real sure where the
boundary is between attributes of the column (okay to drop as part of
the column) and independent objects that ought to be treated as
requiring CASCADE.

>> A new command, something like "ALTER TABLE tab RECLAIM;" will be able to be
>> run on tables.

> Could it not just be an oprion to "VACUUM table "?

I thought the same.  It certainly doesn't belong with ALTER TABLE...

>> a. What happens with TOASTed columns that are dropped?

> What happens currently when rows with TOASTed cols are deleted/updated ?

No different from anything else, AFAICS.

The nice thing about this implementation approach is that most of the
backend need not be aware of deleted columns.  There are a few places in
the parser (probably few enough to count on one hand) that will have to
explicitly check for and reject references to dropped columns, and
you're done.  The rewriter, planner and executor are blissfully ignorant
of the whole deal.

You might have some problems with code in psql, pg_dump, or other
clients that examines the system tables; it'd have to be fixed to pay
attention to attisdropped as well.

>> c. Do we need an 'attisreclaimed' field in pg_attribute to indicate that a
>> field as been fully reclaimed, or do we just let people run it whenever they
>> want (even if it has no effect other than to waste time)?

Don't think we need it.
        regards, tom lane




Re: DROP COLUMN Proposal

From
"Christopher Kings-Lynne"
Date:
> This part should fall out of Rod Taylor's pg_depend stuff pretty easily.
> We still need to debate about the behavior, though.  If for example there
> is a unique index on column B, do you need "DROP B CASCADE" to get rid
> of it, or is "DROP B RESTRICT" good enough?  Does your answer change if
> the unique index is on two columns (A,B)?  I'm not real sure where the
> boundary is between attributes of the column (okay to drop as part of
> the column) and independent objects that ought to be treated as
> requiring CASCADE.

From SQL92:

"If RESTRICT is specified, then C shall not be referenced in
the <query expression> of any view descriptor or in the <search
condition> of any constraint descriptor other than a table con-
straint descriptor that contains references to no other column
and that is included in the table descriptor of T."

So I guess that means that if the unique index is only on the dropped
column, then restrict mode will still be able to drop it...

Chris








Re: DROP COLUMN Proposal

From
"Christopher Kings-Lynne"
Date:
> The nice thing about this implementation approach is that most of the
> backend need not be aware of deleted columns.  There are a few places in
> the parser (probably few enough to count on one hand) that will have to
> explicitly check for and reject references to dropped columns, and
> you're done.  The rewriter, planner and executor are blissfully ignorant
> of the whole deal.

If you can enumerate these places without much effort, it'd be appreciated!

I found:

expandRTE() in parser/parse_relation.c

What else?

Chris