Thread: DROP COLUMN Proposal
Hi All, I've been thinking about this DROP COLUMN business (sorry to start another spammy, flamey thread!). I'm taking ideas from lots of sources here. How does this sound for a process? 1. A new column is added to pg_attribute called 'attisdropped'. It, of course, defaults to false. 2. The column expansion (*) code and the code that checks for valid column references everywhere in the codebase is changed to also check the attisdropped field. Does someone have a comprehensive list of places to be changed? 3. The DROP COLUMN command does nothing but set the attisdropped of a column to true, and rename the column to something like DELETED_old_col_name. The column renaming will help people using non-attisdropped aware admin programs see what's what, plus it will allow people to create a new column with the same name as the column just dropped. Now the dropped column will be invisible. As you update rows, etc. the space will be reclaimed in the table as NULLs are put in where the old value used to be. Is this correct? 4. A new command, something like "ALTER TABLE tab RECLAIM;" will be able to be run on tables. It will basically go through the entire table and rewrite every row as is, NULLifying all dropped columns in the table. This gives the DBA the option of recovering his/her space if they want. Notes ----- a. What happens with TOASTed columns that are dropped? b. Would it be worth implementing an 'UNDROP' command...? c. Do we need an 'attisreclaimed' field in pg_attribute to indicate that a field as been fully reclaimed, or do we just let people run it whenever they want (even if it has no effect other than to waste time)? d. Are there any other comments? Basically, I would like to come up with a 'white paper' implementation that we can all agree on. Then, I will try to code some parts myself, and solicit help from others for other parts. Hopefully, together we can get a DROP COLUMN implementation. The most important step, however, is to agree on an implementation spec. Hopefully I can get the www person to set up a project page (like the proposed win32 project page) to coordinate things. Comments? Regards, Chris
> 2. > The column expansion (*) code and the code that checks for valid column > references everywhere in the codebase is changed to also check the > attisdropped field. Does someone have a comprehensive list of > places to be > changed? Actually - did Hiroshi(?)'s original HACK have this code - we can re-use that. Chris
On Mon, 2002-07-01 at 09:47, Christopher Kings-Lynne wrote: > Hi All, > > I've been thinking about this DROP COLUMN business (sorry to start another > spammy, flamey thread!). I'm taking ideas from lots of sources here. > > How does this sound for a process? > > 1. > A new column is added to pg_attribute called 'attisdropped'. It, of course, > defaults to false. > > 2. > The column expansion (*) code and the code that checks for valid column > references everywhere in the codebase is changed to also check the > attisdropped field. Does someone have a comprehensive list of places to be > changed? It seems at least easy to test/debug incrementally: i.e put in the 'attisdropped' column with default 0 and _not_ the actual DROP COLUMN command. then test by manually setting and unsetting it until everything works, then switch on the command. > 3. > The DROP COLUMN command does nothing but set the attisdropped of a column to > true, This will probably require a full lock on system tables to avoid nasty border conditions when updating caches. But we probably have something like it for drop table already. > and rename the column to something like DELETED_old_col_name. With some number appended for the case when we want to drop several columns with the same same name. The name might be '-old_col_name' to save space ( not to overrun MAX_IDENTIFIER_LENGTH ) or even '-ld_col_name' > The > column renaming will help people using non-attisdropped aware admin programs > see what's what, plus it will allow people to create a new column with the > same name as the column just dropped. > > Now the dropped column will be invisible. As you update rows, etc. the > space will be reclaimed in the table as NULLs are put in where the old value > used to be. You probably have to set DEFAULT for this column to NULL to achieve it. And dropping / modifying indexes and constraints that reference the deleted column . > Is this correct? > > 4. > A new command, something like "ALTER TABLE tab RECLAIM;" will be able to be > run on tables. It will basically go through the entire table and rewrite > every row as is, NULLifying all dropped columns in the table. This gives > the DBA the option of recovering his/her space if they want. Could it not just be an oprion to "VACUUM table "? > Notes > ----- > a. What happens with TOASTed columns that are dropped? What happens currently when rows with TOASTed cols are deleted/updated ? > b. Would it be worth implementing an 'UNDROP' command...? I don't think so. Better to resurrect some form of limited time travel on system level, so that one can get back the data if it is really needed. > c. Do we need an 'attisreclaimed' field in pg_attribute to indicate that a > field as been fully reclaimed, or do we just let people run it whenever they > want (even if it has no effect other than to waste time)? > d. Are there any other comments? > > Basically, I would like to come up with a 'white paper' implementation that > we can all agree on. Then, I will try to code some parts myself, and > solicit help from others for other parts. Hopefully, together we can get a > DROP COLUMN implementation. The most important step, however, is to agree > on an implementation spec. Ironically, often the most important step in reaching agreement is showing clean working code ;) > Hopefully I can get the www person to set up a project page (like the > proposed win32 project page) to coordinate things. --------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: >> The DROP COLUMN command does nothing but set the attisdropped of a column to >> true, > This will probably require a full lock on system tables to avoid nasty > border conditions when updating caches. AFAICS it's no different from any other ALTER TABLE command: exclusive lock on the table being modified is necessary and sufficient. >> Now the dropped column will be invisible. As you update rows, etc. the >> space will be reclaimed in the table as NULLs are put in where the old value >> used to be. > You probably have to set DEFAULT for this column to NULL to achieve it. Right, get rid of any default. > And dropping / modifying indexes and constraints that reference the > deleted column . This part should fall out of Rod Taylor's pg_depend stuff pretty easily. We still need to debate about the behavior, though. If for example there is a unique index on column B, do you need "DROP B CASCADE" to get rid of it, or is "DROP B RESTRICT" good enough? Does your answer change if the unique index is on two columns (A,B)? I'm not real sure where the boundary is between attributes of the column (okay to drop as part of the column) and independent objects that ought to be treated as requiring CASCADE. >> A new command, something like "ALTER TABLE tab RECLAIM;" will be able to be >> run on tables. > Could it not just be an oprion to "VACUUM table "? I thought the same. It certainly doesn't belong with ALTER TABLE... >> a. What happens with TOASTed columns that are dropped? > What happens currently when rows with TOASTed cols are deleted/updated ? No different from anything else, AFAICS. The nice thing about this implementation approach is that most of the backend need not be aware of deleted columns. There are a few places in the parser (probably few enough to count on one hand) that will have to explicitly check for and reject references to dropped columns, and you're done. The rewriter, planner and executor are blissfully ignorant of the whole deal. You might have some problems with code in psql, pg_dump, or other clients that examines the system tables; it'd have to be fixed to pay attention to attisdropped as well. >> c. Do we need an 'attisreclaimed' field in pg_attribute to indicate that a >> field as been fully reclaimed, or do we just let people run it whenever they >> want (even if it has no effect other than to waste time)? Don't think we need it. regards, tom lane
> This part should fall out of Rod Taylor's pg_depend stuff pretty easily. > We still need to debate about the behavior, though. If for example there > is a unique index on column B, do you need "DROP B CASCADE" to get rid > of it, or is "DROP B RESTRICT" good enough? Does your answer change if > the unique index is on two columns (A,B)? I'm not real sure where the > boundary is between attributes of the column (okay to drop as part of > the column) and independent objects that ought to be treated as > requiring CASCADE. From SQL92: "If RESTRICT is specified, then C shall not be referenced in the <query expression> of any view descriptor or in the <search condition> of any constraint descriptor other than a table con- straint descriptor that contains references to no other column and that is included in the table descriptor of T." So I guess that means that if the unique index is only on the dropped column, then restrict mode will still be able to drop it... Chris
> The nice thing about this implementation approach is that most of the > backend need not be aware of deleted columns. There are a few places in > the parser (probably few enough to count on one hand) that will have to > explicitly check for and reject references to dropped columns, and > you're done. The rewriter, planner and executor are blissfully ignorant > of the whole deal. If you can enumerate these places without much effort, it'd be appreciated! I found: expandRTE() in parser/parse_relation.c What else? Chris