Re: ALTER TABLE DROP COLUMN - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: ALTER TABLE DROP COLUMN
Date
Msg-id 200006120100.VAA11386@candle.pha.pa.us
Whole thread Raw
In response to Re: ALTER TABLE DROP COLUMN  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> >> Seems we have 4 DROP COLUMN ideas:
> >> Method                                                  Advantage
> >> -----------------------------------------------------------------
> >> 1    invisible column marked by negative attnum        fast
> >> 2    invisible column marked by is_dropped column        fast
> >> 3    make copy of table without column            col removed
> >> 4    make new tuples in existing table without column    col removed
> 
> Bruce and I talked about this by phone yesterday, and we realized that
> none of these are very satisfactory.  #1 and #2 both have the flaw that
> applications that examine pg_attribute will probably break: they will
> see a sequence of attnum values with gaps in it.  And what should the
> rel's relnatts field be set to?  #3 and #4 are better on that point,
> but they leave us with the problem of renumbering references to columns
> after the dropped one in constraints, rules, PL functions, etc.

Yes, glad you summarized.

> 
> Furthermore, there is a closely related problem that none of these
> approaches give us much help on: recursive ALTER TABLE ADD COLUMN.
> Right now, ADD puts the new column at the end of each table it's added
> to, which often means that it gets a different column number in child
> tables than in parent tables.  That leads to havoc for pg_dump.

Also good point.

> 
> I think the only clean solution is to create a clear distinction between
> physical and logical column numbers.  Each pg_attribute tuple would need
> two attnum fields, and pg_class would need two relnatts fields as well.

Excellent idea.

> A column once created would never change its physical column number, but
> its logical column number might change as a consequence of adding or
> dropping columns before it.  ADD COLUMN would ensure that a column added
> to child tables receives the same logical column number as it has in the
> parent table, thus solving the dump/reload problem.  DROP COLUMN would
> assign an invalid logical column number to dropped columns.  They could
> be numbered zero except that we'd probably still want a unique index on
> attrelid+attnum, and the index would complain.  I'd suggest using
> Hiroshi's idea: give a dropped column a logical attnum equal to
> -(physical_attnum + offset).

My guess is that we would need a unique index on the physical attno, not
the logical one.  Multiple zero attno's may be fine.

> 
> With this approach, internal operations on tuples would all use
> physical column numbers, but operations that interface to the outside
> world would present a view of only the valid logical columns.  For
> example, the parser would only allow logical columns to be referenced
> by name; "SELECT *" would expand to valid logical columns in logical-
> column-number order; COPY would send or receive valid logical columns
> in logical-column-number order; etc.

Yes, the only hard part will be taking values supplied in logical order
and moving them into pysical order.  Not too hard with dropped columns,
because they are only gaps, but inheritance would require re-ordering
some of the values supplied by the user.  Not hard, just something
additional that is needed.

> 
> Stored rules and so forth probably should store physical column numbers
> so that they need not be modified during column add/drop.

Yes!

> 
> This would require looking at all the places in the backend to determine
> whether they should be working with logical or physical column numbers,
> but the design is such that most all places would want to be using
> physical numbers, so I don't think it'd be too painful.

Agreed.  Most are physical.

> 
> Although I'd prefer to give the replacement columns two new names
> (eg, "attlnum" and "attpnum") to ensure we find all uses, this would
> surely break applications that examine pg_attribute.  For compatibility
> we'd have to recycle "attnum" and "relnatts" to indicate logical column
> number and logical column count, while adding new fields (say "attpnum"
> and "relnpatts") for the physical number and count.

Can I recommend keeping attnum and relatts as logical, and adding
attheapnum and relheapatts so that it clearly shows these are the heap
values, not the user values.

Great idea.  I was seeing things blocked in every option until your
idea.

Also, my guess is that Hiroshi's #ifdef's mark the places we need to
start looking at.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: float8 regression / platform report
Next
From: Peter Eisentraut
Date:
Subject: Re: Gripe: working on configure now requires Automake installed locally