Thread: Demo patch for DROP COLUMN

Demo patch for DROP COLUMN

From
"Christopher Kings-Lynne"
Date:
DON'T APPLY THIS PATCH!

Hi All,

This is the patch of what I've done so far.  I'd value any feedback!

The only thing that I'm not checking (as far as I know) is ALTER TABLE/ADD
FOREIGN KEY and CREATE CONSTRAINT TRIGGER, as I'm still trying to figure out
the best place to make the change.

Every other command is updated.

Dependency support is commented out until I figure out how to do it...

It includes the most detailed regression test I've ever written, but no
expected results until I fix foreign keys.

Please have a look and see if there's anything I've overlooked...

Thanks,

Chris

Attachment

Re: Demo patch for DROP COLUMN

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> This is the patch of what I've done so far.  I'd value any feedback!

A few comments ---

You really really should put in the wrapper routines that were suggested
for SearchSysCache, as this would avoid a whole lot of duplicated code;
most of the explicit tests for COLUMN_IS_DROPPED that are in your patch
could be replaced with that technique, reducing code size and increasing
readability.

The implementation routine for DROP COLUMN is wrong; it needs to be
split into two parts, one that calls performDeletion and one that's
called by same.  AlterTableDropConstraint might be a useful model,
although looking at it I wonder whether it has the permissions checks
right.  (Should the owner of a table be allowed to drop constraints on
child tables he doesn't own?  If not, we should recurse to
AlterTableDropConstraint rather than calling RemoveRelChecks directly.)
Also, you have

!      * Propagate to children if desired
!      * @@ THIS BEHAVIOR IS BROKEN, HOWEVER IT MATCHES RENAME COLUMN @@

What's broken about it?  RENAME probably is broken, in that it should
never allow non-recursion, but DROP doesn't have that issue.


!     for (;;) {
!         if (SearchSysCacheExists(ATTNAME,
!                                  ObjectIdGetDatum(myrelid),
!                                  PointerGetDatum(newattname),
!                                  0, 0)) {
!             snprintf(newattname, NAMEDATALEN, "___pg_dropped%d %d", ++n, attnum);
!         }
!         else break;
!     }

Why not just

    while (SearchSysCacheExists(...))
    {
        snprintf(... ++n ...);
    }

More to the point, though, this is a waste of time.  Since you're using
a physical column number there can be no conflict against the names of
other dropped columns.  There might be a conflict against a user column
name, but the answer to that is to choose a weird name in the first
place; the hack above only avoids half of the problem, namely when the
user column already exists.  You're still wrong if he tries to do ALTER
TABLE ADD COLUMN ___pg_dropped1 later on.  Therefore what you want to
do is use a name that is (a) very unlikely to be used in practice, and
(b) predictable so that we can document it and say "tough, you can't use
that name".  I'd suggest something like
    ........pg.dropped.%d........
Note the use of dots instead of underscores --- we may as well pick a
name that's not valid as an unquoted identifier.  (Dashes or other
choices would work as well.  I'm not in favor of control characters
or whitespace in the name though.)

      foreach(c, rte->eref->colnames)
      {
          attnum++;
!         if (strcmp(strVal(lfirst(c)), colname) == 0)
          {
              if (result)
                  elog(ERROR, "Column reference \"%s\" is ambiguous", colname);
--- 271,278 ----
      foreach(c, rte->eref->colnames)
      {
          attnum++;
!         /* Column name must be non-NULL and be a string match */
!         if (lfirst(c) != NULL && strcmp(strVal(lfirst(c)), colname) == 0)
          {
              if (result)
                  elog(ERROR, "Column reference \"%s\" is ambiguous", colname);

Although I think it was my suggestion to put nulls in the eref lists,
it sure looks messy.  What happens if you just allow the dropped-column
names to be used in eref?  After getting a match you'll need to check if
the column is actually valid, but I think maybe this can be combined
with lookups that will occur anyway.  Probably it'd net out being fewer
changes in total.

The proposed pg_dump change will NOT work, since it will cause pg_dump's
idea of column numbers not to agree with reality, thus breaking
pg_attrdef lookups and probably other places.  Probably best bet is to
actually retrieve attisdropped (or constant 'f' for older releases),
store dropped attrs in the pg_dump datastructures anyway, and then omit
them at the time of listing out the table definition.

tab-complete query is wrong (try it)...

            regards, tom lane

Re: Demo patch for DROP COLUMN

From
"Christopher Kings-Lynne"
Date:
> You really really should put in the wrapper routines that were suggested
> for SearchSysCache, as this would avoid a whole lot of duplicated code;
> most of the explicit tests for COLUMN_IS_DROPPED that are in your patch
> could be replaced with that technique, reducing code size and increasing
> readability.

Thing is that not all the commands use SearchSysCache (some use get_attnum
for instance), and you'd still need to go thru all the commands and change
them to use SearchSysCacheNotDropped(blah).  I'll do it tho, if you think
it's a better way.

> The implementation routine for DROP COLUMN is wrong; it needs to be
> split into two parts, one that calls performDeletion and one that's
> called by same.  AlterTableDropConstraint might be a useful model,
> although looking at it I wonder whether it has the permissions checks
> right.  (Should the owner of a table be allowed to drop constraints on
> child tables he doesn't own?  If not, we should recurse to
> AlterTableDropConstraint rather than calling RemoveRelChecks directly.)

Yeah - I hadn't gotten around to researching how to split it up yet.  That's
why I commented out the performDeletion.  I'll do it soon.

> Also, you have
>
> !      * Propagate to children if desired
> !      * @@ THIS BEHAVIOR IS BROKEN, HOWEVER IT MATCHES RENAME COLUMN @@
>
> What's broken about it?  RENAME probably is broken, in that it should
> never allow non-recursion, but DROP doesn't have that issue.

Hrm.  Yeah I guess the real problem is dropping a column that still exists
in an ancestor.  I got mixed up there.  Is there a problem with this,
though:

ALTER TABLE ONLY ancestor DROP a;

Seems a little dodgy to me...

> Why not just
>
>     while (SearchSysCacheExists(...))
>     {
>         snprintf(... ++n ...);
>     }

Ah yes.

> More to the point, though, this is a waste of time.  Since you're using
> a physical column number there can be no conflict against the names of
> other dropped columns.  There might be a conflict against a user column
> name, but the answer to that is to choose a weird name in the first
> place; the hack above only avoids half of the problem, namely when the
> user column already exists.

Problem was, I couldn't find a generated name that was _impossible_ to enter
as a user.  So, I made sure it would never be a problem.

> You're still wrong if he tries to do ALTER
> TABLE ADD COLUMN ___pg_dropped1 later on.  Therefore what you want to
> do is use a name that is (a) very unlikely to be used in practice, and
> (b) predictable so that we can document it and say "tough, you can't use
> that name".  I'd suggest something like
>     ........pg.dropped.%d........
> Note the use of dots instead of underscores --- we may as well pick a
> name that's not valid as an unquoted identifier.

Not that I already use spaces in my generated names, for this exact reason.
But I'll change it to your example above...

> (Dashes or other
> choices would work as well.  I'm not in favor of control characters
> or whitespace in the name though.)

OK, no whitespace.

> Although I think it was my suggestion to put nulls in the eref lists,
> it sure looks messy.  What happens if you just allow the dropped-column
> names to be used in eref?  After getting a match you'll need to check if
> the column is actually valid, but I think maybe this can be combined
> with lookups that will occur anyway.  Probably it'd net out being fewer
> changes in total.

Well, that change was actually trivial after I thought it was going to be
difficult...  When you say "after getting a match need to check if actually
valid", wasn't that what I proposed originally, but we decided it'd be
faster if the columns never even made it in?  Where would you propose doing
these post hoc checks?

> The proposed pg_dump change will NOT work, since it will cause pg_dump's
> idea of column numbers not to agree with reality, thus breaking
> pg_attrdef lookups and probably other places.  Probably best bet is to
> actually retrieve attisdropped (or constant 'f' for older releases),
> store dropped attrs in the pg_dump datastructures anyway, and then omit
> them at the time of listing out the table definition.

Yes, I'm actually aware of this.  In fact, I was planning to make exactly
the change you have suggested here.

> tab-complete query is wrong (try it)...

OK, hadn't noticed that one.  BTW, how do I make a dropped column disappear
from the column-name-tab-complete-cache thingy in psql?  I guess I'll see
how DROP TABLE works...

Chris


Re: Demo patch for DROP COLUMN

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> What's broken about it?  RENAME probably is broken, in that it should
>> never allow non-recursion, but DROP doesn't have that issue.

> Hrm.  Yeah I guess the real problem is dropping a column that still exists
> in an ancestor.  I got mixed up there.

Yup, we need to figure out a way of preventing that.  I've been thinking
about adding an attisinherited column to pg_attribute, to mark columns
that came from a parent table.  Such a column could not be renamed or
dropped except in a command that's recursed from the parent.  (But what
about multiply-inherited columns?)

> Is there a problem with this,
> though:
> ALTER TABLE ONLY ancestor DROP a;
> Seems a little dodgy to me...

Seems okay to me.  The columns in the children would cease to be
inherited and would become plain columns.  That might mean going
around and marking them so, if we adopt an explicit marking.

>> the hack above only avoids half of the problem, namely when the
>> user column already exists.

> Problem was, I couldn't find a generated name that was _impossible_ to enter
> as a user.  So, I made sure it would never be a problem.

But you *didn't* make sure it would never be a problem.

> Well, that change was actually trivial after I thought it was going to be
> difficult...  When you say "after getting a match need to check if actually
> valid", wasn't that what I proposed originally,

No, your original code tested for validity before the strcmp, which is
surely the slowest possible way to do things.

> Where would you propose doing these post hoc checks?

Not sure yet.  I'm just wondering whether you've found all the places
that will need to be tweaked to not dump core on nulls in the eref
lists...

            regards, tom lane