Re: RFC: Restructuring pg_aggregate - Mailing list pgsql-hackers

From Tom Lane
Subject Re: RFC: Restructuring pg_aggregate
Date
Msg-id 25867.1018110892@sss.pgh.pa.us
Whole thread Raw
In response to Re: RFC: Restructuring pg_aggregate  (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
Responses Re: RFC: Restructuring pg_aggregate  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
List pgsql-hackers
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> How about putting a note in the 7.3 release that tells them not to rely on
> sequential attnums in tn pg_attribute anymore.  That should make it easier
> to implement column dropping in the future?

That seems like pure speculation to me, if not outright wrong.  If we
can't renumber the attnums it'll be because the table's tuples still
have data at a particular column position.  Since we'll need to know
the datatype of that data (if only to skip over it correctly), there
will still have to be a pg_attribute entry for the dropped column.
Thus, what people will more likely have to watch out for is pg_attribute
rows marked "deleted" in some fashion.

We are actually not that far away from being able to do DROP COLUMN,
if people don't mind being slow to recover the space used by a dropped
column.  It'd work like this:

1. Add an "attisdropped" boolean to pg_attribute.

2. DROP COLUMN sets this flag and changes attname to something like
"***deleted_NNN".  (Changing attname is only necessary to allow the
same column name to be reused without drawing a unique-index error.)
That's it --- it's done.

3. Column lookup, expansion of *, etc have to be taught to ignore
columns marked attisdropped.

The idea is that the extant data sits there but is invisible.  Inserts
of new rows in the table would always insert a NULL in the dropped
column (which'd fall out more or less for free, there being no way
to tell the system to insert anything else).  Over time, UPDATEs of
extant rows would also replace the dropped data with NULLs.

I suspect there are only about half a dozen key places that would have
to explicitly check attisdropped.  None of the low-level executor
machinery would care at all, since it's dealing with "real" tuples where
the attribute is still there, at least as a NULL.

Hiroshi's "DROP_COLUMN_HACK" was essentially along this line, but
I think he made a representational mistake by trying to change the
attnums of dropped columns to be negative values.  That means that
a lot of low-level places *do* need to know about the dropped-column
convention, else they can't make any sense of tuple layouts.  The
negative-attnum idea might have been a little easier for clients
inspecting pg_attribute to cope with, but in practice I think they'd
need to be taught about dropped columns anyway --- as evidenced by
your remark suggesting that gaps in the sequence of positive attnums
would break clients.
        regards, tom lane

PS: Once you have that, try this on for size: ALTER COLUMN is
ALTER DROP COLUMN;ALTER ADD COLUMN newtype;UPDATE foo SET newcol = coercion_fn(oldcol);

That last couldn't be expressed as an SQL statement because the parser
wouldn't allow access to oldcol, but there's nothing stopping it at the
implementation level.

This approach changes the user-visible column ordering, which'd be
a tad annoying, so probably something based on building a new version of
the table would be better.  But as a quick hack this would be doable.

Actually, given the DROP feature a user could do it for himself:
ALTER ADD COLUMN tempcol newtype;UPDATE foo SET tempcol = coercion_fn(oldcol);ALTER DROP COLUMN oldcol;ALTER RENAME
COLUMNtempcol to oldcol;
 

which seems like an okay approach, especially since it'd allow the
UPDATE computing the new column values to be of arbitrary complexity,
not just a simple coercion of one existing column.


pgsql-hackers by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: Suggestion for optimization
Next
From: Bruce Momjian
Date:
Subject: Re: timeout implementation issues