Thread: Suggested TODO: allow ALTERing of typemods without heap/index rebuild

From: Josh Berkus
All,

I just realized that even if you do this:

table foo (
    id    serial,
    bar    varchar(200)
)

ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000)

... it triggers a heap & index rebuild, even though it's completely 
unnecessary.  Is this a special case of VARCHAR, or are there other 
types where we should be allowing typemod changes without rebuilding?

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Josh Berkus <josh@agliodbs.com> writes:
> I just realized that even if you do this:
> ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000)
> ... it triggers a heap & index rebuild, even though it's completely 
> unnecessary.

Yeah, this has been discussed before; I think it's even in the TODO
list.  The stumbling block has been to identify a reasonably clean way
of determining which datatype changes don't require a scan.

> Is this a special case of VARCHAR, or are there other 
> types where we should be allowing typemod changes without rebuilding?

There are any number of other cases where it's potentially interesting.
Consider:

* NUMERIC -> NUMERIC with a larger precision and/or scale

* VARBIT lengthening

* TIMESTAMP precision increase

* VARCHAR(anything) -> TEXT

and that's without considering the potential uses for user-defined
types.  Now that we allow user-defined types to have usable typmods,
I'm sure there will be applications for them too.  There are also
cases where a change might require a scan to ensure a new constraint
is met, but not a rewrite (eg, reducing the max length of VARCHAR).

We could certainly put in a quick hack that just covered a few of the
cases for built-in types, but it's not very pleasing ...
        regards, tom lane


On Mon, 2009-06-01 at 13:26 -0700, Josh Berkus wrote:
> All,
> 
> I just realized that even if you do this:
> 
> table foo (
>     id    serial,
>     bar    varchar(200)
> )
> 
> ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000)
> 
> ... it triggers a heap & index rebuild, even though it's completely 
> unnecessary.  Is this a special case of VARCHAR, or are there other 
> types where we should be allowing typemod changes without rebuilding?

NUMERIC(x, y) comes to mind, although that might be a more dangerous
case. If you turn a NUMERIC(5,0) into a NUMERIC(5,1), then '1.2' may be
stored as either '1' or '1.2' depending on whether you did the insert
before or after the change. That's because, with NUMERIC, it's not
really a constraint, but a rule for rounding.

There may be other interesting cases involving constraints. For
instance, if you have CHECK(i < 200), you should be able to add CHECK(i
< 1000) without an exclusive lock or recheck. Then, with an exclusive
lock, you can remove the original tighter constraint, but at least it
wouldn't have to recheck the entire table.

Not sure how much effort that is worth -- VARCHAR and NUMERIC typmods
are probably more common problems and easier to fix.

Regards,
    Jeff Davis
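
The rounding hazard described above can be reproduced outside the database. This is a minimal sketch, assuming NUMERIC input rounds ties away from zero; the helper name store_numeric is made up for illustration, and the precision check is omitted.

```python
from decimal import Decimal, ROUND_HALF_UP


def store_numeric(literal: str, scale: int) -> Decimal:
    """Round an input literal to `scale` fractional digits.

    Hypothetical stand-in for what a NUMERIC(p, scale) column does on
    input; the precision (p) overflow check is deliberately left out.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=1 gives Decimal('0.1')
    return Decimal(literal).quantize(quantum, rounding=ROUND_HALF_UP)


# The same literal, inserted before and after the column definition changes.
before = store_numeric("1.2", 0)  # column was NUMERIC(5,0)
after = store_numeric("1.2", 1)   # column is now NUMERIC(5,1)
print(before, after)
```

The two stored values differ, so rows written before the change have already lost the fractional digit.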



On Mon, Jun 1, 2009 at 9:49 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>
>
> NUMERIC(x, y) comes to mind, although that might be a more dangerous
> case. If you turn a NUMERIC(5,0) into a NUMERIC(5,1), then '1.2' may be
> stored as either '1' or '1.2' depending on whether you did the insert
> before or after the change. That's because, with NUMERIC, it's not
> really a constraint, but a rule for rounding.

Well it's not like rewriting the table is going to accomplish anything though...


> There may be other interesting cases involving constraints. For
> instance, if you have CHECK(i < 200), you should be able to add CHECK(i
> < 1000) without an exclusive lock or recheck. Then, with an exclusive
> lock, you can remove the original tighter constraint, but at least it
> wouldn't have to recheck the entire table.

We have the infrastructure for this kind of check actually, it's the
same kind of thing we do for partition exclusion...



-- 
greg
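
The CHECK example comes down to asking whether the existing constraint implies the new one. This is a minimal sketch for simple upper bounds on one column. It is not the backend's predicate-proving code, and the function name and tuple representation are assumptions.

```python
from typing import Tuple

Bound = Tuple[str, float]  # (operator, constant), operator is "<" or "<="


def implies(existing: Bound, new: Bound) -> bool:
    """Return True if every value passing `existing` also passes `new`."""
    old_op, old_bound = existing
    new_op, new_bound = new
    if old_op == "<=" and new_op == "<":
        # i <= a implies i < b only when a is strictly below b.
        return old_bound < new_bound
    return old_bound <= new_bound


# CHECK (i < 200) already guarantees CHECK (i < 1000): no recheck needed.
print(implies(("<", 200), ("<", 1000)))
# The reverse direction would need a scan of the existing rows.
print(implies(("<", 1000), ("<", 200)))
```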


Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild

From: Guillaume Smet
On Mon, Jun 1, 2009 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We could certainly put in a quick hack that just covered a few of the
> cases for built-in types, but it's not very pleasing ...

Jonah proposed a patch for that a long time ago.

See http://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php .

-- 
Guillaume


Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild

From: Josh Berkus
> Yeah, this has been discussed before; I think it's even in the TODO
> list.

I couldn't find it.  At least, not under data types, and also not with 
the keyword "typemod".  Anyone see it?

> The stumbling block has been to identify a reasonably clean way
> of determining which datatype changes don't require a scan.

Yep.  One possibility I'm thinking of is supplying a function for each type 
which takes two typemods (old and new) and returns a value (none, check, 
rebuild) which defines what we need to do: nothing, check but not 
rebuild, or rebuild.  Default would be rebuild.  Then the logic is 
simple for each data type.

Note that this doesn't deal with the special case of VARCHAR-->TEXT, but 
just with changing typemods.  Are there other cases of data *type* 
conversions where no check or rebuild is required?  Otherwise we might 
just special case VARCHAR-->TEXT.

Oh, here's a general case: changing DOMAINs on the same base type should 
only be a check, and changing from a DOMAIN to its own base type should 
be a none.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
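
The per-type function proposed above might look roughly like this. It is a sketch only: all names are hypothetical, and typmods are modelled as declared lengths with -1 meaning "no limit" rather than as the backend's real encoding.

```python
from enum import Enum
from typing import Callable, Dict


class Action(Enum):
    NONE = "none"        # nothing to do
    CHECK = "check"      # scan to verify existing rows, but no rewrite
    REBUILD = "rebuild"  # rewrite heap and indexes

UNLIMITED = -1  # assumption: stands for "no typmod given"


def varchar_typmod_change(old_len: int, new_len: int) -> Action:
    """Decide what a VARCHAR(old_len) -> VARCHAR(new_len) change needs."""
    if new_len == UNLIMITED:
        return Action.NONE   # dropping the limit cannot invalidate rows
    if old_len == UNLIMITED or new_len < old_len:
        return Action.CHECK  # existing rows may exceed the new limit
    return Action.NONE       # same or larger limit


# Hypothetical registry: one function per data type.
TYPMOD_CHANGE: Dict[str, Callable[[int, int], Action]] = {
    "varchar": varchar_typmod_change,
}


def action_for(type_name: str, old_typmod: int, new_typmod: int) -> Action:
    """Types without a registered function fall back to REBUILD."""
    fn = TYPMOD_CHANGE.get(type_name)
    return fn(old_typmod, new_typmod) if fn else Action.REBUILD


print(action_for("varchar", 200, 1000))
print(action_for("varchar", 1000, 200))
print(action_for("numeric", 5, 10))
```

Defaulting to REBUILD means a type that supplies no function behaves exactly as it does today.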


On Mon, 2009-06-01 at 14:39 -0700, Josh Berkus wrote:
> Note that this doesn't deal with the special case of VARCHAR-->TEXT, but 
> just with changing typemods.  Are there other cases of data *type* 
> conversions where no check or rebuild is required?  Otherwise we might 
> just special case VARCHAR-->TEXT.

I observe that the casts (VARCHAR -> TEXT and TEXT -> VARCHAR) are
marked WITHOUT FUNCTION. If that's the case, can't we use that to say
that no heap rebuild is required? Perhaps we'll need to combine this
with the typmod checks to see if we need to check the heap. 

Regards,
    Jeff Davis
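
Combining the WITHOUT FUNCTION observation with a typmod check could be sketched as follows. The cast table is a hand-written assumption holding only the two casts named above, not a dump of the catalog, and index effects are ignored here.

```python
from typing import Optional

# Assumed sample of casts marked WITHOUT FUNCTION (binary coercible).
BINARY_COERCIBLE = {("varchar", "text"), ("text", "varchar")}


def plan_type_change(old_type: str, old_max_len: Optional[int],
                     new_type: str, new_max_len: Optional[int]) -> str:
    """Return "rewrite", "check" or "nothing" for a column type change.

    A length of None means the type carries no length limit.
    """
    if old_type != new_type and (old_type, new_type) not in BINARY_COERCIBLE:
        return "rewrite"  # stored representation may differ
    if new_max_len is None:
        return "nothing"  # no limit for existing rows to violate
    if old_max_len is not None and old_max_len <= new_max_len:
        return "nothing"  # old limit was at least as strict
    return "check"        # heap scan to verify the new limit


print(plan_type_change("varchar", 200, "text", None))
print(plan_type_change("text", None, "varchar", 100))
print(plan_type_change("integer", None, "text", None))
```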



Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild

From: Josh Berkus
Jeff,

> I observe that the casts (VARCHAR ->  TEXT and TEXT ->  VARCHAR) are
> marked WITHOUT FUNCTION. If that's the case, can't we use that to say
> that no heap rebuild is required? Perhaps we'll need to combine this
> with the typmod checks to see if we need to check the heap.

yeah, you're right .. that would give us a short list of conversions 
which don't require a rewrite.    However, as Tom points out, that 
doesn't mean that they might not need a reindex (as well as OID, there's 
also XML).

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Josh Berkus <josh@agliodbs.com> writes:
> yeah, you're right .. that would give us a short list of conversions 
> which don't require a rewrite.    However, as Tom points out, that 
> doesn't mean that they might not need a reindex (as well as OID, there's 
> also XML).

Um.  I had actually forgotten about the reindexing point, but yup that
is a stumbling block to any "no work" conversions.  It might be best to
only handle cases where the column's base type is not changing, so that
we don't have any index semantics changes happening.  I think we could
still handle the varchar->text case (since they share index opclasses)
but that could be a hardwired special case.
        regards, tom lane


Josh Berkus <josh@agliodbs.com> writes:
>> Yeah, this has been discussed before; I think it's even in the TODO
>> list.

> I couldn't find it.  At least, not under data types, and also not with 
> the keyword "typemod".  Anyone see it?

It's the last item under ALTER:

* Don't require table rewrite on ALTER TABLE ... ALTER COLUMN TYPE, when
  the old and new data types are binary compatible
  http://archives.postgresql.org/message-id/200903040137.n241bAUV035002@wwwmaster.postgresql.org
  http://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php
        regards, tom lane


Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild

From: Dimitri Fontaine
Hi,

Josh Berkus <josh@agliodbs.com> writes:
>> The stumbling block has been to identify a reasonably clean way
>> of determining which datatype changes don't require a scan.
>
> Yep.  One possibility I'm thinking of is supplying a function for each type
> which takes two typemods (old and new) and returns a value (none, check,
> rebuild) which defines what we need to do: nothing, check but not rebuild,
> or rebuild.  Default would be rebuild.  Then the logic is simple for each
> data type.

That seems like a good idea; I don't see how the current infrastructure
could provide enough information to skip the rewrite here. The accepted
return values should also say whether a reindex is needed (maybe a mask
is needed, such as NOREWRITE|REINDEX).

> Note that this doesn't deal with the special case of VARCHAR-->TEXT, but
> just with changing typemods.  Are there other cases of data *type*
> conversions where no check or rebuild is required?  Otherwise we might just
> special case VARCHAR-->TEXT.

It seems there's some new stuff for this in 8.4, around the notions of
binary coercibility and type categories, which allow user defined types
to be declared IO compatible with built-in types, e.g. citext/text.

Maybe the case is not so special anymore?
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=22ff6d46991447bffaff343f4e333dcee188094d
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=4a3be7e52d7e87d2c05ecc59bc4e7d20f0bc9b17

> Oh, here's a general case: changing DOMAINs on the same base type should
> only be a check, and changing from a DOMAIN to its own base type should be a
> none.

DOMAINs and CASTs are still on the todo list IIRC, so I'm not sure the
current infrastructure around DOMAINs would be flexible (or complete)
enough for the system to determine when the domain A to domain B type
change is binary coercible. It has no CAST information to begin with, I
guess.

As far as reindexing is concerned, talking with RhodiumToad (Andrew
Gierth) on IRC gave insights, as usual. Standard PostgreSQL supports two
data type changes that need no reindex: varchar to text and cidr to
inet. In both cases, the types share the indexing infrastructure: the
same PROCEDUREs are in use in the OPERATORs that the index is using.

Could it be that we already have the information we need in order to
dynamically decide whether a heap rewrite and a reindex are necessary,
even in case of user defined type conversions?

Regards,
-- 
dim
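
The mask idea and the reindex observation can be put together in a small sketch. The flag names, the lookup set and the "mytype" example are assumptions for illustration. Positive flags are used here instead of NOREWRITE so that an empty mask means no work at all.

```python
from enum import Flag, auto


class Work(Flag):
    NOTHING = 0
    SCAN = auto()     # verify existing rows against a new constraint
    REWRITE = auto()  # rewrite the heap
    REINDEX = auto()  # rebuild indexes on the column


# Pairs named in the message as sharing index operator procedures.
SHARED_INDEX_SUPPORT = {("varchar", "text"), ("cidr", "inet")}


def required_work(old_type: str, new_type: str,
                  binary_coercible: bool) -> Work:
    """Combine the heap and index decisions into one mask."""
    work = Work.NOTHING
    if not binary_coercible:
        work |= Work.REWRITE
    if (old_type, new_type) not in SHARED_INDEX_SUPPORT:
        work |= Work.REINDEX
    return work


print(required_work("varchar", "text", True))
# "mytype" is a hypothetical user-defined type that is binary coercible
# to text but does not share its index operators.
print(required_work("mytype", "text", True))
```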