Thread: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
All, I just realized that even if you do this: table foo (id serial,bar varchar(200) ) ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000) ... it triggers a heap & index rebuild, even though it's completely unnecessary. Is this a special case of VARCHAR, or are there other types where we should be allowing typemod changes without rebuilding? -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > I just realized that even if you do this: > ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000) > ... it triggers a heap & index rebuild, even though it's completely > unnecessary. Yeah, this has been discussed before; I think it's even in the TODO list. The stumbling block has been to identify a reasonably clean way of determining which datatype changes don't require a scan. > Is this a special case of VARCHAR, or are there other > types where we should be allowing typemod changes without rebuilding? There are any number of other cases where it's potentially interesting. Consider: * NUMERIC -> NUMERIC with a larger precision and/or scale * VARBIT lengthening * TIMESTAMP precision increase * VARCHAR(anything) -> TEXT and that's without considering the potential uses for user-defined types. Now that we allow user-defined types to have usable typmods, I'm sure there will be applications for them too. There are also cases where a change might require a scan to ensure a new constraint is met, but not a rewrite (eg, reducing the max length of VARCHAR). We could certainly put in a quick hack that just covered a few of the cases for built-in types, but it's not very pleasing ... regards, tom lane
On Mon, 2009-06-01 at 13:26 -0700, Josh Berkus wrote: > All, > > I just realized that even if you do this: > > table foo ( > id serial, > bar varchar(200) > ) > > ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000) > > ... it triggers a heap & index rebuild, even though it's completely > unnecessary. Is this a special case of VARCHAR, or are there other > types where we should be allowing typemod changes without rebuilding? NUMERIC(x, y) comes to mind, although that might be a more dangerous case. If you turn a NUMERIC(5,0) into a NUMERIC(5,1), then '1.2' may be stored as either '1' or '1.2' depending on whether you did the insert before or after the change. That's because, with NUMERIC, it's not really a constraint, but a rule for rounding. There may be other interesting cases involving constraints. For instance, if you have CHECK(i < 200), you should be able to add CHECK(i < 1000) without an exclusive lock or recheck. Then, with an exclusive lock, you can remove the original tighter constraint, but at least it wouldn't have to recheck the entire table. Not sure how much effort that is worth -- VARCHAR and NUMERIC typmods are probably more common problems and easier to fix. Regards,Jeff Davis
On Mon, Jun 1, 2009 at 9:49 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > > NUMERIC(x, y) comes to mind, although that might be a more dangerous > case. If you turn a NUMERIC(5,0) into a NUMERIC(5,1), then '1.2' may be > stored as either '1' or '1.2' depending on whether you did the insert > before or after the change. That's because, with NUMERIC, it's not > really a constraint, but a rule for rounding. Well it's not like rewriting the table is going to accomplish anything though... > There may be other interesting cases involving constraints. For > instance, if you have CHECK(i < 200), you should be able to add CHECK(i > < 1000) without an exclusive lock or recheck. Then, with an exclusive > lock, you can remove the original tighter constraint, but at least it > wouldn't have to recheck the entire table. We have the infrastructure for this kind of check actually, it's the same kind of thing we do for partition exclusion... -- greg
On Mon, Jun 1, 2009 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > We could certainly put in a quick hack that just covered a few of the > cases for built-in types, but it's not very pleasing ... Jonah proposed a patch for that a long time ago. See http://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php . -- Guillaume
> Yeah, this has been discussed before; I think it's even in the TODO > list. I couldn't find it. At least, not under data types, and also not with the keyword "typemod". Anyone see it? > The stumbling block has been to identify a reasonably clean way > of determining which datatype changes don't require a scan. Yep. One possibility I'm thinking is supplying a function for each type which takes two typemods (old and new) and returns a value (none, check, rebuild) which defines what we need to do: nothing, check but not rebuild, or rebuild. Default would be rebuild. Then the logic is simple for each data type. Note that this doesn't deal with the special case of VARCHAR-->TEXT, but just with changing typemods. Are there other cases of data *type* conversions where no check or rebuild is required? Otherwise we might just special case VARCHAR-->TEXT. Oh, here's a general case: changing DOMAINs on the same base type should only be a check, and changing from a DOMAIN to its own base type should be a none. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
On Mon, 2009-06-01 at 14:39 -0700, Josh Berkus wrote: > Note that this doesn't deal with the special case of VARCHAR-->TEXT, but > just with changing typemods. Are there other cases of data *type* > conversions where no check or rebuild is required? Otherwise we might > just special case VARCHAR-->TEXT. I observe that the casts (VARCHAR -> TEXT and TEXT -> VARCHAR) are marked WITHOUT FUNCTION. If that's the case, can't we use that to say that no heap rebuild is required? Perhaps we'll need to combine this with the typmod checks to see if we need to check the heap. Regards,Jeff Davis
Jeff, > I observe that the casts (VARCHAR -> TEXT and TEXT -> VARCHAR) are > marked WITHOUT FUNCTION. If that's the case, can't we use that to say > that no heap rebuild is required? Perhaps we'll need to combine this > with the typmod checks to see if we need to check the heap. yeah, you're right .. that would give us a short list of conversions which don't require a rewrite. However, as Tom points out, that doesn't mean that they might not need a reindex (as well as OID, there's also XML). -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > yeah, you're right .. that would give us a short list of conversions > which don't require a rewrite. However, as Tom points out, that > doesn't mean that they might not need a reindex (as well as OID, there's > also XML). Um. I had actually forgotten about the reindexing point, but yup that is a stumbling block to any "no work" conversions. It might be best to only handle cases where the column's base type is not changing, so that we don't have any index semantics changes happening. I think we could still handle the varchar->text case (since they share index opclasses) but that could be a hardwired special case. regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: >> Yeah, this has been discussed before; I think it's even in the TODO >> list. > I couldn't find it. At least, not under data types, and also not with > the keyword "typemod". Anyone see it? It's the last item under ALTER: * Don't require table rewrite on ALTER TABLE ... ALTER COLUMN TYPE, when the old and new data types are binary compatiblehttp://archives.postgresql.org/message-id/200903040137.n241bAUV035002@wwwmaster.postgresql.orghttp://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php regards, tom lane
Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
From
Dimitri Fontaine
Date:
Hi, Josh Berkus <josh@agliodbs.com> writes: >> The stumbling block has been to identify a reasonably clean way >> of determining which datatype changes don't require a scan. > > Yep. One possibility I'm thinking is supplying a function for each type > which takes two typemods (old and new) and returns a value (none, check, > rebuild) which defines what we need to do: nothing, check but not rebuild, > or rebuild. Default would be rebuild. Then the logic is simple for each > data type. That seems like a good idea, I don't see how the current infrastructure could provide enough information to skip this here. Add in there whether a reindex is needed, too, in the accepted return values (maybe a mask is needed, such as NOREWRITE|REINDEX). > Note that this doesn't deal with the special case of VARCHAR-->TEXT, but > just with changing typemods. Are there other cases of data *type* > conversions where no check or rebuild is required? Otherwise we might just > special case VARCHAR-->TEXT. It seems there's some new stuff for this in 8.4, around the notions of binary coercibility and type categories, which allow user defined types to be declared IO compatible with built-in types, e.g. citext/text. Maybe the case is not so special anymore? http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=22ff6d46991447bffaff343f4e333dcee188094d http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=4a3be7e52d7e87d2c05ecc59bc4e7d20f0bc9b17 > Oh, here's a general case: changing DOMAINs on the same base type should > only be a check, and changing from a DOMAIN to its own base type should be a > none. DOMAINs and CASTs are still on the todo list IIRC, so I'm not sure the current infrastructure around DOMAINs would be flexible (or complete) enough for the system to determine when the domain A to domain B type change is binary coercible. It has no CAST information to begin with, I guess. As far as reindexing is concerned, talking with RhodiumToad (Andrew Gierth) on IRC gave insights, as usual. Standard PostgreSQL supports two data type change without reindex needs: varchar to text and cidr to inet. In both cases, the types share the indexing infrastructure: same PROCEDUREs are in use in the OPERATORs that the index is using. Could it be that we already have the information we need in order to dynamically decide whether a heap rewrite and a reindex are necessary, even in case of user defined type conversions? Regards, -- dim