Thread: Avoiding rewrite in ALTER TABLE ALTER TYPE

Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes.  In some
cases, we can determine that doing so is unhelpful, and that the conversion
shall always succeed:

CREATE DOMAIN loosedom AS text;
CREATE TABLE t (c varchar(2));
ALTER TABLE t ALTER c TYPE varchar(4);
ALTER TABLE t ALTER c TYPE text;
ALTER TABLE t ALTER c TYPE loosedom;

In other cases, we can determine that the rewrite is unhelpful, but a cast could
still throw an error:

CREATE DOMAIN tightdom AS text CHECK (value LIKE '<%/>');
CREATE TABLE t (c text);
ALTER TABLE t ALTER c TYPE xml USING c::xml;
ALTER TABLE t ALTER c TYPE varchar(64);
ALTER TABLE t ALTER c TYPE tightdom;

I wish to replace table rewrites with table verification scans where possible,
then skip those verification scans where possible.  I've noted three subproblems
that require distinct treatment.  In the following discussion, "tuple" is the
tuple on disk, and "tuple'" is tuple we would form during a rewriting ALTER
TABLE ALTER TYPE.  Likewise "col" and "col'" for individual columns.


1. The table heap rewrite itself

To legitimately leave the table heap intact, we must confirm datumIsEqual(col,
col') for every tuple and column.  Without any specific configuration to suggest
when this might be true, we can always determine its truth through a
_verification scan_ of the table.  If the datumIsEqual test fails for any tuple
during the verification scan, terminate the scan and proceed with a table
rewrite.  When should we attempt the verification scan in the first place, as
opposed to proceeding directly to a rewrite?  For some time, I thought it best
to automatically attempt it when the type change degenerates to a single binary
coercion.  Cases like the text->xml and timestamp->timestamptz [timezone=UTC]
are more difficult to identify mechanically, so I supposed to let the user
identify them via a new keyword to ALTER TABLE ALTER TYPE.  Having thought on it
more, though, it actually seems best to attempt the verification scan *every*
time.  In most ineligible conversions, an inequality will appear very early, so
the scan is effectively O(1) in the negative case.  A notable exception is
something like char(6)->varchar(6) in a table with a billion tuples having
length(col) = 6 and one with length(col) = 5.  The verification scan might read
most of the table before finding the one tuple that forces a rewrite.  That
isn't a particularly regular scenario in my experience, so the "just do the
right thing" aspect of preceding every potential rewrite with a verification
scan seems to win out.

Certain very popular type changes (see introduction) can be _exempt_ from the
verification scan: we can determine that they will always succeed.  To capture
that, I propose extending CREATE CAST with the notion of an exemptor function:

CREATE CAST (source_type AS target_type)   { WITH FUNCTION function_name (argument_type [, ...])         [ WITH
EXEMPTORfunction_name ] |     WITHOUT FUNCTION |     WITH INOUT }   [ AS ASSIGNMENT | AS IMPLICIT ]
 

The exemptor shall have this signature:

exemptor_func(integer, -- source_typmodinteger  -- dest_typmod
) RETURNS boolean

The exemptor shall return true iff datumIsEqual(x, x::target_type(dest_typmod))
for every x in source_type most recently coerced to source_type(source_typmod).
When the cast is WITHOUT FUNCTION (a binary coercion) and target_type lacks a
length coercion cast, the cast has an implicit exemption, and an exemptor is
superfluous: code can assume an exemptor that always returns true.  Use of WITH
EXEMPTOR mainly makes sense alongside multi-arg WITH FUNCTION (should probably
raise an error upon other uses).

When ATPrepAlterColumnType determines that a transformation expression is merely
a series of casts (or perhaps some stronger condition), and every such cast has
an exemptor or implicit exemption, it will call those exemptors with the old and
new typmods.  If the exemptors all return true and this continues to hold for
all other ALTER TYPE subcommands of this ALTER TABLE, we will skip the
verification scan and table rewrite.

I will add exemptors to length coercion casts of types other than "character"
and "bit", which are exempt only in the no-change case.  The candidates that are
not pure length coercions, int8->bit and int4->bit, would not benefit.


2. CHECK constraints, index predicates, and expression indexes

CHECK constraints, index predicates, and indexed expressions can change behavior
without limitation when the types of columns they reference change.  An exempt
typmod change alone will not affect them.  While there are some other safe cases
("CHECK (length(col) = 2", over a varchar->text conversion), I don't have a
general framework in mind for identifying them.  For now, an exempt change that
includes a type change, such as varchar(2)->text, will still require a
verification scan whenever a CHECK constraint, index predicate, or indexed
expression references the column.  The verification scan will confirm that
datumIsEqual(expr(tuple), expr(tuple')) for all tuples and all affected
expressions.  Are there any nice ways to detect a broader selection of the
expressions that do not require verification?


3. reindex when an operator class change crosses operator families

Having avoided a heap rewrite, we may still need to reindex indexes that just
received new operator classes.  Consider a type "reverseint4" that stores just
like an int4, but its default btree operator class sorts opposite int4_ops.
Converting int4->reverseint4 would not touch the table heap, but we would need
to reindex any indexes on the changing column.  I'm not aware of any affected
core data types (albeit not having investigated systematically), but we still
need to get this case right.

Conveniently, operator families cover exactly this notion of operator class
compatibility.  When the old and new operator classes share an operator family,
we can skip the reindex without verification.  This covers the primary case of
varchar/text, plus a bonus of timestamp<->timestamptz and inet<->cidr.  Indexes
on expressions can use the same check; we will have verified the actual bits of
the computed expressions during the verification scan.  Index predicates need no
special handling at this stage.

For changes having no operator family to vindicate them, we can just reindex.
Though we could introduce an index variant of the verification scan, confirming
that v_n::target_type <= v_n+1::target_type for all values, it seems relatively
unimportant: operator families cover the notable use cases nicely.


4. NOTE: other constraint types
UNIQUE, PRIMARY KEY and EXCLUDE constraints remain valid as the indexes
implementing them remain valid.  This proposal does not affect foreign key
constraints.


I see this breaking down into various patches:
1. Add the verification scan to the vicinity of ATRewriteTable and skip the
table rewrite when it finds no changes.  Full reindex remains.
2. In the no-rewrite case, narrow us from reindexing the entire table to merely
reindexing those indexes touched by the operator class changes.
3. Further skip reindex operations when the operator class has changed but the
operator family has not.
4. Modify ATPrepAlterColumnType to detect implicit exemptions, and modify
ATRewriteTable to skip the verification scan in light of those discoveries.
5. Add pg_cast.castexemptor and modify CREATE CAST to populate it.  Define an
exemptor for the varchar length coercion cast.  Modify ATPrepAlterColumnType to
use pg_cast.castexemptor and combine their outcomes with implicit exemptions.
6. Define exemptors for time, timetz, timestamp, timestamptz, and interval.
7. Define exemptor for bit varying.
8. Define exemptor for numeric.


These threads, referenced from the TODO entry concerning a subset of this plan,
were most helpful in revealing the scope of problems to address:
http://archives.postgresql.org/message-id/200903040137.n241bAUV035002@wwwmaster.postgresql.org
http://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php

What have I missed?  I welcome any corrections/unaddressed problems/questions.

Thanks,
nm


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Robert Haas
Date:
On Dec 29, 2010, at 7:56 AM, Noah Misch <noah@leadboat.com> wrote:
> ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes.  In some
> cases, we can determine that doing so is unhelpful, and that the conversion
> shall always succeed:
>
> CREATE DOMAIN loosedom AS text;
> CREATE TABLE t (c varchar(2));
> ALTER TABLE t ALTER c TYPE varchar(4);
> ALTER TABLE t ALTER c TYPE text;
> ALTER TABLE t ALTER c TYPE loosedom;
>
> In other cases, we can determine that the rewrite is unhelpful, but a cast could
> still throw an error:
>
> CREATE DOMAIN tightdom AS text CHECK (value LIKE '<%/>');
> CREATE TABLE t (c text);
> ALTER TABLE t ALTER c TYPE xml USING c::xml;
> ALTER TABLE t ALTER c TYPE varchar(64);
> ALTER TABLE t ALTER c TYPE tightdom;
>
> I wish to replace table rewrites with table verification scans where possible,
> then skip those verification scans where possible.

Seems like a good idea.

> Having thought on it
> more, though, it actually seems best to attempt the verification scan *every*
> time.  In most ineligible conversions, an inequality will appear very early, so
> the scan is effectively O(1) in the negative case.  A notable exception is
> something like char(6)->varchar(6) in a table with a billion tuples having
> length(col) = 6 and one with length(col) = 5.  The verification scan might read
> most of the table before finding the one tuple that forces a rewrite. That
> isn't a particularly regular scenario in my experience, so the "just do the
> right thing" aspect of preceding every potential rewrite with a verification
> scan seems to win out.

I think this scenario will be more common than you might think.  Tables don't contain random data; they contain data
thatthe DBA thinks is valid.  The situation where the data is mostly as you expect but with a few kooky rows is, in my
experience,extremely common.  And it makes the worst case a LOT worse.  I really doubt this is worth the complexity
anyway- converting between two types that are mostly-but-not-quite binary compatible seems like an edge case in every
sense.

> Certain very popular type changes (see introduction) can be _exempt_ from the
> verification scan: we can determine that they will always succeed.  To capture
> that, I propose extending CREATE CAST with the notion of an exemptor function:
>
> CREATE CAST (source_type AS target_type)
>    { WITH FUNCTION function_name (argument_type [, ...])
>          [ WITH EXEMPTOR function_name ] |
>      WITHOUT FUNCTION |
>      WITH INOUT }
>    [ AS ASSIGNMENT | AS IMPLICIT ]
>
> The exemptor shall have this signature:
>
> exemptor_func(
>    integer, -- source_typmod
>    integer  -- dest_typmod
> ) RETURNS boolean
>
> The exemptor shall return true iff datumIsEqual(x, x::target_type(dest_typmod))
> for every x in source_type most recently coerced to source_type(source_typmod).
> When the cast is WITHOUT FUNCTION (a binary coercion) and target_type lacks a
> length coercion cast, the cast has an implicit exemption, and an exemptor is
> superfluous: code can assume an exemptor that always returns true.  Use of WITH
> EXEMPTOR mainly makes sense alongside multi-arg WITH FUNCTION (should probably
> raise an error upon other uses).

I am not sure whether it's worth trying to be general here. Maybe we should just hard-code the known cases involving
coredatatypes. 

>

...Robert

Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes.  In some
> cases, we can determine that doing so is unhelpful, and that the conversion
> shall always succeed:
> I wish to replace table rewrites with table verification scans where possible,
> then skip those verification scans where possible.

This has been discussed before; have you read the previous threads?

I really really dislike the notion of a "verification scan": it's
basically work that is going to be useless if it fails.  I think your
argument that it will usually fail quickly is quite unconvincing, and in
any case the situations where it is useful at all are too thin on the
ground to be worth the code space to implement it.  It seems sufficient
to me to skip the rewrite in cases of provable binary compatibility, with
possibly an extra check for "safe" changes of typmod.  With respect to
the latter, I agree a type-specific function to compare the typmods
would be the way to go, although "exemptor" seems a pretty badly chosen
name for it.
        regards, tom lane


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Dec 29, 2010, at 7:56 AM, Noah Misch <noah@leadboat.com> wrote:
>> The exemptor shall have this signature:
>> 
>> exemptor_func(
>> integer, -- source_typmod
>> integer  -- dest_typmod
>> ) RETURNS boolean
>> 
>> The exemptor shall return true iff datumIsEqual(x, x::target_type(dest_typmod))
>> for every x in source_type most recently coerced to source_type(source_typmod).

> I am not sure whether it's worth trying to be general here. Maybe we should just hard-code the known cases involving
coredatatypes.
 

I find the idea of hard coding to be pretty icky.  However, the elephant
in the room here is the possibility of typmod getting replaced by some
other representation.  It would make life simpler if we didn't invent
this additional type-specific API until that dust has settled.  So maybe
the plan should be hard-coding in the short term and add an API later.
        regards, tom lane


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
David Fetter
Date:
On Wed, Dec 29, 2010 at 11:16:23AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes.  In some
> > cases, we can determine that doing so is unhelpful, and that the conversion
> > shall always succeed:
> > I wish to replace table rewrites with table verification scans where possible,
> > then skip those verification scans where possible.
> 
> This has been discussed before; have you read the previous threads?

It would help a lot if, in future, you include references to the
previous threads you have in mind rather than simply mention that they
exist.

Saying, in effect, "search the archives with our not-super-great
search technology using keywords you didn't think of," comes off as
pretty dismissive if not downright hostile.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Robert Haas
Date:
On Dec 29, 2010, at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I really really dislike the notion of a "verification scan": it's
> basically work that is going to be useless if it fails.

I think it has potential in cases like text to xml.  In that case it'll either work or fail, with no possibility of
requiringa do-over.  Scanning the whole table is a whole lot cheaper than rewriting it.  But I agree with your
assessmentof the "optimistic" verification scan case. 

...Robert

Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Dec 29, 2010, at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I really really dislike the notion of a "verification scan": it's
>> basically work that is going to be useless if it fails.

> I think it has potential in cases like text to xml.  In that case it'll either work or fail, with no possibility of
requiringa do-over.  Scanning the whole table is a whole lot cheaper than rewriting it.
 

I don't believe avoiding the write part (but not the read part, nor the
XML syntax verification part) is a sufficiently compelling argument to
justify having that code path.  There are not enough distinct datatypes
sharing binary representations to make this a worthwhile thing to worry
over.

Basically, I believe that the only use-case that will have more than
epsilon number of users is "I want to make this varchar(5) into
varchar(10), or possibly text".  We can fix that case without adding a
boatload more code that we'll have to maintain.

I do have some interest in the idea of having a type-specific function
that can recognize no-op typmod changes, but I would envision that as
being an expression evaluation optimization: let the planner throw away
the call to the length-checking function when it isn't going to do
anything.  It's not by any means only useful in ALTER COLUMN TYPE ---
and in fact probably doesn't even need any bespoke code there, if we put
it into expression_planner() instead.
        regards, tom lane


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Wed, Dec 29, 2010 at 10:56:39AM -0500, Robert Haas wrote:
> On Dec 29, 2010, at 7:56 AM, Noah Misch <noah@leadboat.com> wrote:
> > Having thought on it
> > more, though, it actually seems best to attempt the verification scan *every*
> > time.  In most ineligible conversions, an inequality will appear very early, so
> > the scan is effectively O(1) in the negative case.  A notable exception is
> > something like char(6)->varchar(6) in a table with a billion tuples having
> > length(col) = 6 and one with length(col) = 5.  The verification scan might read
> > most of the table before finding the one tuple that forces a rewrite. That
> > isn't a particularly regular scenario in my experience, so the "just do the
> > right thing" aspect of preceding every potential rewrite with a verification
> > scan seems to win out.
> 
> I think this scenario will be more common than you might think.  Tables don't contain random data; they contain data
thatthe DBA thinks is valid.  The situation where the data is mostly as you expect but with a few kooky rows is, in my
experience,extremely common.
 

Perhaps.  A few kooky rows is indeed common, but we're talking about a specific
breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE
transformation expression, and 0.1% have different bits.  Is that common?

In case it was not obvious, I'll note that any error thrown by a transformation
expression during the verification scan still aborts the ALTER TABLE.  A
varchar(20)->varchar(10) that finds an 11-char string will fail permanently
during the verification scan.  Indeed, the primary value of the verification
scan is to distinguish positive and error, not positive and negative.

Expanding on my introduction, none of the following can yield a negative
verification scan; the result is always positive or an error:

CREATE DOMAIN loosedom AS text;
CREATE DOMAIN tightdom AS text CHECK (value LIKE '<%/>');
CREATE TABLE t (c varchar(6));
INSERT INTO t VALUES ('<abc/>'),('<de/>');
ALTER TABLE t ALTER c TYPE varchar(8);
ALTER TABLE t ALTER c TYPE text;
ALTER TABLE t ALTER c TYPE loosedom;
ALTER TABLE t ALTER c TYPE xml USING c::xml;
ALTER TABLE t ALTER c TYPE varchar(64);
ALTER TABLE t ALTER c TYPE tightdom;

Adding a bpchar into the mix makes a negative verification scan possible, as
does a USING clause having a truncating effect.  Continuing the example, these
can and would get a negative verification scan:
ALTER TABLE t ALTER c TYPE character(6);
ALTER TABLE t ALTER c TYPE varchar(5) USING c::varchar(5);
Plenty of academic USING clause examples exist:
ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN '<de/>' THEN 'foo' ELSE c END;

Verification scans for conversions between fundamentally different types will
generally end in the negative at the first tuple.  Consider {timestamp,bigint,
real,numeric,varbit}->text, int->bigint, interval->reltime, etc.  I can't think
of a decent non-academic example where heterogeneous conversions like these will
dupe the verification scan for even a handful of tuples.  Granted, one can flip
that around as an argument for declaring the conversions that have a chance.

> And it makes the worst case a LOT worse.

I suppose the absolute worst case would involve an ALTER TABLE adding brutally
expensive CHECK constraints, such that the cost of computing those constraints
would dominate the cost of both the verification scan and the rewrite, yielding
a 100% slower ALTER TABLE run.  A more realistic bad case might be a table much
larger than memory with no indexes, and the verification scan adds a full seq
scan for nothing.  A crude test here has rewriting such a table taking 7x as
long as a seq scan on it.  By the back of that envelope, we'll take about 15%
more time.  The right mix of expensive expressions will raise that percentage,
and the presence of indexes will drop it.  Remember though, we're still only
talking about the relatively-rare cases that even can get a negative
verification scan.

> I really doubt this is worth the complexity anyway -    

We'd have the verification scan regardless of how we choose when to use it,
because how else would we implement no-rewrite varchar(8)->varchar(4) or
text->xml?  An unconditional verification scan is merely the most trivial
algorithm for deciding when to employ it.  Those conversions are semantically
similar to adding CHECK constraints, and in that sense we already have an
initial verification scan implementation: ATRewriteTable(..., InvalidOid, ...).

> converting between two types that are mostly-but-not-quite binary compatible seems like an edge case in every sense.

Yes.  Indeed, that's the intuitive basis for my hypothesis that the verification
scan will usually either fail early.  I don't advocate this approach to pick up
edge cases, but to pick up reasonable cases _without explicit annotations_
showing them to be achievable.  Take the text->xml example, certainly of genuine
value if not top-frequency.  I see three ways to ensure we do a verification
scan for it:

1. Have the user tell us: ALTER TABLE t ALTER c TYPE xml USING c::xml IMPLICIT
2. Mark the text->xml cast as "possibly no-rewrite" and look for that
3. Do a verification scan every time

(1) gets the job done, but quality of user experience is lacking.  (2) is best
for the user, as long as the annotations are well-maintained, but it adds the
most maintenance burden.  (3) has the lowest maintenance burden and, for common
use cases, the user-experience quality of (2), but it can significantly add to
the ALTER TABLE runtime in rare cases.  One can also do (1) with (2) to provide
an override when the annotations are incomplete.  Ultimately, any of these would
work for my own needs.  What's your preference?  Are there other notable options
that preserve full functionality?

Thanks,
nm


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Wed, Dec 29, 2010 at 11:16:23AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes.  In some
> > cases, we can determine that doing so is unhelpful, and that the conversion
> > shall always succeed:
> > I wish to replace table rewrites with table verification scans where possible,
> > then skip those verification scans where possible.
> 
> This has been discussed before; have you read the previous threads?

I cited two threads I had read on the subject.  Were there other important ones?

> I really really dislike the notion of a "verification scan": it's
> basically work that is going to be useless if it fails.  I think your
> argument that it will usually fail quickly is quite unconvincing, and in
> any case the situations where it is useful at all are too thin on the
> ground to be worth the code space to implement it.  It seems sufficient
> to me to skip the rewrite in cases of provable binary compatibility, with
> possibly an extra check for "safe" changes of typmod.  With respect to
> the latter, I agree a type-specific function to compare the typmods
> would be the way to go, although "exemptor" seems a pretty badly chosen
> name for it.

I have attempted to expand on these problems in my reply to Robert.


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Wed, Dec 29, 2010 at 02:01:28PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Dec 29, 2010, at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I really really dislike the notion of a "verification scan": it's
> >> basically work that is going to be useless if it fails.
> 
> > I think it has potential in cases like text to xml.  In that case it'll either work or fail, with no possibility of
requiringa do-over.  Scanning the whole table is a whole lot cheaper than rewriting it.
 
> 
> I don't believe avoiding the write part (but not the read part, nor the
> XML syntax verification part) is a sufficiently compelling argument to
> justify having that code path.  There are not enough distinct datatypes
> sharing binary representations to make this a worthwhile thing to worry
> over.
> 
> Basically, I believe that the only use-case that will have more than
> epsilon number of users is "I want to make this varchar(5) into
> varchar(10), or possibly text".  We can fix that case without adding a
> boatload more code that we'll have to maintain.

Those are certainly the big ones, but I've also hit these in the field:
SET timezone = 'UTC';
CREATE DOMAIN state AS text CHECK (VALUE ~ '[A-Z]{2}');
CREATE TABLE t (c0 text, c1 text, c2 timestamp);
ALTER TABLE t ALTER c0 TYPE state,               ALTER c1 TYPE varchar(6),               ALTER c2 TYPE timestamptz;

It felt normal, but I very well may have been in that epsilon unawares.  I would
be disappointed to end the project with no way to avoid rewrites for them.  That
being said, there are certainly more ways to achieve all of those than the one I
have proposed.

> I do have some interest in the idea of having a type-specific function
> that can recognize no-op typmod changes, but I would envision that as
> being an expression evaluation optimization: let the planner throw away
> the call to the length-checking function when it isn't going to do
> anything.  It's not by any means only useful in ALTER COLUMN TYPE ---
> and in fact probably doesn't even need any bespoke code there, if we put
> it into expression_planner() instead.

That sounds significantly better.  Ignorant question: how often will
expression_planner have the old typmod available?  Also, would this be worth
attempting unconditionally, or might some callers use the expression too few
times and suffer from the overhead of deciding whether to use it?

Perhaps the order of patches I proposed was faulty and should have placed the
most important use cases first, like this:

1. Modify ATPrepAlterColumnType to detect a bare RelabelType transformation
expression and conditionally skip/downgrade ATRewriteTable.  This would cover
varchar->text and similar domain changes, but a full reindex remains.
2. In the no-rewrite case, modify the vicinity of finish_heap_swap to narrow us
from reindexing the entire table to merely reindexing those indexes touched by
the operator class changes.
3. Further skip reindex operations when the operator class has changed but the
operator family has not.
4. Add pg_cast.castexemptor and modify CREATE CAST to populate it.  Define an
exemptor (or however we name it) for the varchar length coercion cast.  Modify
expression_planner to use it to strip out superfluous length coercion casts.
This would cover varchar(10)->varchar(20) and similar domain changes.
5. Define exemptors for time, timetz, timestamp, timestamptz, and interval.
6. Define exemptor for bit varying.
7. Define exemptor for numeric.
8. (subject to further discussion) Add the verification scan to the vicinity of
ATRewriteTable and skip the table rewrite when it finds no changes.  This covers
varchar(20)->varchar(10) and similar domain changes, timestamp->timestamptz, and
text->xml.

Thanks,
nm


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Robert Haas
Date:
On Wed, Dec 29, 2010 at 6:46 PM, Noah Misch <noah@leadboat.com> wrote:
>> I think this scenario will be more common than you might think.  Tables don't contain random data; they contain data
thatthe DBA thinks is valid.  The situation where the data is mostly as you expect but with a few kooky rows is, in my
experience,extremely common. 
>
> Perhaps.  A few kooky rows is indeed common, but we're talking about a specific
> breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE
> transformation expression, and 0.1% have different bits.  Is that common?

I think it's common enough to be worth worrying about.

> Expanding on my introduction, none of the following can yield a negative
> verification scan; the result is always positive or an error:
>
> CREATE DOMAIN loosedom AS text;
> CREATE DOMAIN tightdom AS text CHECK (value LIKE '<%/>');
> CREATE TABLE t (c varchar(6));
> INSERT INTO t VALUES ('<abc/>'),('<de/>');
> ALTER TABLE t ALTER c TYPE varchar(8);
> ALTER TABLE t ALTER c TYPE text;
> ALTER TABLE t ALTER c TYPE loosedom;
> ALTER TABLE t ALTER c TYPE xml USING c::xml;
> ALTER TABLE t ALTER c TYPE varchar(64);
> ALTER TABLE t ALTER c TYPE tightdom;

+1 for trying to optimize these cases (but maybe after we optimize the
varchar -> text and varchar(less) -> varchar(more) cases to skip the
scan altogether).

> Adding a bpchar into the mix makes a negative verification scan possible, as
> does a USING clause having a truncating effect.  Continuing the example, these
> can and would get a negative verification scan:
> ALTER TABLE t ALTER c TYPE character(6);
> ALTER TABLE> In case it was not obvious, I'll note that any error thrown by a transformationt ALTER c TYPE varchar(5)
USINGc::varchar(5); 
> Plenty of academic USING clause examples exist:
> ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN '<de/>' THEN 'foo' ELSE c END;

I am not really convinced that there's much value in optimizing these
cases.  They're not likely to arise very often in practice, and DBAs
like predictability.  There's tangible value in being able to say
"this is going to scan your table at most once - it might rewrite it,
or it might just verify what's there, or it might decide no scan is
necessary, but the absolute worst case is one scan with rewrite".
That's simple to understand and simple to document and probably
simpler to code too, and I think it covers very nearly all of the
cases people are likely to care about in practice.

> Yes.  Indeed, that's the intuitive basis for my hypothesis that the verification
> scan will usually either fail early.  I don't advocate this approach to pick up
> edge cases, but to pick up reasonable cases _without explicit annotations_
> showing them to be achievable.  Take the text->xml example, certainly of genuine
> value if not top-frequency.  I see three ways to ensure we do a verification
> scan for it:
>
> 1. Have the user tell us: ALTER TABLE t ALTER c TYPE xml USING c::xml IMPLICIT
> 2. Mark the text->xml cast as "possibly no-rewrite" and look for that
> 3. Do a verification scan every time

I think for any pair of types (T1, T2) we should first determine
whether we can skip the scan altogether.  If yes, we're done.  If no,
then we should have a way of determining whether a verify-only scan is
guaranteed to be sufficient (in your terminology, the verification
scan is guaranteed to return either positive or error, not negative).
If yes, then we do a verification scan.  If no, we do a rewrite.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
> On Wed, Dec 29, 2010 at 6:46 PM, Noah Misch <noah@leadboat.com> wrote:
> > Perhaps. ?A few kooky rows is indeed common, but we're talking about a specific
> > breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE
> > transformation expression, and 0.1% have different bits. ?Is that common?
> 
> I think it's common enough to be worth worrying about.

Okay.  Could you give an example of a specific ALTER TABLE recipe worth worrying
about and subject to degradation under my proposal?

> > Adding a bpchar into the mix makes a negative verification scan possible, as
> > does a USING clause having a truncating effect. ?Continuing the example, these
> > can and would get a negative verification scan:
> > ALTER TABLE t ALTER c TYPE character(6);
> > ALTER TABLE> In case it was not obvious, I'll note that any error thrown by a transformation
>  t ALTER c TYPE varchar(5) USING c::varchar(5);
> > Plenty of academic USING clause examples exist:
> > ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN '<de/>' THEN 'foo' ELSE c END;
> 
> I am not really convinced that there's much value in optimizing these
> cases.  They're not likely to arise very often in practice, 

Just to make sure we're clear: those were examples of what I had intended to
pessimize for the sake of simplicity.

> and DBAs
> like predictability.  There's tangible value in being able to say
> "this is going to scan your table at most once - it might rewrite it,
> or it might just verify what's there, or it might decide no scan is
> necessary, but the absolute worst case is one scan with rewrite".
> That's simple to understand and simple to document and probably
> simpler to code too, and I think it covers very nearly all of the
> cases people are likely to care about in practice.


> > 1. Have the user tell us: ALTER TABLE t ALTER c TYPE xml USING c::xml IMPLICIT
> > 2. Mark the text->xml cast as "possibly no-rewrite" and look for that
> > 3. Do a verification scan every time
> 
> I think for any pair of types (T1, T2) we should first determine
> whether we can skip the scan altogether.  If yes, we're done.  If no,
> then we should have a way of determining whether a verify-only scan is
> guaranteed to be sufficient (in your terminology, the verification
> scan is guaranteed to return either positive or error, not negative).
> If yes, then we do a verification scan.  If no, we do a rewrite.

How would we answer the second question in general?

Thanks,
nm


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Robert Haas
Date:
On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
>> On Wed, Dec 29, 2010 at 6:46 PM, Noah Misch <noah@leadboat.com> wrote:
>> > Perhaps. ?A few kooky rows is indeed common, but we're talking about a specific
>> > breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE
>> > transformation expression, and 0.1% have different bits. ?Is that common?
>>
>> I think it's common enough to be worth worrying about.
>
> Okay.  Could you give an example of a specific ALTER TABLE recipe worth worrying
> about and subject to degradation under my proposal?

Any of the ones you listed in your second set of examples, e.g.:

ALTER TABLE t ALTER c TYPE character(6);

Under your proposal, this can scan the whole table once in read-only
mode, and then realize that it needs to go back and rewrite the whole
table.

>> I think for any pair of types (T1, T2) we should first determine
>> whether we can skip the scan altogether.  If yes, we're done.  If no,
>> then we should have a way of determining whether a verify-only scan is
>> guaranteed to be sufficient (in your terminology, the verification
>> scan is guaranteed to return either positive or error, not negative).
>> If yes, then we do a verification scan.  If no, we do a rewrite.
>
> How would we answer the second question in general?

I am not sure - I guess we'd need to design some sort of mechanism for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Jim Nasby
Date:
On Dec 29, 2010, at 10:14 PM, Robert Haas wrote:
> +1 for trying to optimize these cases (but maybe after we optimize the
> varchar -> text and varchar(less) -> varchar(more) cases to skip the
> scan altogether).

+1 on getting the obvious cases of varchar and numeric done first; we run into those a lot at work and would be willing
tosponsor work on a patch that makes those operations as fast as just adding a new column. 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Thu, Dec 30, 2010 at 12:57:45AM -0500, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
> >> I think for any pair of types (T1, T2) we should first determine
> >> whether we can skip the scan altogether. ?If yes, we're done. ?If no,
> >> then we should have a way of determining whether a verify-only scan is
> >> guaranteed to be sufficient (in your terminology, the verification
> >> scan is guaranteed to return either positive or error, not negative).
> >> If yes, then we do a verification scan. ?If no, we do a rewrite.
> >
> > How would we answer the second question in general?
> 
> I am not sure - I guess we'd need to design some sort of mechanism for that.

Okay, here goes.  Given a Var "varexpr" representing the column we're changing
and an expression tree "expr" we need to answer two questions (argument lists
simplified -- assume the same RTEs in all cases):

always-noop: "Will datumIsEquals(ExecEvalExpr(varexpr), ExecEvalExpr(expr))
return true or yield an error for all possible tuples?"
never-error: "Will ExecEvalExpr(expr) never throw an error?"

Currently we're only interested in the second question when the first is also
true; I'm not sure if there's something fundamental there, or just an artifact
of current needs.  To support answering these questions, extend the CREATE CAST
changes from my earlier proposal, modifying the exemptor signature to return an
int, a bitmask containing one bit for each of these two questions.  Call the
function in find_typmod_coercion_function.  If its return value answers "yes" to
both questions, return COERCION_PATH_NONE, resulting in omission of the length
coercion node.  For other verdicts, generate the FuncExpr as normal and insert
the verdict in a new FuncExpr field "funcexempt".  (That need not increase the
size of FuncExpr, if that's a concern.)

ATPrepAlterColumnType, having generated its transformation expression, will call
a new function that recursively walks the tree to answer the two questions.  The
walker will apply these rules:

1. For a Var with the varno/varattno in question, intrinsically "yes" to both.
2. A RelabelType node inherits the answers of its sole argument.
3. A CoerceToDomain node inherits the always-noop answer of its sole argument.
When GetDomainConstraints() == NIL, it also inherits the never-error answer.
Otherwise, never-error becomes "no".
4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field
and the answers from its first argument.
5. Any other node answers "no" to both questions.

If the transformation expression root has "yes" to both questions, we're done
with no scan.  If only always-noop is true, we do a verification scan only.
Otherwise, we optimize nothing and do a rewrite.

Thoughts?

Thanks,
nm


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Robert Haas
Date:
On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch <noah@leadboat.com> wrote:
> 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field
> and the answers from its first argument.

Why its first argument?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Fri, Dec 31, 2010 at 12:34:50AM -0500, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch <noah@leadboat.com> wrote:
> > 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field
> > and the answers from its first argument.
> 
> Why its first argument?

funcexempt would only be nonzero for FuncExpr of length coercion casts.  Those
have the subject datum as a first argument, typmod as second, and is-explicit
boolean as third.  The other arguments are effectively already validated.

That brings up a point -- the exemptor function also needs an is-explicit
argument, as that affects the decision for some types.


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Thu, Dec 30, 2010 at 08:35:34PM -0500, Noah Misch wrote:
> On Thu, Dec 30, 2010 at 12:57:45AM -0500, Robert Haas wrote:
> > On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch <noah@leadboat.com> wrote:
> > > On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
> > >> I think for any pair of types (T1, T2) we should first determine
> > >> whether we can skip the scan altogether. ?If yes, we're done. ?If no,
> > >> then we should have a way of determining whether a verify-only scan is
> > >> guaranteed to be sufficient (in your terminology, the verification
> > >> scan is guaranteed to return either positive or error, not negative).
> > >> If yes, then we do a verification scan. ?If no, we do a rewrite.
> > >
> > > How would we answer the second question in general?
> > 
> > I am not sure - I guess we'd need to design some sort of mechanism for that.
> 
> Okay, here goes.  Given...<snip>

That seems to be working decently.  However, It turns out that changes like
text->varchar(8) and varchar(8)->varchar(4) don't fall into either of those
optimization categories.  An implicit varchar length coercion will truncate
trailing blanks to make the string fit, so this legitimately requires a rewrite:

CREATE TEMP TABLE t (c) AS SELECT 'foo  '::text;
SELECT c || '<-' FROM t;
ALTER TABLE t ALTER c TYPE varchar(4);
SELECT c || '<-' FROM t;

In light of that, I'm increasingly thinking we'll want a way for the user to
request a scan in place of a rewrite.  The scan would throw an error if a
rewrite ends up being necessary.  Adding a keyword for that purpose, the syntax
would resemble:

ALTER TABLE <name> ALTER [COLUMN] <colname> [SET DATA] TYPE <typename>[IMPLICIT] [ USING <expression> ]

I had wished to avoid this as something of a UI wart, but I don't see a way to
cover all important conversions automatically and with a single-pass guarantee.
This would cover the rest.

Thoughts?


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Robert Haas
Date:
On Wed, Jan 5, 2011 at 11:26 PM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Dec 30, 2010 at 08:35:34PM -0500, Noah Misch wrote:
>> On Thu, Dec 30, 2010 at 12:57:45AM -0500, Robert Haas wrote:
>> > On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch <noah@leadboat.com> wrote:
>> > > On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
>> > >> I think for any pair of types (T1, T2) we should first determine
>> > >> whether we can skip the scan altogether. ?If yes, we're done. ?If no,
>> > >> then we should have a way of determining whether a verify-only scan is
>> > >> guaranteed to be sufficient (in your terminology, the verification
>> > >> scan is guaranteed to return either positive or error, not negative).
>> > >> If yes, then we do a verification scan. ?If no, we do a rewrite.
>> > >
>> > > How would we answer the second question in general?
>> >
>> > I am not sure - I guess we'd need to design some sort of mechanism for that.
>>
>> Okay, here goes.  Given...<snip>
>
> That seems to be working decently.  However, It turns out that changes like
> text->varchar(8) and varchar(8)->varchar(4) don't fall into either of those
> optimization categories.  An implicit varchar length coercion will truncate
> trailing blanks to make the string fit, so this legitimately requires a rewrite:
>
> CREATE TEMP TABLE t (c) AS SELECT 'foo  '::text;
> SELECT c || '<-' FROM t;
> ALTER TABLE t ALTER c TYPE varchar(4);
> SELECT c || '<-' FROM t;
>
> In light of that, I'm increasingly thinking we'll want a way for the user to
> request a scan in place of a rewrite.  The scan would throw an error if a
> rewrite ends up being necessary.  Adding a keyword for that purpose, the syntax
> would resemble:
>
> ALTER TABLE <name> ALTER [COLUMN] <colname> [SET DATA] TYPE <typename>
>        [IMPLICIT] [ USING <expression> ]
>
> I had wished to avoid this as something of a UI wart, but I don't see a way to
> cover all important conversions automatically and with a single-pass guarantee.
> This would cover the rest.
>
> Thoughts?

I still think you're better off focusing first on the case where we
can skip the whole nine yards, and doing this stuff as a follow-on
patch.  Trying to do too many things, especially possibly
controversial stuff, especially in the last CommitFest, often ends up
with the whole patch getting rejected, which makes no one happy.
Submitting the smallest useful, self-contained change you can and then
work up from there.  Or at least split out the patch into parts that
can be applied independently, so that if the eventual committer likes
A but not B you at least get A in.

Don't take any of this as a rejection of any of what you're proposing;
I haven't really made up my mind yet, and there are plenty of other
people who would have a say even if I had.  Rather, I'd like to
maximize the chances of us at least part of this work committed to
9.1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From
Noah Misch
Date:
On Thu, Jan 06, 2011 at 12:24:19AM -0500, Robert Haas wrote:
> I still think you're better off focusing first on the case where we
> can skip the whole nine yards, and doing this stuff as a follow-on
> patch.  Trying to do too many things, especially possibly
> controversial stuff, especially in the last CommitFest, often ends up
> with the whole patch getting rejected, which makes no one happy.
> Submitting the smallest useful, self-contained change you can and then
> work up from there.  Or at least split out the patch into parts that
> can be applied independently, so that if the eventual committer likes
> A but not B you at least get A in.

Will do.  Thanks.