Thread: Cases where alter table set type varchar(longer length) still needstable rewrite

Cases where alter table set type varchar(longer length) still needstable rewrite

From
Jeremy Finzel
Date:
Good morning!

We are a little bit puzzled because running the following command on a 9.6 cluster is apparently requiring a table rewrite, or at least a very long operation of some kind, even though the docs say that as of 9.2:

  • Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the intervaltimestamp, and timestamptz types.

I have a table foo with 100 million rows, and a column:
  • id character varying(20)
The following command is the one that we expect to execute very quickly (we are not seeing any locking), but it is instead taking a very long time:
  • ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);
I attempted the same operation instead cast to text - same problem.

Why do the docs seem wrong in our case?  I have a guess: if the table was created prior to version 9.2, perhaps they are not binary coercible to text after 9.2?  In any case, I would be very grateful for an explanation!


Thank you!
Jeremy
Jeremy Finzel <finzelj@gmail.com> writes:
> I have a table foo with 100 million rows, and a column:
>    - id character varying(20)
> The following command is the one that we expect to execute very quickly (we
> are not seeing any locking), but it is instead taking a very long time:
>    - ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);

Hm, the code is supposed to avoid a table rewrite, but I wonder if
there's something else that's not being avoided, such as an index
rebuild or foreign-key verification.  Could we see the whole table
definition, eg from psql \d+ ?

            regards, tom lane



On Mon, Feb 17, 2020 at 8:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Finzel <finzelj@gmail.com> writes:
> I have a table foo with 100 million rows, and a column:
>    - id character varying(20)
> The following command is the one that we expect to execute very quickly (we
> are not seeing any locking), but it is instead taking a very long time:
>    - ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);

Hm, the code is supposed to avoid a table rewrite, but I wonder if
there's something else that's not being avoided, such as an index
rebuild or foreign-key verification.  Could we see the whole table
definition, eg from psql \d+ ?

                        regards, tom lane

Based on your feedback, I quickly identified that indeed, the following index is causing the re-type to be slow:

"id_idx" btree ("substring"(id::text, 4, 7))

I'm still not sure why a rebuild of this index would be required, technically speaking.  But perhaps in any case the docs should have something to the effect that expression indexes may require rebuild under specific circumstances?

Thanks!
Jeremy
On 2/17/20 9:01 AM, Jeremy Finzel wrote:
On Mon, Feb 17, 2020 at 8:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Finzel <finzelj@gmail.com> writes:
> I have a table foo with 100 million rows, and a column:
>    - id character varying(20)
> The following command is the one that we expect to execute very quickly (we
> are not seeing any locking), but it is instead taking a very long time:
>    - ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);

Hm, the code is supposed to avoid a table rewrite, but I wonder if
there's something else that's not being avoided, such as an index
rebuild or foreign-key verification.  Could we see the whole table
definition, eg from psql \d+ ?

                        regards, tom lane

Based on your feedback, I quickly identified that indeed, the following index is causing the re-type to be slow:

"id_idx" btree ("substring"(id::text, 4, 7))

I'm still not sure why a rebuild of this index would be required, technically speaking.  But perhaps in any case the docs should have something to the effect that expression indexes may require rebuild under specific circumstances?

How much faster would "it" be if you dropped the index, ran ALTER and rebuilt the index?  Or is it too late?

--
Angular momentum makes the world go 'round.
On 2/17/20 7:01 AM, Jeremy Finzel wrote:
> On Mon, Feb 17, 2020 at 8:21 AM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>     Jeremy Finzel <finzelj@gmail.com <mailto:finzelj@gmail.com>> writes:
>      > I have a table foo with 100 million rows, and a column:
>      >    - id character varying(20)
>      > The following command is the one that we expect to execute very
>     quickly (we
>      > are not seeing any locking), but it is instead taking a very long
>     time:
>      >    - ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);
> 
>     Hm, the code is supposed to avoid a table rewrite, but I wonder if
>     there's something else that's not being avoided, such as an index
>     rebuild or foreign-key verification.  Could we see the whole table
>     definition, eg from psql \d+ ?
> 
>                              regards, tom lane
> 
> 
> Based on your feedback, I quickly identified that indeed, the following 
> index is causing the re-type to be slow:
> 
> "id_idx" btree ("substring"(id::text, 4, 7))
> 
> I'm still not sure why a rebuild of this index would be required, 
> technically speaking.  But perhaps in any case the docs should have 
> something to the effect that expression indexes may require rebuild 
> under specific circumstances?

How about?:

https://www.postgresql.org/docs/9.6/sql-altertable.html

"Adding a column with a DEFAULT clause or changing the type of an 
existing column will require the entire table and its indexes to be 
rewritten. As an exception when changing the type of an existing column, 
if the USING clause does not change the column contents and the old type 
is either binary coercible to the new type or an unconstrained domain 
over the new type, a table rewrite is not needed; but any indexes on the 
affected columns must still be rebuilt. Adding or removing a system oid 
column also requires rewriting the entire table. Table and/or index 
rebuilds may take a significant amount of time for a large table; and 
will temporarily require as much as double the disk space."

> 
> Thanks!
> Jeremy


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 2/17/20 7:01 AM, Jeremy Finzel wrote:
>> I'm still not sure why a rebuild of this index would be required, 
>> technically speaking.  But perhaps in any case the docs should have 
>> something to the effect that expression indexes may require rebuild 
>> under specific circumstances?

> [ it is already ]

Yeah.  In principle, we wouldn't need to rebuild the indexes in this
case, since there's no semantic difference between a value sourced
from a varchar(N) column and a varchar(some-other-N) column.  In general,
though, ALTER COLUMN TYPE doesn't know whether that's true; and there
are definitely binary-compatible cases where it *does* matter.
(An example is that coercing integer to OID is allowed without a
table rewrite, but an index rebuild is needed because the sort
order is different.)  So right now, any index mentioning the altered
column has to be rebuilt.

Maybe someday we'll figure out how to do better.  I'm kind of wondering
whether it wouldn't be safe to assume that changes that only change the
typmod and not the type OID don't require index rebuilds.

            regards, tom lane





On Mon, Feb 17, 2020 at 10:46 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

How about?:

https://www.postgresql.org/docs/9.6/sql-altertable.html

"Adding a column with a DEFAULT clause or changing the type of an
existing column will require the entire table and its indexes to be
rewritten. As an exception when changing the type of an existing column,
if the USING clause does not change the column contents and the old type
is either binary coercible to the new type or an unconstrained domain
over the new type, a table rewrite is not needed; but any indexes on the
affected columns must still be rebuilt. Adding or removing a system oid
column also requires rewriting the entire table. Table and/or index
rebuilds may take a significant amount of time for a large table; and
will temporarily require as much as double the disk space."

>
> Thanks!
> Jeremy


--
Adrian Klaver
adrian.klaver@aklaver.com

You mean the part "any indexes on the affected columns must still be rebuilt"?  Yes, I guess that is pretty clear.  Thanks,

Jeremy