Thread: Which SET TYPE don't actually require a rewrite

Which SET TYPE don't actually require a rewrite

From
Magnus Hagander
Date:
Our Fine Manual (TM) specifies:
"As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt."

First of all, how is a non-internals-expert even supposed to know what a binary coercible type is? That's not a very user-friendly way to say it.

Second, how is even an expert supposed to find the list? :)

For example, we can query pg_cast for casts that are binary coercible, that's a start, but it doesn't really tell us the answer.

We can also for example increase the precision of numeric without a rewrite (but not scale). Or we can change between text and varchar. And we can increase the length of a varchar but not decrease it.

Surely we can do better than this when it comes to documenting it? Even if it's a pluggable thing so it may or may not be true of external datatypes installed later, we should be able to at least be more clear about the builtin types, I think?

--

Re: Which SET TYPE don't actually require a rewrite

From
Amit Kapila
Date:
On Wed, Jul 15, 2020 at 6:25 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> Our Fine Manual (TM) specifies:
> "As an exception, when changing the type of an existing column, if the USING clause does not change the column
contentsand the old type is either binary coercible to the new type or an unconstrained domain over the new type, a
tablerewrite is not needed; but any indexes on the affected columns must still be rebuilt." 
>
> First of all, how is a non-internals-expert even supposed to know what a binary coercible type is? That's not a very
user-friendlyway to say it. 
>
> Second, how is even an expert supposed to find the list? :)
>
> For example, we can query pg_cast for casts that are binary coercible, that's a start, but it doesn't really tell us
theanswer. 
>
> We can also for example increase the precision of numeric without a rewrite (but not scale). Or we can change between
textand varchar. And we can increase the length of a varchar but not decrease it. 
>
> Surely we can do better than this when it comes to documenting it? Even if it's a pluggable thing so it may or may
notbe true of external datatypes installed later, we should be able to at least be more clear about the builtin types,
Ithink? 
>

+1 for providing more information in the documentation.  One way could
be that we give some examples of how a user can check whether types
are binary coercible or not and then also specify clearly in which
other cases the rewrite can happen.  Similarly, it seems the
information when the rewrite can happen for "SET (storage_parameter
...)" (doc says: "depending on the parameter you might need to rewrite
the table to get the desired effects") is thin.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Which SET TYPE don't actually require a rewrite

From
Noah Misch
Date:
On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote:
> Our Fine Manual (TM) specifies:
> "As an exception, when changing the type of an existing column, if the
> USING clause does not change the column contents and the old type is either
> binary coercible to the new type or an unconstrained domain over the new
> type, a table rewrite is not needed; but any indexes on the affected
> columns must still be rebuilt."
> 
> First of all, how is a non-internals-expert even supposed to know what a
> binary coercible type is?

The manual defines it at <firstterm>binary coercible</firstterm>.

> We can also for example increase the precision of numeric without a rewrite
> (but not scale). Or we can change between text and varchar. And we can
> increase the length of a varchar but not decrease it.
> 
> Surely we can do better than this when it comes to documenting it? Even if
> it's a pluggable thing so it may or may not be true of external
> datatypes installed later, we should be able to at least be more clear
> about the builtin types, I think?

I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of
query optimizer logic.  The manual brings up only a minority of planner
optimizations, and comprehensive lists of optimization preconditions are even
rarer.  But I don't mind if $SUBJECT documentation departs from that norm.



Re: Which SET TYPE don't actually require a rewrite

From
Magnus Hagander
Date:


On Fri, Jul 17, 2020 at 5:40 AM Noah Misch <noah@leadboat.com> wrote:
On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote:
> Our Fine Manual (TM) specifies:
> "As an exception, when changing the type of an existing column, if the
> USING clause does not change the column contents and the old type is either
> binary coercible to the new type or an unconstrained domain over the new
> type, a table rewrite is not needed; but any indexes on the affected
> columns must still be rebuilt."
>
> First of all, how is a non-internals-expert even supposed to know what a
> binary coercible type is?

The manual defines it at <firstterm>binary coercible</firstterm>.

The only way to actually realize that this is a <firstterm> is to look at the source code though, right? It's definitely not clear that one should go look at the CREATE CAST documentation to find the definition -- certainly not from the ALTER TABLE documentation, which I would argue is the place where most people would go.

And while having the definition there is nice, it doesn't help an end user in any way at all to determine if their ALTER TABLE statement is going to be "safe from rewrites" or not. It (hopefully) helps someone who knows some things about the database internals, which is of course a valuable thing as well, but not the end user.


> We can also for example increase the precision of numeric without a rewrite
> (but not scale). Or we can change between text and varchar. And we can
> increase the length of a varchar but not decrease it.
>
> Surely we can do better than this when it comes to documenting it? Even if
> it's a pluggable thing so it may or may not be true of external
> datatypes installed later, we should be able to at least be more clear
> about the builtin types, I think?

I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of
query optimizer logic.  The manual brings up only a minority of planner
optimizations, and comprehensive lists of optimization preconditions are even
rarer.  But I don't mind if $SUBJECT documentation departs from that norm.

I can see the argument being made for that, and certainly having been made for it in the future. But I'd say given the very bad consequences of getting it wrong, it's far from minor. And given the number of times I've had to answer the question "can I make this change safely" (which usually amounts to me trying it out to see what happens, if I hadn't done that exact one many times before) indicates the need for a more detailed documentation on it.

As Amit mentions it is also triggered by some store parameter changes. But not all. So looking at it the other way, the part that the end user really cares about it "which ALTER TABLE operations will rewrite the table and which will not". Maybe what we need is a section specifically on this that summarizes all the different ways that it can happen.

--

Re: Which SET TYPE don't actually require a rewrite

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> As Amit mentions it is also triggered by some store parameter changes. But
> not all. So looking at it the other way, the part that the end user really
> cares about it "which ALTER TABLE operations will rewrite the table and
> which will not". Maybe what we need is a section specifically on this that
> summarizes all the different ways that it can happen.

No, what we need is EXPLAIN for DDL ;-).  Trying to keep such
documentation in sync with the actual code behavior would be impossible.
(For one thing, some aspects can be affected by extension datatype
behaviors.)

            regards, tom lane



Re: Which SET TYPE don't actually require a rewrite

From
Noah Misch
Date:
On Fri, Jul 17, 2020 at 04:08:36PM +0200, Magnus Hagander wrote:
> On Fri, Jul 17, 2020 at 5:40 AM Noah Misch <noah@leadboat.com> wrote:
> > On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote:
> > > Our Fine Manual (TM) specifies:
> > > "As an exception, when changing the type of an existing column, if the
> > > USING clause does not change the column contents and the old type is
> > either
> > > binary coercible to the new type or an unconstrained domain over the new
> > > type, a table rewrite is not needed; but any indexes on the affected
> > > columns must still be rebuilt."
> > >
> > > First of all, how is a non-internals-expert even supposed to know what a
> > > binary coercible type is?
> >
> > The manual defines it at <firstterm>binary coercible</firstterm>.
> 
> The only way to actually realize that this is a <firstterm> is to look at
> the source code though, right?

I see italic typeface for <firstterm>.  This one deserves an <indexterm>, too.
(I bet many other <firstterm> uses deserve an <indexterm>.)

> It's definitely not clear that one should go
> look at the CREATE CAST documentation to find the definition -- certainly
> not from the ALTER TABLE documentation, which I would argue is the place
> where most people would go.

Agreed.

> > We can also for example increase the precision of numeric without a
> > rewrite
> > > (but not scale). Or we can change between text and varchar. And we can
> > > increase the length of a varchar but not decrease it.
> > >
> > > Surely we can do better than this when it comes to documenting it? Even
> > if
> > > it's a pluggable thing so it may or may not be true of external
> > > datatypes installed later, we should be able to at least be more clear
> > > about the builtin types, I think?
> >
> > I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of
> > query optimizer logic.  The manual brings up only a minority of planner
> > optimizations, and comprehensive lists of optimization preconditions are
> > even
> > rarer.  But I don't mind if $SUBJECT documentation departs from that norm.
> 
> I can see the argument being made for that, and certainly having been made
> for it in the future. But I'd say given the very bad consequences of
> getting it wrong, it's far from minor. And given the number of times I've
> had to answer the question "can I make this change safely" (which usually
> amounts to me trying it out to see what happens, if I hadn't done that
> exact one many times before) indicates the need for a more detailed
> documentation on it.

Such a doc addition is fine with me.  I agree with Tom that it will be prone
to staleness, but I don't conclude that the potential for staleness reduces
its net value below zero.  Having said that, if the consequences of doc
staleness are "very bad", you may consider documenting the debug1 user
interface (https://postgr.es/m/20121202020736.GD13163@tornado.leadboat.com)
instead of documenting the exact rules.  Either way is fine with me.



Re: Which SET TYPE don't actually require a rewrite

From
Bruce Momjian
Date:
On Fri, Jul 17, 2020 at 11:26:56AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > As Amit mentions it is also triggered by some store parameter changes. But
> > not all. So looking at it the other way, the part that the end user really
> > cares about it "which ALTER TABLE operations will rewrite the table and
> > which will not". Maybe what we need is a section specifically on this that
> > summarizes all the different ways that it can happen.
> 
> No, what we need is EXPLAIN for DDL ;-).  Trying to keep such
> documentation in sync with the actual code behavior would be impossible.
> (For one thing, some aspects can be affected by extension datatype
> behaviors.)

I know Tom put a wink on that, but I actually do feel that the only
clean way to do this is to give users a way to issue the query in a
non-executing way that will report if a rewrite is going to happen.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Which SET TYPE don't actually require a rewrite

From
Michael Paquier
Date:
On Tue, Jul 21, 2020 at 04:55:37PM -0400, Bruce Momjian wrote:
> I know Tom put a wink on that, but I actually do feel that the only
> clean way to do this is to give users a way to issue the query in a
> non-executing way that will report if a rewrite is going to happen.

Yeah, when doing a schema upgrade for an application, that's the usual
performance pin-point and people used to other things than Postgres
write their queries without being aware of that.  We have something
able to track that with the event trigger table_rewrite, but there is
no easy option to store the event and bypass its execution.  I think
that using a plpgsql function wrapping an ALTER TABLE query with an
exception block for an error generated by an event trigger if seeing
table_rewrite allows to do that, though.
--
Michael

Attachment

Re: Which SET TYPE don't actually require a rewrite

From
Magnus Hagander
Date:


On Sat, Jul 18, 2020 at 4:57 AM Noah Misch <noah@leadboat.com> wrote:
On Fri, Jul 17, 2020 at 04:08:36PM +0200, Magnus Hagander wrote:
> On Fri, Jul 17, 2020 at 5:40 AM Noah Misch <noah@leadboat.com> wrote:
> > On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote:
> > > Our Fine Manual (TM) specifies:
> > > "As an exception, when changing the type of an existing column, if the
> > > USING clause does not change the column contents and the old type is
> > either
> > > binary coercible to the new type or an unconstrained domain over the new
> > > type, a table rewrite is not needed; but any indexes on the affected
> > > columns must still be rebuilt."
> > >
> > > First of all, how is a non-internals-expert even supposed to know what a
> > > binary coercible type is?
> >
> > The manual defines it at <firstterm>binary coercible</firstterm>.
>
> The only way to actually realize that this is a <firstterm> is to look at
> the source code though, right?

I see italic typeface for <firstterm>.  This one deserves an <indexterm>, too.
(I bet many other <firstterm> uses deserve an <indexterm>.)

> It's definitely not clear that one should go
> look at the CREATE CAST documentation to find the definition -- certainly
> not from the ALTER TABLE documentation, which I would argue is the place
> where most people would go.

Agreed.

> > We can also for example increase the precision of numeric without a
> > rewrite
> > > (but not scale). Or we can change between text and varchar. And we can
> > > increase the length of a varchar but not decrease it.
> > >
> > > Surely we can do better than this when it comes to documenting it? Even
> > if
> > > it's a pluggable thing so it may or may not be true of external
> > > datatypes installed later, we should be able to at least be more clear
> > > about the builtin types, I think?
> >
> > I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of
> > query optimizer logic.  The manual brings up only a minority of planner
> > optimizations, and comprehensive lists of optimization preconditions are
> > even
> > rarer.  But I don't mind if $SUBJECT documentation departs from that norm.
>
> I can see the argument being made for that, and certainly having been made
> for it in the future. But I'd say given the very bad consequences of
> getting it wrong, it's far from minor. And given the number of times I've
> had to answer the question "can I make this change safely" (which usually
> amounts to me trying it out to see what happens, if I hadn't done that
> exact one many times before) indicates the need for a more detailed
> documentation on it.

Such a doc addition is fine with me.  I agree with Tom that it will be prone
to staleness, but I don't conclude that the potential for staleness reduces
its net value below zero.  Having said that, if the consequences of doc
staleness are "very bad", you may consider documenting the debug1 user
interface (https://postgr.es/m/20121202020736.GD13163@tornado.leadboat.com)
instead of documenting the exact rules.  Either way is fine with me.

The DEBUG1 method is only after the fact though, isn't it?

That makes it pretty hard for someone to say review a migration script and see "this is going to cause problems". And if it's going to be run in an env, I personally find it more useful to just stick an event trigger in there per our documentation and block it (though it might be a good idea to link to that from the alter table reference page, and not just have it under event trigger examples).

I agree that documenting the rules would definitely be prone to staleness, and that having EXPLAIN for DDL would be the *better* solution. But also that having the docs, even if they go a bit stale, would be better than the scenario today.

Unfortunately, I'm not sure I know enough of the details of what the rules actually *are* to explain them in a way that's easy enough to go in the docs...

--

Re: Which SET TYPE don't actually require a rewrite

From
Noah Misch
Date:
On Wed, Aug 05, 2020 at 02:52:42PM +0200, Magnus Hagander wrote:
> On Sat, Jul 18, 2020 at 4:57 AM Noah Misch <noah@leadboat.com> wrote:
> > Such a doc addition is fine with me.  I agree with Tom that it will be prone
> > to staleness, but I don't conclude that the potential for staleness reduces
> > its net value below zero.  Having said that, if the consequences of doc
> > staleness are "very bad", you may consider documenting the debug1 user
> > interface (https://postgr.es/m/20121202020736.GD13163@tornado.leadboat.com)
> > instead of documenting the exact rules.  Either way is fine with me.
> 
> The DEBUG1 method is only after the fact though, isn't it?
> 
> That makes it pretty hard for someone to say review a migration script and
> see "this is going to cause problems". And if it's going to be run in an
> env, I personally find it more useful to just stick an event trigger in
> there per our documentation and block it (though it might be a good idea to
> link to that from the alter table reference page, and not just have it
> under event trigger examples).

The "after the fact" aspect is basically the same for the DEBUG1 method and
the event trigger method.  Each fires after lock acquisition and before
rewriting the first tuple.

Event trigger drawbacks include the requirement for superuser cooperation.
DEBUG1/statement_timeout drawbacks include an ambiguity: if it reaches
statement_timeout without printing the DEBUG1, that could mean a lack of
rewrite, or it could mean some other cause of slowness.  I have a weak
preference for promoting the DEBUG1/statement_timeout approach, because cloud
deployments find the superuser obstacle insurmountable.  The ambiguity is
surmountable; one can always remove the statement_timeout and run the command
to completion in a pre-production environment.