Thread: Which SET TYPE don't actually require a rewrite
Our Fine Manual (TM) specifies:
"As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt."
First of all, how is a non-internals-expert even supposed to know what a binary coercible type is? That's not a very user-friendly way to say it.
Second, how is even an expert supposed to find the list? :)
For example, we can query pg_cast for casts that are binary coercible, that's a start, but it doesn't really tell us the answer.
We can also for example increase the precision of numeric without a rewrite (but not scale). Or we can change between text and varchar. And we can increase the length of a varchar but not decrease it.
Surely we can do better than this when it comes to documenting it? Even if it's a pluggable thing so it may or may not be true of external datatypes installed later, we should be able to at least be more clear about the builtin types, I think?
--
On Wed, Jul 15, 2020 at 6:25 PM Magnus Hagander <magnus@hagander.net> wrote: > > Our Fine Manual (TM) specifies: > "As an exception, when changing the type of an existing column, if the USING clause does not change the column contentsand the old type is either binary coercible to the new type or an unconstrained domain over the new type, a tablerewrite is not needed; but any indexes on the affected columns must still be rebuilt." > > First of all, how is a non-internals-expert even supposed to know what a binary coercible type is? That's not a very user-friendlyway to say it. > > Second, how is even an expert supposed to find the list? :) > > For example, we can query pg_cast for casts that are binary coercible, that's a start, but it doesn't really tell us theanswer. > > We can also for example increase the precision of numeric without a rewrite (but not scale). Or we can change between textand varchar. And we can increase the length of a varchar but not decrease it. > > Surely we can do better than this when it comes to documenting it? Even if it's a pluggable thing so it may or may notbe true of external datatypes installed later, we should be able to at least be more clear about the builtin types, Ithink? > +1 for providing more information in the documentation. One way could be that we give some examples of how a user can check whether types are binary coercible or not and then also specify clearly in which other cases the rewrite can happen. Similarly, it seems the information when the rewrite can happen for "SET (storage_parameter ...)" (doc says: "depending on the parameter you might need to rewrite the table to get the desired effects") is thin. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote: > Our Fine Manual (TM) specifies: > "As an exception, when changing the type of an existing column, if the > USING clause does not change the column contents and the old type is either > binary coercible to the new type or an unconstrained domain over the new > type, a table rewrite is not needed; but any indexes on the affected > columns must still be rebuilt." > > First of all, how is a non-internals-expert even supposed to know what a > binary coercible type is? The manual defines it at <firstterm>binary coercible</firstterm>. > We can also for example increase the precision of numeric without a rewrite > (but not scale). Or we can change between text and varchar. And we can > increase the length of a varchar but not decrease it. > > Surely we can do better than this when it comes to documenting it? Even if > it's a pluggable thing so it may or may not be true of external > datatypes installed later, we should be able to at least be more clear > about the builtin types, I think? I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of query optimizer logic. The manual brings up only a minority of planner optimizations, and comprehensive lists of optimization preconditions are even rarer. But I don't mind if $SUBJECT documentation departs from that norm.
On Fri, Jul 17, 2020 at 5:40 AM Noah Misch <noah@leadboat.com> wrote:
On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote:
> Our Fine Manual (TM) specifies:
> "As an exception, when changing the type of an existing column, if the
> USING clause does not change the column contents and the old type is either
> binary coercible to the new type or an unconstrained domain over the new
> type, a table rewrite is not needed; but any indexes on the affected
> columns must still be rebuilt."
>
> First of all, how is a non-internals-expert even supposed to know what a
> binary coercible type is?
The manual defines it at <firstterm>binary coercible</firstterm>.
The only way to actually realize that this is a <firstterm> is to look at the source code though, right? It's definitely not clear that one should go look at the CREATE CAST documentation to find the definition -- certainly not from the ALTER TABLE documentation, which I would argue is the place where most people would go.
And while having the definition there is nice, it doesn't help an end user in any way at all to determine if their ALTER TABLE statement is going to be "safe from rewrites" or not. It (hopefully) helps someone who knows some things about the database internals, which is of course a valuable thing as well, but not the end user.
> We can also for example increase the precision of numeric without a rewrite
> (but not scale). Or we can change between text and varchar. And we can
> increase the length of a varchar but not decrease it.
>
> Surely we can do better than this when it comes to documenting it? Even if
> it's a pluggable thing so it may or may not be true of external
> datatypes installed later, we should be able to at least be more clear
> about the builtin types, I think?
I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of
query optimizer logic. The manual brings up only a minority of planner
optimizations, and comprehensive lists of optimization preconditions are even
rarer. But I don't mind if $SUBJECT documentation departs from that norm.
As Amit mentions it is also triggered by some store parameter changes. But not all. So looking at it the other way, the part that the end user really cares about it "which ALTER TABLE operations will rewrite the table and which will not". Maybe what we need is a section specifically on this that summarizes all the different ways that it can happen.
--
Magnus Hagander <magnus@hagander.net> writes: > As Amit mentions it is also triggered by some store parameter changes. But > not all. So looking at it the other way, the part that the end user really > cares about it "which ALTER TABLE operations will rewrite the table and > which will not". Maybe what we need is a section specifically on this that > summarizes all the different ways that it can happen. No, what we need is EXPLAIN for DDL ;-). Trying to keep such documentation in sync with the actual code behavior would be impossible. (For one thing, some aspects can be affected by extension datatype behaviors.) regards, tom lane
On Fri, Jul 17, 2020 at 04:08:36PM +0200, Magnus Hagander wrote: > On Fri, Jul 17, 2020 at 5:40 AM Noah Misch <noah@leadboat.com> wrote: > > On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote: > > > Our Fine Manual (TM) specifies: > > > "As an exception, when changing the type of an existing column, if the > > > USING clause does not change the column contents and the old type is > > either > > > binary coercible to the new type or an unconstrained domain over the new > > > type, a table rewrite is not needed; but any indexes on the affected > > > columns must still be rebuilt." > > > > > > First of all, how is a non-internals-expert even supposed to know what a > > > binary coercible type is? > > > > The manual defines it at <firstterm>binary coercible</firstterm>. > > The only way to actually realize that this is a <firstterm> is to look at > the source code though, right? I see italic typeface for <firstterm>. This one deserves an <indexterm>, too. (I bet many other <firstterm> uses deserve an <indexterm>.) > It's definitely not clear that one should go > look at the CREATE CAST documentation to find the definition -- certainly > not from the ALTER TABLE documentation, which I would argue is the place > where most people would go. Agreed. > > We can also for example increase the precision of numeric without a > > rewrite > > > (but not scale). Or we can change between text and varchar. And we can > > > increase the length of a varchar but not decrease it. > > > > > > Surely we can do better than this when it comes to documenting it? Even > > if > > > it's a pluggable thing so it may or may not be true of external > > > datatypes installed later, we should be able to at least be more clear > > > about the builtin types, I think? > > > > I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of > > query optimizer logic. The manual brings up only a minority of planner > > optimizations, and comprehensive lists of optimization preconditions are > > even > > rarer. But I don't mind if $SUBJECT documentation departs from that norm. > > I can see the argument being made for that, and certainly having been made > for it in the future. But I'd say given the very bad consequences of > getting it wrong, it's far from minor. And given the number of times I've > had to answer the question "can I make this change safely" (which usually > amounts to me trying it out to see what happens, if I hadn't done that > exact one many times before) indicates the need for a more detailed > documentation on it. Such a doc addition is fine with me. I agree with Tom that it will be prone to staleness, but I don't conclude that the potential for staleness reduces its net value below zero. Having said that, if the consequences of doc staleness are "very bad", you may consider documenting the debug1 user interface (https://postgr.es/m/20121202020736.GD13163@tornado.leadboat.com) instead of documenting the exact rules. Either way is fine with me.
On Fri, Jul 17, 2020 at 11:26:56AM -0400, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > As Amit mentions it is also triggered by some store parameter changes. But > > not all. So looking at it the other way, the part that the end user really > > cares about it "which ALTER TABLE operations will rewrite the table and > > which will not". Maybe what we need is a section specifically on this that > > summarizes all the different ways that it can happen. > > No, what we need is EXPLAIN for DDL ;-). Trying to keep such > documentation in sync with the actual code behavior would be impossible. > (For one thing, some aspects can be affected by extension datatype > behaviors.) I know Tom put a wink on that, but I actually do feel that the only clean way to do this is to give users a way to issue the query in a non-executing way that will report if a rewrite is going to happen. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Tue, Jul 21, 2020 at 04:55:37PM -0400, Bruce Momjian wrote: > I know Tom put a wink on that, but I actually do feel that the only > clean way to do this is to give users a way to issue the query in a > non-executing way that will report if a rewrite is going to happen. Yeah, when doing a schema upgrade for an application, that's the usual performance pin-point and people used to other things than Postgres write their queries without being aware of that. We have something able to track that with the event trigger table_rewrite, but there is no easy option to store the event and bypass its execution. I think that using a plpgsql function wrapping an ALTER TABLE query with an exception block for an error generated by an event trigger if seeing table_rewrite allows to do that, though. -- Michael
Attachment
On Sat, Jul 18, 2020 at 4:57 AM Noah Misch <noah@leadboat.com> wrote:
On Fri, Jul 17, 2020 at 04:08:36PM +0200, Magnus Hagander wrote:
> On Fri, Jul 17, 2020 at 5:40 AM Noah Misch <noah@leadboat.com> wrote:
> > On Wed, Jul 15, 2020 at 02:54:37PM +0200, Magnus Hagander wrote:
> > > Our Fine Manual (TM) specifies:
> > > "As an exception, when changing the type of an existing column, if the
> > > USING clause does not change the column contents and the old type is
> > either
> > > binary coercible to the new type or an unconstrained domain over the new
> > > type, a table rewrite is not needed; but any indexes on the affected
> > > columns must still be rebuilt."
> > >
> > > First of all, how is a non-internals-expert even supposed to know what a
> > > binary coercible type is?
> >
> > The manual defines it at <firstterm>binary coercible</firstterm>.
>
> The only way to actually realize that this is a <firstterm> is to look at
> the source code though, right?
I see italic typeface for <firstterm>. This one deserves an <indexterm>, too.
(I bet many other <firstterm> uses deserve an <indexterm>.)
> It's definitely not clear that one should go
> look at the CREATE CAST documentation to find the definition -- certainly
> not from the ALTER TABLE documentation, which I would argue is the place
> where most people would go.
Agreed.
> > We can also for example increase the precision of numeric without a
> > rewrite
> > > (but not scale). Or we can change between text and varchar. And we can
> > > increase the length of a varchar but not decrease it.
> > >
> > > Surely we can do better than this when it comes to documenting it? Even
> > if
> > > it's a pluggable thing so it may or may not be true of external
> > > datatypes installed later, we should be able to at least be more clear
> > > about the builtin types, I think?
> >
> > I recall reasoning that ATColumnChangeRequiresRewrite() is a DDL analog of
> > query optimizer logic. The manual brings up only a minority of planner
> > optimizations, and comprehensive lists of optimization preconditions are
> > even
> > rarer. But I don't mind if $SUBJECT documentation departs from that norm.
>
> I can see the argument being made for that, and certainly having been made
> for it in the future. But I'd say given the very bad consequences of
> getting it wrong, it's far from minor. And given the number of times I've
> had to answer the question "can I make this change safely" (which usually
> amounts to me trying it out to see what happens, if I hadn't done that
> exact one many times before) indicates the need for a more detailed
> documentation on it.
Such a doc addition is fine with me. I agree with Tom that it will be prone
to staleness, but I don't conclude that the potential for staleness reduces
its net value below zero. Having said that, if the consequences of doc
staleness are "very bad", you may consider documenting the debug1 user
interface (https://postgr.es/m/20121202020736.GD13163@tornado.leadboat.com)
instead of documenting the exact rules. Either way is fine with me.
The DEBUG1 method is only after the fact though, isn't it?
That makes it pretty hard for someone to say review a migration script and see "this is going to cause problems". And if it's going to be run in an env, I personally find it more useful to just stick an event trigger in there per our documentation and block it (though it might be a good idea to link to that from the alter table reference page, and not just have it under event trigger examples).
I agree that documenting the rules would definitely be prone to staleness, and that having EXPLAIN for DDL would be the *better* solution. But also that having the docs, even if they go a bit stale, would be better than the scenario today.
Unfortunately, I'm not sure I know enough of the details of what the rules actually *are* to explain them in a way that's easy enough to go in the docs...
On Wed, Aug 05, 2020 at 02:52:42PM +0200, Magnus Hagander wrote: > On Sat, Jul 18, 2020 at 4:57 AM Noah Misch <noah@leadboat.com> wrote: > > Such a doc addition is fine with me. I agree with Tom that it will be prone > > to staleness, but I don't conclude that the potential for staleness reduces > > its net value below zero. Having said that, if the consequences of doc > > staleness are "very bad", you may consider documenting the debug1 user > > interface (https://postgr.es/m/20121202020736.GD13163@tornado.leadboat.com) > > instead of documenting the exact rules. Either way is fine with me. > > The DEBUG1 method is only after the fact though, isn't it? > > That makes it pretty hard for someone to say review a migration script and > see "this is going to cause problems". And if it's going to be run in an > env, I personally find it more useful to just stick an event trigger in > there per our documentation and block it (though it might be a good idea to > link to that from the alter table reference page, and not just have it > under event trigger examples). The "after the fact" aspect is basically the same for the DEBUG1 method and the event trigger method. Each fires after lock acquisition and before rewriting the first tuple. Event trigger drawbacks include the requirement for superuser cooperation. DEBUG1/statement_timeout drawbacks include an ambiguity: if it reaches statement_timeout without printing the DEBUG1, that could mean a lack of rewrite, or it could mean some other cause of slowness. I have a weak preference for promoting the DEBUG1/statement_timeout approach, because cloud deployments find the superuser obstacle insurmountable. The ambiguity is surmountable; one can always remove the statement_timeout and run the command to completion in a pre-production environment.