Thread: Correct docs re: rewriting indexes when table rewrite is skipped
Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for no-rewrite ALTER TABLE .. ALTER TYPE." However the docs still claim that "a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt." I've attached a simple patch to update the docs to match the current behavior. Thanks, James Coleman
Attachment
Re: Correct docs re: rewriting indexes when table rewrite is skipped
From
Matthias van de Meent
Date:
On Tue, 29 Mar 2022 at 16:04, James Coleman <jtc331@gmail.com> wrote: > > Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for > no-rewrite ALTER TABLE > .. ALTER TYPE." However the docs still claim that "a table rewrite is > not needed; but any indexes on the affected columns must still be > rebuilt." Although indexes might indeed not need a rebuild, in many cases they still do (e.g. when the type changes between text and a domain of text with a different collation). I think that the current state of the docs is better in that regard; as it explicitly warns for index rebuilds, even when the letter of the docs is incorrect: there are indeed cases we don't need to rebuild the indexes; but that would require more elaboration. Kind regards, Matthias van de Meent
On Tue, Mar 29, 2022 at 11:29 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Tue, 29 Mar 2022 at 16:04, James Coleman <jtc331@gmail.com> wrote: > > > > Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for > > no-rewrite ALTER TABLE > > .. ALTER TYPE." However the docs still claim that "a table rewrite is > > not needed; but any indexes on the affected columns must still be > > rebuilt." > > Although indexes might indeed not need a rebuild, in many cases they > still do (e.g. when the type changes between text and a domain of text > with a different collation). > > I think that the current state of the docs is better in that regard; > as it explicitly warns for index rebuilds, even when the letter of the > docs is incorrect: there are indeed cases we don't need to rebuild the > indexes; but that would require more elaboration. Admittedly I hadn't thought of that case. But isn't it already covered in the existing docs by the phrase "or an unconstrained domain over the new type"? I don't love the word "or" there because there's a sense in which the first clause "binary coercible to the new type" is still accurate for your example unless you narrowly separate "domain" and "type", but I think that narrow distinction is what's technically there already. That being said, I could instead of removing the clause entirely replace it with something like "indexes may still need to be rebuilt when the new type is a constrained domain". Thoughts? James Coleman
On Wed, Mar 30, 2022 at 10:04 AM James Coleman <jtc331@gmail.com> wrote: > Admittedly I hadn't thought of that case. But isn't it already covered > in the existing docs by the phrase "or an unconstrained domain over > the new type"? I don't love the word "or" there because there's a > sense in which the first clause "binary coercible to the new type" is > still accurate for your example unless you narrowly separate "domain" > and "type", but I think that narrow distinction is what's technically > there already. > > That being said, I could instead of removing the clause entirely > replace it with something like "indexes may still need to be rebuilt > when the new type is a constrained domain". You're talking about this as if the normal cases is that indexes don't need to rebuilt, but sometimes they do. However, if I recall correctly, the way the code is structured, it supposes that the indexes do need a rebuild, and then tries to prove that they actually don't. That disconnect makes me nervous, because it seems to me that it could easily lead to a situation where our documentation contains subtle errors. I think somebody should go through and study the algorithm as it exists in the code, and then write documentation to match it. And I think that when they do that, they should approach it from the same point of view that the code does i.e. "these are the conditions for skipping the index rebuild" rather than "these are the conditions for performing an index rebuild." By doing it that way, I think we minimize the likelihood of disconnects between code and documentation, and also make it easier to update in future if the algorithm gets changed. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 30, 2022 at 11:41 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 30, 2022 at 10:04 AM James Coleman <jtc331@gmail.com> wrote: > > Admittedly I hadn't thought of that case. But isn't it already covered > > in the existing docs by the phrase "or an unconstrained domain over > > the new type"? I don't love the word "or" there because there's a > > sense in which the first clause "binary coercible to the new type" is > > still accurate for your example unless you narrowly separate "domain" > > and "type", but I think that narrow distinction is what's technically > > there already. > > > > That being said, I could instead of removing the clause entirely > > replace it with something like "indexes may still need to be rebuilt > > when the new type is a constrained domain". > > You're talking about this as if the normal cases is that indexes don't > need to rebuilt, but sometimes they do. However, if I recall > correctly, the way the code is structured, it supposes that the > indexes do need a rebuild, and then tries to prove that they actually > don't. That disconnect makes me nervous, because it seems to me that > it could easily lead to a situation where our documentation contains > subtle errors. I think somebody should go through and study the > algorithm as it exists in the code, and then write documentation to > match it. And I think that when they do that, they should approach it > from the same point of view that the code does i.e. "these are the > conditions for skipping the index rebuild" rather than "these are the > conditions for performing an index rebuild." By doing it that way, I > think we minimize the likelihood of disconnects between code and > documentation, and also make it easier to update in future if the > algorithm gets changed. Hmm, having it match the way it works makes sense. Would you feel comfortable with an intermediate step (queueing up that as a larger change) changing the clause to something like "indexes will still have to be rebuilt unless the system can guarantee that the sort order is proven to be unchanged" (with appropriate wordsmithing to be a bit less verbose if possible)? Thanks, James Coleman
On Wed, Mar 30, 2022 at 4:33 PM James Coleman <jtc331@gmail.com> wrote: > Hmm, having it match the way it works makes sense. Would you feel > comfortable with an intermediate step (queueing up that as a larger > change) changing the clause to something like "indexes will still have > to be rebuilt unless the system can guarantee that the sort order is > proven to be unchanged" (with appropriate wordsmithing to be a bit > less verbose if possible)? Yeah, that seems fine. It's arguable how much detail we should go into here - but a statement of the form you propose is not misleading, and that's what seems most important to me. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 30, 2022 at 5:41 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 30, 2022 at 4:33 PM James Coleman <jtc331@gmail.com> wrote: > > Hmm, having it match the way it works makes sense. Would you feel > > comfortable with an intermediate step (queueing up that as a larger > > change) changing the clause to something like "indexes will still have > > to be rebuilt unless the system can guarantee that the sort order is > > proven to be unchanged" (with appropriate wordsmithing to be a bit > > less verbose if possible)? > > Yeah, that seems fine. It's arguable how much detail we should go into > here - but a statement of the form you propose is not misleading, and > that's what seems most important to me. All right, thanks for feedback. Attached is v2 with such a change. I've not included examples, and I'm about 50/50 on doing so. What are your thoughts on adding in parens "e.g., changing from varchar to text avoids rebuilding indexes while changing from text to a domain of text with a different collation will require rebuilding indexes"? Thanks, James Coleman
Attachment
On Thu, Mar 31, 2022 at 9:17 AM James Coleman <jtc331@gmail.com> wrote: > All right, thanks for feedback. Attached is v2 with such a change. > I've not included examples, and I'm about 50/50 on doing so. What are > your thoughts on adding in parens "e.g., changing from varchar to text > avoids rebuilding indexes while changing from text to a domain of text > with a different collation will require rebuilding indexes"? On the patch, I suggest that instead of saying "can verify that sort order and/or hashing semantics are unchanged" you say something like "can verify that the new index would be logically equivalent to the current one", mostly because I do not think that "and/or" looks very good in formal writing. I think it would be fine to include examples, but I think that the phrasing you suggest here doesn't seem great. I'm not sure how to fix it exactly. Maybe it needs a little more explanation? -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 31, 2022 at 9:43 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 9:17 AM James Coleman <jtc331@gmail.com> wrote: > > All right, thanks for feedback. Attached is v2 with such a change. > > I've not included examples, and I'm about 50/50 on doing so. What are > > your thoughts on adding in parens "e.g., changing from varchar to text > > avoids rebuilding indexes while changing from text to a domain of text > > with a different collation will require rebuilding indexes"? > > On the patch, I suggest that instead of saying "can verify that sort > order and/or hashing semantics are unchanged" you say something like > "can verify that the new index would be logically equivalent to the > current one", mostly because I do not think that "and/or" looks very > good in formal writing. Agreed re: "and/or". > I think it would be fine to include examples, but I think that the > phrasing you suggest here doesn't seem great. I'm not sure how to fix > it exactly. Maybe it needs a little more explanation? Is the attached more along the lines of what you were thinking? Thanks, James Coleman
Attachment
On Thu, Mar 31, 2022 at 10:14 AM James Coleman <jtc331@gmail.com> wrote: > Is the attached more along the lines of what you were thinking? Yeah. Maybe this would be a little clearer: "For example, if the collation for a column has been changed, an index rebuild is always required, because the new sort order might be different. However, in the absence of a collation change, a column can be changed from text to varchar or vice versa without rebuilding the indexes, because these data types sort identically." We don't seem to be very consistent about whether we write type names like VARCHAR in upper case or lower case in the documentation. I'd vote for using lower-case, but it probably doesn't matter much. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 31, 2022 at 10:29 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 10:14 AM James Coleman <jtc331@gmail.com> wrote: > > Is the attached more along the lines of what you were thinking? > > Yeah. Maybe this would be a little clearer: "For example, if the > collation for a column has been changed, an index rebuild is always > required, because the new sort order might be different. However, in > the absence of a collation change, a column can be changed from text > to varchar or vice versa without rebuilding the indexes, because these > data types sort identically." Updated. > We don't seem to be very consistent about whether we write type names > like VARCHAR in upper case or lower case in the documentation. I'd > vote for using lower-case, but it probably doesn't matter much. Grepping suggests lower case is more common (2 to 1) so I switched to that. Interestingly for "text" it's 700+ to 1 :) Thanks, James Coleman
Attachment
On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote: > Updated. This version looks fine to me. If nobody objects I will commit it and credit myself as a co-author. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 31, 2022 at 3:25 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote: > > Updated. > > This version looks fine to me. If nobody objects I will commit it and > credit myself as a co-author. Sounds great; thanks again for the review. James Coleman
On Thu, Mar 31, 2022 at 4:19 PM James Coleman <jtc331@gmail.com> wrote: > On Thu, Mar 31, 2022 at 3:25 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote: > > > Updated. > > > > This version looks fine to me. If nobody objects I will commit it and > > credit myself as a co-author. > > Sounds great; thanks again for the review. Done. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Apr 1, 2022 at 8:58 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 4:19 PM James Coleman <jtc331@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 3:25 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > On Thu, Mar 31, 2022 at 10:51 AM James Coleman <jtc331@gmail.com> wrote: > > > > Updated. > > > > > > This version looks fine to me. If nobody objects I will commit it and > > > credit myself as a co-author. > > > > Sounds great; thanks again for the review. > > Done. Thanks. I marked the CF entry as committed. James Coleman