Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans |
Date | |
Msg-id | CAEze2Wh7NEyQEJNJ6_eMwtPy6s-WV88kixqENq8q=jcK=BLC3w@mail.gmail.com Whole thread Raw |
In response to | Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans (James Coleman <jtc331@gmail.com>) |
Responses |
Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
|
List | pgsql-hackers |
On Fri, 1 Apr 2022 at 16:10, James Coleman <jtc331@gmail.com> wrote: > > On Thu, Mar 31, 2022 at 10:58 AM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Tue, 29 Mar 2022 at 16:20, James Coleman <jtc331@gmail.com> wrote: > > > > > > Over in the "Document atthasmissing default optimization avoids > > > verification table scan" thread David Johnston (who I've cc'd) > > > suggested that my goals might be better implemented with a simple > > > restructuring of the Notes section of the ALTER TABLE docs. I think > > > this is also along the lines of Tom Lane's suggestion of a "unified > > > discussion", but I've chosen for now (and simplicity's sake) not to > > > break this into an entirely new page. If reviewers feel that is > > > warranted at this stage, I can do that, but it seems to me that for > > > now this improves the structure and sets us up for such a future page > > > but falls short of sufficient content to move into its own page. > > > > > > One question on the changes: the current docs say "when attaching a > > > new partition it may be scanned to verify that existing rows meet the > > > partition constraint". The word "may" there seems to suggest there may > > > also be occasions where scans are not needed, but no examples of such > > > cases are present. I'm not immediately aware of such a case. Are these > > > constraints always validated? If not, in which cases can such a scan > > > be skipped? > > > > > > I've also incorporated the slight correction in "Correct docs re: > > > rewriting indexes when table rewrite is skipped" [2] here, and will > > > rebase this patch if that gets committed. > > > > See comments in that thread. > > Rebased since that thread has now resulted in a committed patch. > > > > + Changing the type of an existing column will require the entire table and its > > > + indexes to be rewritten. As an exception, if the <literal>USING</literal> clause > > > + does not change the column contents and the old type is either binary coercible > > > + to the new type or an unconstrained domain over the new type, a table rewrite is > > > + not needed. > > > > This implies "If the old type is [...] an unconstrained domain over > > the new type, a table rewrite is not needed.", which is the wrong way > > around. > > > > I'd go with something along the lines of: > > > > + Changing the type of an existing column will require the entire table to be > > + rewritten, unless the <literal>USING</literal> clause is only a > > binary coercible > > + cast, or if the new type is an unconstrained > > <literal>DOMAIN<literal> over the > > + old type. > > That language is actually unchanged from the existing docs; is there > an error in the existing docs you're seeing? I'm actually imagining > that it can probably got either way -- from unconstrained domain over > new type to new type or from old type to unconstrained domain over old > type. CREATE DOMAIN constrained AS text NOT NULL; CREATE DOMAIN unconstrained_on_constrained AS constrained; CREATE TABLE tst (col unconstrained_on_constrained); ALTER TABLE tst ALTER COLUMN col TYPE constrained; -- table scan Moving from an unconstrained domain over a constrained domain means that we still do the table scan. Domain nesting is weird in that way. > > That would drop the reference to index rebuilding; but that should be > > covered in other parts of the docs. > > Part of the whole point of this restructuring is to make both of those > clear; I think we should retain the comments about indexes. OK; I mentioned it because table rewrite also implies index rewrite; assuming this is correctly referenced in other parts of the docs. > > > + The following alterations of the table require the entire table, and in some > > > + cases its indexes as well, to be rewritten. > > > > It is impossible to rewrite the table without at the same time also > > rewriting the indexes; as the location of tuples changes and thus > > previously generated indexes will become invalid. At the same time; > > changes to columns might not require a table rewrite, while still > > requiring the indexes to be rewritten. I suggest changing the order of > > "table" and "index", or dropping the clause. > > Ah, that's a good point. I've rewritten that part. > > > > + [...] For a large table such a rewrite > > > + may take a significant amount of time and will temporarily require as much as > > > + double the disk space. > > > > I'd replace the will with could. Technically, this "double the disk > > space" could be even higher than that; due to index rebuilds taking up > > to 3x normal space (one original index which is only dropped at the > > end, one sorted tuple store for the rebuild, and one new index). > > That's also the existing language, but I agree it seems a bit overly > precise (and in the process probably incorrect). There's a lot of > complexity here: depending on the type change (and USING clause!) and > table width it could be even more than 3x. I've reworded to try to > capture what's really going on here. > > Why "could" instead of "will"? All table rewrites will always require > a extra disk space, right? Table bloat will be removed, as an equivalent of `VACUUM FULL` or `CLUSTER` is run on the database. This can remove up to 99.99...% of the current size of the table; e.g. if there's a few tuples of MaxHeapTupleSize at the end of the table (likely in 32-bit mode, or pre-pg14 systems). > > > - Similarly, when attaching a new partition it may be scanned to verify that > > > - existing rows meet the partition constraint. > > > + Attaching a new partition requires scanning the table to verify that existing > > > + rows meet the partition constraint. > > > > This is also (and better!) documented under section > > sql-altertable-attach-partition: we will skip full table scan if the > > table partition's existing constraints already imply the new partition > > constraints. The previous wording is better in that regard ("may > > need", instead of "requires"), though it could be improved by refering > > to the sql-altertable-attach-partition section. > > Updated, and I added an xref to that section (I think that's the > correct tagging). > + The following alterations of the table require the entire table to be rewritten > + and its indexes to be rebuilt. > + The following alterations of the table require that it be scanned in its entirety > + to ensure that no existing values are contrary to the new constraints placed on > + the table. Could you maybe reword these sentences to replace "alterations of" with "changes to"? Although fittingly used in the context of 'ALTER TABLE", I'm not a fan of the phrasing that comes with the use of 'alterations'. > + Attaching a new partition may require scanning the table to verify that existing > + rows meet the partition constraint. I think that "the table" should be "some tables", as default partitions of the parent table might need to be checked as well. Thanks! - Matthias
pgsql-hackers by date: