Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
Date
Msg-id CAEze2Wh7NEyQEJNJ6_eMwtPy6s-WV88kixqENq8q=jcK=BLC3w@mail.gmail.com
Whole thread Raw
In response to Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans  (James Coleman <jtc331@gmail.com>)
Responses Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
List pgsql-hackers
On Fri, 1 Apr 2022 at 16:10, James Coleman <jtc331@gmail.com> wrote:
>
> On Thu, Mar 31, 2022 at 10:58 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> >
> > On Tue, 29 Mar 2022 at 16:20, James Coleman <jtc331@gmail.com> wrote:
> > >
> > > Over in the "Document atthasmissing default optimization avoids
> > > verification table scan" thread David Johnston (who I've cc'd)
> > > suggested that my goals might be better implemented with a simple
> > > restructuring of the Notes section of the ALTER TABLE docs. I think
> > > this is also along the lines of Tom Lane's suggestion of a "unified
> > > discussion", but I've chosen for now (and simplicity's sake) not to
> > > break this into an entirely new page. If reviewers feel that is
> > > warranted at this stage, I can do that, but it seems to me that for
> > > now this improves the structure and sets us up for such a future page
> > > but falls short of sufficient content to move into its own page.
> > >
> > > One question on the changes: the current docs say "when attaching a
> > > new partition it may be scanned to verify that existing rows meet the
> > > partition constraint". The word "may" there seems to suggest there may
> > > also be occasions where scans are not needed, but no examples of such
> > > cases are present. I'm not immediately aware of such a case. Are these
> > > constraints always validated? If not, in which cases can such a scan
> > > be skipped?
> > >
> > > I've also incorporated the slight correction in "Correct docs re:
> > > rewriting indexes when table rewrite is skipped" [2] here, and will
> > > rebase this patch if that gets committed.
> >
> > See comments in that thread.
>
> Rebased since that thread has now resulted in a committed patch.
>
> > > +    Changing the type of an existing column will require the entire table and its
> > > +    indexes to be rewritten. As an exception, if the <literal>USING</literal> clause
> > > +    does not change the column contents and the old type is either binary coercible
> > > +    to the new type or an unconstrained domain over the new type, a table rewrite is
> > > +    not needed.
> >
> > This implies "If the old type is [...] an unconstrained domain over
> > the new type, a table rewrite is not needed.", which is the wrong way
> > around.
> >
> > I'd go with something along the lines of:
> >
> > +    Changing the type of an existing column will require the entire table to be
> > +    rewritten, unless the <literal>USING</literal> clause is only a
> > binary coercible
> > +    cast, or if the new type is an unconstrained
> > <literal>DOMAIN<literal> over the
> > +    old type.
>
> That language is actually unchanged from the existing docs; is there
> an error in the existing docs you're seeing? I'm actually imagining
> that it can probably got either way -- from unconstrained domain over
> new type to new type or from old type to unconstrained domain over old
> type.

CREATE DOMAIN constrained AS text NOT NULL;
CREATE DOMAIN unconstrained_on_constrained AS constrained;

CREATE TABLE tst (col unconstrained_on_constrained);
ALTER TABLE tst ALTER COLUMN col TYPE constrained; -- table scan

Moving from an unconstrained domain over a constrained domain means
that we still do the table scan. Domain nesting is weird in that way.

> > That would drop the reference to index rebuilding; but that should be
> > covered in other parts of the docs.
>
> Part of the whole point of this restructuring is to make both of those
> clear; I think we should retain the comments about indexes.

OK; I mentioned it because table rewrite also implies index rewrite;
assuming this is correctly referenced in other parts of the docs.

> > > +    The following alterations of the table require the entire table, and in some
> > > +    cases its indexes as well, to be rewritten.
> >
> > It is impossible to rewrite the table without at the same time also
> > rewriting the indexes; as the location of tuples changes and thus
> > previously generated indexes will become invalid. At the same time;
> > changes to columns might not require a table rewrite, while still
> > requiring the indexes to be rewritten. I suggest changing the order of
> > "table" and "index", or dropping the clause.
>
> Ah, that's a good point. I've rewritten that part.
>
> > > +    [...] For a large table such a rewrite
> > > +    may take a significant amount of time and will temporarily require as much as
> > > +    double the disk space.
> >
> > I'd replace the will with could. Technically, this "double the disk
> > space" could be even higher than that; due to index rebuilds taking up
> > to 3x normal space (one original index which is only dropped at the
> > end, one sorted tuple store for the rebuild, and one new index).
>
> That's also the existing language, but I agree it seems a bit overly
> precise (and in the process probably incorrect). There's a lot of
> complexity here: depending on the type change (and USING clause!) and
> table width it could be even more than 3x. I've reworded to try to
> capture what's really going on here.
>
> Why "could" instead of "will"? All table rewrites will always require
> a extra disk space, right?

Table bloat will be removed, as an equivalent of `VACUUM FULL` or
`CLUSTER` is run on the database. This can remove up to 99.99...% of
the current size of the table; e.g. if there's a few tuples of
MaxHeapTupleSize at the end of the table (likely in 32-bit mode, or
pre-pg14 systems).

> > > -    Similarly, when attaching a new partition it may be scanned to verify that
> > > -    existing rows meet the partition constraint.
> > > +    Attaching a new partition requires scanning the table to verify that existing
> > > +    rows meet the partition constraint.
> >
> > This is also (and better!) documented under section
> > sql-altertable-attach-partition: we will skip full table scan if the
> > table partition's existing constraints already imply the new partition
> > constraints. The previous wording is better in that regard ("may
> > need", instead of "requires"), though it could be improved by refering
> > to the sql-altertable-attach-partition section.
>
> Updated, and I added an xref to that section (I think that's the
> correct tagging).

> +    The following alterations of the table require the entire table to be rewritten
> +    and its indexes to be rebuilt.

> +    The following alterations of the table require that it be scanned in its entirety
> +    to ensure that no existing values are contrary to the new constraints placed on
> +    the table.

Could you maybe reword these sentences to replace "alterations of"
with "changes to"? Although fittingly used in the context of 'ALTER
TABLE", I'm not a fan of the phrasing that comes with the use of
'alterations'.

> +    Attaching a new partition may require scanning the table to verify that existing
> +    rows meet the partition constraint.

I think that "the table" should be "some tables", as default
partitions of the parent table might need to be checked as well.

Thanks!

- Matthias



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Temporary tables versus wraparound... again
Next
From: Jelte Fennema
Date:
Subject: Re: Add non-blocking version of PQcancel