Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date
Msg-id CAEze2Wh4zbYpixex7dbRJKPGKincVeopRtrA2jZzfRFjerooQQ@mail.gmail.com
Whole thread Raw
In response to Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements  (Álvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements  (Álvaro Herrera <alvherre@alvh.no-ip.org>)
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements  (Álvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> So one last remaining improvement was to have VACUUM ignore processes
> doing CIC and RC to compute the Xid horizon of tuples to remove.  I
> think we can do something simple like the attached patch.

Regarding the patch:

> +            if (statusFlags & PROC_IN_SAFE_IC)
> +                h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +            else
> +                h->data_oldest_nonremovable = h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->data_oldest_nonremovable, xmin);

Would this not need to be the following? Right now, it resets
potentially older h->catalog_oldest_nonremovable (which is set in the
PROC_IN_SAFE_IC branch).

> +            if (statusFlags & PROC_IN_SAFE_IC)
> +                h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +            else
> +            {
> +                h->data_oldest_nonremovable =
> +                    TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> +                h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +            }


Prior to reading the rest of my response: I do not understand the
intricacies of the VACUUM process, nor the systems of db snapshots, so
it's reasonably possible I misunderstand this.
But would this patch not allow for tuples to be created, deleted and
vacuumed away from a table whilst rebuilding an index on that table,
resulting in invalid pointers in the index?

Example:

1.) RI starts
2.) PHASE 2: filling the index:
2.1.) scanning the heap (live tuple is cached)
< tuple is deleted
< last transaction other than RI commits, only snapshot of RI exists
< vacuum drops the tuple, and cannot remove it from the new index
because this new index is not yet populated.
2.2.) sorting tuples
2.3.) index filled with tuples, incl. deleted tuple
3.) PHASE 3: wait for transactions
4.) PHASE 4: validate does not remove the tuple from the index,
because it is not built to do so: it will only insert new tuples.
Tuples that are marked for deletion are removed from the index only
through VACUUM (and optimistic ALL_DEAD detection).

According to my limited knowledge of RI, it requires VACUUM to not run
on the table during the initial index build process (which is
currently guaranteed through the use of a snapshot).


Regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Add support for leading/trailing bytea trim()ing
Next
From: Tom Lane
Date:
Subject: Re: search_plan_tree(): handling of non-leaf CustomScanState nodes causes segfault