Thread: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Hello,

In a previous thread [1], we added smarts so that processes running
CREATE INDEX CONCURRENTLY would not wait for each other.

One improvement is adding the same to REINDEX CONCURRENTLY.  I've
attached patch 0002 here, which does that.

Why 0002, you ask?  That's because preparatory patch 0001 simplifies
ReindexRelationConcurrently somewhat by adding a struct to describe the
indexes that are going to be processed, instead of just a list of OIDs.
This is a good change in itself, because it lets us get rid of the
duplicative open/close of the index rels done just to obtain some info
that's already known at the start.

The other thing is that it'd be good if we could make VACUUM also
ignore the Xmin of processes doing CREATE INDEX CONCURRENTLY and
REINDEX CONCURRENTLY, when possible.  I have two possible ideas to
handle this, about which I'll post later.

[1] https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql

-- 
Álvaro Herrera               Valdivia, Chile
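[For reference: the per-index struct that 0001 introduces looks roughly
like the sketch below.  This is inferred from the description above and
the shape of the code that was later committed; the exact field list is
an assumption, not a quote of the patch.]

    /*
     * Sketch of patch 0001's per-index state, collected once at the
     * start instead of repeatedly re-opening the index rels later.
     */
    typedef struct ReindexIndexInfo
    {
        Oid     indexId;    /* OID of the index to process */
        Oid     tableId;    /* OID of the table it belongs to */
        Oid     amId;       /* OID of the index's access method */
        bool    safe;       /* is this a "safe" index for PROC_IN_SAFE_IC? */
    } ReindexIndexInfo;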
> On Mon, Nov 30, 2020 at 04:54:39PM -0300, Alvaro Herrera wrote:
>
> In a previous thread [1], we added smarts so that processes running
> CREATE INDEX CONCURRENTLY would not wait for each other.
>
> One improvement is adding the same to REINDEX CONCURRENTLY.  I've
> attached patch 0002 here, which does that.
>
> Why 0002, you ask?  That's because preparatory patch 0001 simplifies
> ReindexRelationConcurrently somewhat by adding a struct to describe
> the indexes that are going to be processed, instead of just a list of
> OIDs.  This is a good change in itself, because it lets us get rid of
> the duplicative open/close of the index rels done just to obtain some
> info that's already known at the start.

Thanks!  The patch looks pretty good to me; after reading it I have
only a few minor comments/questions:

* ReindexIndexInfo sounds a bit weird to me because of the repeating
  part, although I see there is already a similar
  ReindexIndexCallbackState, so probably it's fine.

* This one is mostly for me to understand.  There are a couple of
  places with a comment saying 'PROC_IN_SAFE_IC is not necessary,
  because the transaction only takes a snapshot to do some catalog
  manipulation'.  But for some of them I don't immediately see anything
  related to snapshots in the relevant code.  E.g. the one in
  DefineIndex is followed by WaitForOlderSnapshots (which seems to only
  do waiting, not taking a snapshot), index_set_state_flags and
  CacheInvalidateRelcacheByRelid.  Is taking a snapshot hidden
  somewhere inside those?
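[For orientation, the region of DefineIndex() that this question points
at reads roughly as follows.  This is a paraphrased sketch of the
PostgreSQL sources of that era, not a verbatim excerpt.]

    /* Wait out transactions whose snapshots are older than our
     * reference snapshot; this only waits, it takes no snapshot of
     * its own. */
    WaitForOlderSnapshots(limitXmin, true);

    /* Mark the index valid by updating its pg_index row. */
    index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);

    /* Invalidate the parent table's relcache entry so that cached
     * plans are replanned to consider the now-valid index. */
    CacheInvalidateRelcacheByRelid(heaprelid.relId);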
On 2020-Dec-04, Dmitry Dolgov wrote:

> * This one is mostly for me to understand.  There are a couple of
>   places with a comment saying 'PROC_IN_SAFE_IC is not necessary,
>   because the transaction only takes a snapshot to do some catalog
>   manipulation'.  But for some of them I don't immediately see
>   anything related to snapshots in the relevant code.  E.g. the one in
>   DefineIndex is followed by WaitForOlderSnapshots (which seems to
>   only do waiting, not taking a snapshot), index_set_state_flags and
>   CacheInvalidateRelcacheByRelid.  Is taking a snapshot hidden
>   somewhere inside those?

Well, they're actually going to acquire an Xid, not a snapshot, so the
comment is slightly incorrect; I'll fix it, thanks for pointing that
out.

The point stands: because those transactions are of very short duration
(at least of very short duration after the point where the XID is
obtained), it's not necessary to set the PROC_IN_SAFE_IC flag, since it
won't cause any disruption to other processes.

It is possible that I copied the wrong comment in DefineIndex.  (I only
noticed that one after I had mulled over the ones in
ReindexRelationConcurrently, each of which skips setting the flag for
slightly different reasons.)
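[For context: the flag under discussion is advertised via a small
helper along these lines.  This is a sketch following the shape of the
patch series; the helper name set_indexsafe_procflags and the
PGPROC/ProcGlobal field layout are taken from the PostgreSQL 14 era and
should be treated as assumptions here.]

    #include "postgres.h"
    #include "storage/lwlock.h"
    #include "storage/proc.h"

    /*
     * Advertise PROC_IN_SAFE_IC in our PGPROC entry.  This must happen
     * before the transaction installs an xmin or xid; otherwise other
     * backends could compute an Xmin that later moves backwards.
     */
    static inline void
    set_indexsafe_procflags(void)
    {
        Assert(MyProc->xid == InvalidTransactionId &&
               MyProc->xmin == InvalidTransactionId);

        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
        MyProc->statusFlags |= PROC_IN_SAFE_IC;
        ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
        LWLockRelease(ProcArrayLock);
    }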
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

The patch looks good to me.  With regards to the code comments, I had
concerns pretty similar to those already raised by Dmitry, and those
have already been answered by you.  So this patch is good to go from my
side once you have updated the comments per your last email.
On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
>
> The patch looks good to me.  With regards to the code comments, I had
> concerns pretty similar to those already raised by Dmitry, and those
> have already been answered by you.  So this patch is good to go from
> my side once you have updated the comments per your last email.

For the 0001 patch, since ReindexIndexInfo is used only within
ReindexRelationConcurrently(), I think it's a function-local structure
type.  So we can declare it within the function.  What do you think?

Regards,

-- 
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On 2021-Jan-04, Masahiko Sawada wrote:

> For the 0001 patch, since ReindexIndexInfo is used only within
> ReindexRelationConcurrently(), I think it's a function-local structure
> type.  So we can declare it within the function.  What do you think?

That's a good idea.  Pushed with that change, thanks.

Thanks also to Dmitry and Hamid for the reviews.

-- 
Álvaro Herrera       39°49'30"S 73°17'W
On 2021-Jan-12, Álvaro Herrera wrote:

> > For the 0001 patch, since ReindexIndexInfo is used only within
> > ReindexRelationConcurrently(), I think it's a function-local
> > structure type.  So we can declare it within the function.  What do
> > you think?
>
> That's a good idea.  Pushed with that change, thanks.

Here's the other patch, with comments fixed per reviews, and rebased to
account for the scope change.

-- 
Álvaro Herrera               Valdivia, Chile
Tulio: Oh, what could this button be for, Juan Carlos?
Policarpo: No, stay away, don't touch the console!
Juan Carlos: I'll press it again and again.
On 2021-Jan-12, Álvaro Herrera wrote:

> > That's a good idea.  Pushed with that change, thanks.
>
> Here's the other patch, with comments fixed per reviews, and rebased
> to account for the scope change.

Pushed.

At the last minute I decided to back off on reverting the flag set that
DefineIndex has, because there was no point in doing that.  I also
updated the comment in (two places of) ReindexRelationConcurrently
about why we don't do it there.  The previous submission had this:

> +	/*
> +	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
> +	 * it only acquires an Xid to do some catalog manipulations, after the
> +	 * wait is over.
> +	 */

but this fails to point out that the main reason not to do it is to
avoid having to decide whether to do it or not when some indexes are
safe and others aren't.  So I changed it to:

+	/*
+	 * While we could set PROC_IN_SAFE_IC if all indexes qualified, there's no
+	 * real need for that, because we only acquire an Xid after the wait is
+	 * done, and that lasts for a very short period.
+	 */

Thanks!

-- 
Álvaro Herrera               Valdivia, Chile
"The eagle never lost so much time, as when he submitted
to learn of the crow."                      (William Blake)
So one last remaining improvement was to have VACUUM ignore processes
doing CIC and RC when computing the Xid horizon of tuples to remove.  I
think we can do something simple, like the attached patch.

-- 
Álvaro Herrera               Valdivia, Chile
"To learn without thinking is useless; to think without
learning, dangerous."                       (Confucius)
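[For context: the hunks reviewed below sit inside the per-backend loop
of ComputeXidHorizons() in src/backend/storage/ipc/procarray.c.  The
surrounding loop, heavily paraphrased and with most special cases
elided, looks roughly like this; variable names are assumptions based
on the PG14-era sources.]

    for (int index = 0; index < arrayP->numProcs; index++)
    {
        int         pgprocno = arrayP->pgprocnos[index];
        PGPROC     *proc = &allProcs[pgprocno];
        int8        statusFlags = ProcGlobal->statusFlags[index];
        TransactionId xmin = UINT32_ACCESS_ONCE(proc->xmin);

        /* ... backends with no xmin/xid are skipped; existing special
         * cases such as PROC_IN_VACUUM elided ... */

        /* The patch adds a PROC_IN_SAFE_IC case here, so that a "safe"
         * CIC/RC backend holds back only the catalog horizon (CIC/RC
         * itself reads catalogs), not the horizon for user data. */
    }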
On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> So one last remaining improvement was to have VACUUM ignore processes
> doing CIC and RC when computing the Xid horizon of tuples to remove.
> I think we can do something simple, like the attached patch.

Regarding the patch:

> +		if (statusFlags & PROC_IN_SAFE_IC)
> +			h->catalog_oldest_nonremovable =
> +				TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +		else
> +			h->data_oldest_nonremovable = h->catalog_oldest_nonremovable =
> +				TransactionIdOlder(h->data_oldest_nonremovable, xmin);

Would this not need to be the following?  Right now, it resets a
potentially older h->catalog_oldest_nonremovable (which is set in the
PROC_IN_SAFE_IC branch).

> +		if (statusFlags & PROC_IN_SAFE_IC)
> +			h->catalog_oldest_nonremovable =
> +				TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +		else
> +		{
> +			h->data_oldest_nonremovable =
> +				TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> +			h->catalog_oldest_nonremovable =
> +				TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +		}

Prior to reading the rest of my response: I do not understand the
intricacies of the VACUUM process, nor the system of db snapshots, so
it's reasonably possible I misunderstand this.  But would this patch
not allow for tuples to be created, deleted and vacuumed away from a
table whilst rebuilding an index on that table, resulting in invalid
pointers in the index?

Example:

1.) RI starts
2.) PHASE 2: filling the index:
  2.1.) scanning the heap (live tuple is cached)
        < tuple is deleted
        < last transaction other than RI commits; only the snapshot of
          RI exists
        < vacuum drops the tuple, and cannot remove it from the new
          index because this new index is not yet populated
  2.2.) sorting tuples
  2.3.) index filled with tuples, incl. deleted tuple
3.) PHASE 3: wait for transactions
4.) PHASE 4: validate does not remove the tuple from the index,
    because it is not built to do so: it will only insert new tuples.
    Tuples that are marked for deletion are removed from the index only
    through VACUUM (and optimistic ALL_DEAD detection).

According to my limited knowledge of RI, it requires VACUUM to not run
on the table during the initial index build process (which is currently
guaranteed through the use of a snapshot).

Regards,

Matthias van de Meent
On 2021-Jan-18, Matthias van de Meent wrote:

> Example:
>
> 1.) RI starts
> 2.) PHASE 2: filling the index:
>   2.1.) scanning the heap (live tuple is cached)
>         < tuple is deleted
>         < last transaction other than RI commits; only the snapshot of
>           RI exists
>         < vacuum drops the tuple, and cannot remove it from the new
>           index because this new index is not yet populated
>   2.2.) sorting tuples
>   2.3.) index filled with tuples, incl. deleted tuple
> 3.) PHASE 3: wait for transactions
> 4.) PHASE 4: validate does not remove the tuple from the index,
>     because it is not built to do so: it will only insert new tuples.
>     Tuples that are marked for deletion are removed from the index
>     only through VACUUM (and optimistic ALL_DEAD detection).
>
> According to my limited knowledge of RI, it requires VACUUM to not run
> on the table during the initial index build process (which is
> currently guaranteed through the use of a snapshot).

VACUUM cannot run concurrently with CIC or RI in a table -- both
acquire ShareUpdateExclusiveLock, which conflicts with itself, so this
cannot occur.

I do wonder if the problem you suggest (or something similar) can occur
via HOT pruning, though.

-- 
Álvaro Herrera               Valdivia, Chile
On 2021-Jan-18, Matthias van de Meent wrote:

> On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Would this not need to be the following?  Right now, it resets a
> potentially older h->catalog_oldest_nonremovable (which is set in the
> PROC_IN_SAFE_IC branch).
>
> > +		if (statusFlags & PROC_IN_SAFE_IC)
> > +			h->catalog_oldest_nonremovable =
> > +				TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> > +		else
> > +		{
> > +			h->data_oldest_nonremovable =
> > +				TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> > +			h->catalog_oldest_nonremovable =
> > +				TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> > +		}

Oops, you're right.

-- 
Álvaro Herrera               Valdivia, Chile
On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jan-18, Matthias van de Meent wrote:
>
> > According to my limited knowledge of RI, it requires VACUUM to not
> > run on the table during the initial index build process (which is
> > currently guaranteed through the use of a snapshot).
>
> VACUUM cannot run concurrently with CIC or RI in a table -- both
> acquire ShareUpdateExclusiveLock, which conflicts with itself, so this
> cannot occur.

Yes, you are correct.  Vacuum indeed has a ShareUpdateExclusiveLock.
Are there no other ways that pages are optimistically pruned?

But the base case still stands: ignoring CIC snapshots would give the
semantics of all_dead to tuples that are actually still considered
alive in some context, and should not yet be deleted (you're deleting
data from an in-use snapshot).  Any local pruning optimizations using
all_dead mechanics now cannot be run on the table unless they hold a
ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
(other than below).

> I do wonder if the problem you suggest (or something similar) can
> occur via HOT pruning, though.

It could not, at least not at the current HEAD: only one tuple in a HOT
chain can be alive at any one point, and all indexes point to the root
of the HOT chain, which is never HOT-pruned.  See also
src/backend/access/heap/README.HOT.

Regards,

Matthias van de Meent
On Tue, 19 Jan 2021 at 21:59, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> Yes, you are correct.  Vacuum indeed has a ShareUpdateExclusiveLock.
> Are there no other ways that pages are optimistically pruned?
>
> But the base case still stands: ignoring CIC snapshots would give the
> semantics of all_dead to tuples that are actually still considered
> alive in some context, and should not yet be deleted (you're deleting
> data from an in-use snapshot).  Any local pruning optimizations using
> all_dead mechanics now cannot be run on the table unless they hold a
> ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
> (other than below).

Re-thinking this, and after some research:

Is the behaviour of "any process that invalidates TIDs in this table
(that could be in an index on this table) always holds a lock that
conflicts with CIC/RiC on that table" a requirement of table AMs, or is
it an implementation detail?

If it is a requirement, then this patch is a +1 for me (and that
requirement should be documented in such a case); otherwise a -1 while
there is no mechanism in place to remove concurrently-invalidated TIDs
from CIC-ed/RiC-ed indexes.  This concurrently-invalidated check could
be done through e.g. updating validate_index to have one more phase
that removes unknown / incorrect TIDs from the index.  As a note: index
insertion logic would then also have to be able to handle duplicate
TIDs in the index.

Regards,

Matthias van de Meent
On Thu, Jan 21, 2021 at 3:08 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Re-thinking this, and after some research:
>
> Is the behaviour of "any process that invalidates TIDs in this table
> (that could be in an index on this table) always holds a lock that
> conflicts with CIC/RiC on that table" a requirement of table AMs, or
> is it an implementation detail?
>
> If it is a requirement, then this patch is a +1 for me (and that
> requirement should be documented in such a case); otherwise a -1 while
> there is no mechanism in place to remove concurrently-invalidated TIDs
> from CIC-ed/RiC-ed indexes.

I don't believe that the table AM machinery presumes that every kind of
table AM necessarily has to support VACUUM; or at least, I don't think
the original version presumed that.  However, if you want it to work
some other way, there's not really any infrastructure for that, either.
Like, if you want to run your own background workers to clean up your
table, you could, but you'd have to arrange that yourself.  If you want
a SWEEP command or an OPTIMIZE TABLE command instead of VACUUM, you'd
have to hack the grammar.  Likewise, I don't know that there's any
guarantee that CLUSTER would work on a table of any old AM either, or
that it would do anything useful.

But having said that, if you have a table AM that *does* support VACUUM
and CLUSTER, I think it would have to treat TIDs pretty much just as we
do today.  Almost by definition, they have to live with the way the
rest of the system works.  TIDs have to be 6 bytes, and lock levels
can't be changed, because there's nothing in the table AM that would
let you tinker with that stuff.  The table AM can opt out of that
machinery altogether in favor of just throwing an error, but it can't
make it work differently unless somebody extends the API.  It's
possible that this might suck a lot for some AMs.  For instance, if
your AM is an in-memory btree, you might want CLUSTER to work in place,
rather than by copying the whole relation file, but I doubt it's
possible to make that work using the APIs as they exist.  Maybe someday
we'll have better ones, but right now we don't.

As far as the specific question asked here, I don't really understand
how it could ever work any differently.  If you want to invalidate some
TIDs in such a way that they can be reused by unrelated tuples -- in
the case of the heap this would be by making the line pointers
LP_UNUSED -- that has to be coordinated with the indexing machinery.  I
can imagine a table AM that never reuses TIDs -- especially if we
eventually have TIDs of more than 6 bytes -- but I can't imagine a
table AM that reuses TIDs that might still be present in the index
without telling the index.  And that seems to be what is being proposed
here: if CIC or RiC could be putting TIDs into indexes while at the
same time some of those very same TIDs were being made available for
reuse, that would be chaos, and right now we have no mechanism other
than the lock level to prevent that.  We could invent one, maybe, but
it doesn't exist today, and zheap never tried to invent anything like
that.

I agree that, if we do this, the comment should make clear that the
CIC/RiC lock has to conflict with VACUUM to avoid indexing tuples that
VACUUM is busy marking LP_UNUSED, and that this is precisely because
other backends will ignore the CIC/RiC snapshot while determining
whether a tuple is live.  I don't think that requirement is set in
stone, though.  Hypothetically speaking, suppose we rejected this
patch.  Then, suppose someone proposed to replace
ShareUpdateExclusiveLock with two new lock levels, VacuumTuplesLock and
IndexTheRelationLock, each of which conflicts with itself and with all
other lock levels with which ShareUpdateExclusiveLock conflicts, but
which don't conflict with each other.  AFAIK, that patch would be
acceptable today and unacceptable after this change.  While I'm not
arguing that as a reason to reject this patch, it's not impossible that
somebody could: they'd just need to argue that the separated lock
levels would have greater value than this patch.  I don't think I
believe that, but it's not a totally crazy suggestion.  My main point,
though, is that this meaningfully changes some assumptions about how
the system works, and we should be sure to be clear about that.

I looked at the patch but don't really understand it; the code in this
area seems to have changed a lot since I last looked at it.  So I'm
just commenting mostly on the specific question that was asked, and a
little bit on the theory of the patch, but without expressing an
opinion on the implementation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com