Thread: Using indexUnchanged with nbtree
Seems like we can skip the uniqueness check if indexUnchanged, which will speed up non-HOT UPDATEs on tables with B-Trees. Passes tests. Comments? -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Mon, Jun 21, 2021 at 5:31 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > Seems like we can skip the uniqueness check if indexUnchanged, which > will speed up non-HOT UPDATEs on tables with B-Trees. I thought about doing this myself. Never got as far as thinking about the correctness implications in detail. One thing that I'm concerned about is LP_DEAD bit setting inside _bt_check_unique(), which isn't going to take place when the optimization from the patch is applied. That definitely used to be way more important than kill_prior_tuple-based LP_DEAD bit setting, which has real problems with non-HOT updates [1]. _bt_check_unique() clearly makes up for that in the case of unique indexes, at least for many years. On the other hand my thinking here might well be outdated, because of course bottom-up index deletion exists now. You're using indexUnchanged here, which is used to trigger bottom-up index deletion passes. Maybe that's enough for it to not matter now, meaning that the LP_DEAD bit stuff is not a real blocker here. Offhand I'm quite unsure. [1] https://www.postgresql.org/message-id/flat/CAH2-Wz%3DSfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA%40mail.gmail.com#b20ead9675225f12b6a80e53e19eed9d -- Peter Geoghegan
On Wed, Jun 23, 2021 at 5:17 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Jun 21, 2021 at 5:31 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > Seems like we can skip the uniqueness check if indexUnchanged, which > > will speed up non-HOT UPDATEs on tables with B-Trees. > > I thought about doing this myself. Never got as far as thinking about > the correctness implications in detail. > > One thing that I'm concerned about is LP_DEAD bit setting inside > _bt_check_unique(), which isn't going to take place when the > optimization from the patch is applied. That definitely used to be way > more important than kill_prior_tuple-based LP_DEAD bit setting, which > has real problems with non-HOT updates [1]. _bt_check_unique() clearly > makes up for that in the case of unique indexes, at least for many > years. > > On the other hand my thinking here might well be outdated, because of > course bottom-up index deletion exists now. You're using > indexUnchanged here, which is used to trigger bottom-up index deletion > passes. Maybe that's enough for it to not matter now, meaning that the > LP_DEAD bit stuff is not a real blocker here. Offhand I'm quite > unsure. You're right that skipping the check might also skip killing a prior row version, but it doesn't prevent later scans from killing them, so there is no correctness aspect to that. In the case of a non-HOT UPDATE the backend will see the index entry for the old row version and then check it, pointlessly. Since that has just been modified, that won't ever be killed, so skipping the check makes sense in those cases. -- Simon Riggs http://www.EnterpriseDB.com/
On Wed, Jun 23, 2021 at 9:31 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > You're right that skipping the check might also skip killing a prior > row version, but it doesn't prevent later scans from killing them, so > there is no correctness aspect to that. Probably not, no. I'll assume for now that there is no correctness issue. > In the case of a non-HOT UPDATE the backend will see the index entry > for the old row version and then check it, pointlessly. Since that has > just been modified, that won't ever be killed, so skipping the check > makes sense in those cases. I agree that the check itself is pointless here. But that in itself doesn't make the call to _bt_check_unique() useless. It might still manage to set LP_DEAD bits when nothing else will. I realize that the original reason for setting LP_DEAD bits in _bt_check_unique() was something like "well, might as well do this here too". But I believe that LP_DEAD bit setting inside _bt_check_unique() is nevertheless often more valuable than the better known kill_prior_tuple mechanism. I have seen clear and convincing examples of this in the past. Might not really be true anymore. Another thing is _bt_findinsertloc() and _bt_delete_or_dedup_one_page(), which themselves use the checkingunique flag that you're changing the value of. There could also be unintended side-effects there. OTOH they also use indexUnchanged too, so even if there is a problem it might be fixable. -- Peter Geoghegan
On Wed, Jun 23, 2021 at 5:42 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Jun 23, 2021 at 9:31 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > You're right that skipping the check might also skip killing a prior > > row version, but it doesn't prevent later scans from killing them, so > > there is no correctness aspect to that. > > Probably not, no. I'll assume for now that there is no correctness issue. > > > In the case of a non-HOT UPDATE the backend will see the index entry > > for the old row version and then check it, pointlessly. Since that has > > just been modified, that won't ever be killed, so skipping the check > > makes sense in those cases. > > I agree that the check itself is pointless here. But that in itself > doesn't make the call to _bt_check_unique() useless. It might still > manage to set LP_DEAD bits when nothing else will. This case occurs when we are doing non-HOT UPDATEs. That command is searched, so the scan will already have touched the heap and almost certainly the index also, setting any LP_DEAD bits already in the most frequent case. So the check isn't going to do anything useful in the vast majority of cases, which is why its OK to remove it. -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jun 24, 2021 at 5:39 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > This case occurs when we are doing non-HOT UPDATEs. That command is > searched, so the scan will already have touched the heap and almost > certainly the index also, setting any LP_DEAD bits already in the most > frequent case. But it won't, because the restriction that I described with non-HOT updates in kill_prior_tuple in that old thread I linked to. This has been the case since commit 2ed5b87f96d from Postgres 9.5. This probably should probably be fixed, somehow, but for now I don't think you can assume anything about LP_DEAD bits being set -- they're clearly not set with a non-HOT update when the UPDATE's ModifyTable node is fed by a scan of the same index (unless we reach _bt_check_unique() because it's a unique index). -- Peter Geoghegan
On Fri, Jun 25, 2021 at 2:34 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Jun 24, 2021 at 5:39 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > This case occurs when we are doing non-HOT UPDATEs. That command is > > searched, so the scan will already have touched the heap and almost > > certainly the index also, setting any LP_DEAD bits already in the most > > frequent case. > > But it won't, because the restriction that I described with non-HOT > updates in kill_prior_tuple in that old thread I linked to. This has > been the case since commit 2ed5b87f96d from Postgres 9.5. This > probably should probably be fixed, somehow, but for now I don't think > you can assume anything about LP_DEAD bits being set -- they're > clearly not set with a non-HOT update when the UPDATE's ModifyTable > node is fed by a scan of the same index (unless we reach > _bt_check_unique() because it's a unique index). Seems a little bizarre to have _bt_check_unique() call back into the heap block we literally just unpinned. This is another case of the UPDATE scan and later heap/index insertions not working together very well. This makes this case even harder to solve: https://www.postgresql.org/message-id/CA%2BU5nMKzsjwcpSoqLsfqYQRwW6udPtgBdqXz34fUwaVfgXKWhA%40mail.gmail.com If an UPDATE interferes with its own ability to kill_prior_tuple(), then we should fix it, not allow pointless work to be performed somewhere else instead just because it has some beneficial side effect. If an UPDATE scans via a index and remembers the block in so->currPos.currPage then we could use that to optimize the re-insertion by starting the insertion scan at that block (since we know the live unique key is either there or somewhere to the right). By connecting those together, we would then be able to know that the change in LSN was caused by ourself and allow the items to be killed correctly at that time. Do you think there is benefit in having PK UPDATEs as a special plan that links these things more closely together? -- Simon Riggs http://www.EnterpriseDB.com/
On Fri, Jun 25, 2021 at 1:43 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > Seems a little bizarre to have _bt_check_unique() call back into the > heap block we literally just unpinned. I suppose it is a little bizarre. > This is another case of the UPDATE scan and later heap/index > insertions not working together very well. > This makes this case even harder to solve: > https://www.postgresql.org/message-id/CA%2BU5nMKzsjwcpSoqLsfqYQRwW6udPtgBdqXz34fUwaVfgXKWhA%40mail.gmail.com I wasn't aware of that thread, but I suspected that something like that was going on in some cases myself. > If an UPDATE interferes with its own ability to kill_prior_tuple(), > then we should fix it, not allow pointless work to be performed > somewhere else instead just because it has some beneficial side > effect. Definitely true. But the fact is that this is where we are today, and that complicates this business with bypassing _bt_check_unique(). > If an UPDATE scans via a index and remembers the block in > so->currPos.currPage then we could use that to optimize the > re-insertion by starting the insertion scan at that block (since we > know the live unique key is either there or somewhere to the right). > By connecting those together, we would then be able to know that the > change in LSN was caused by ourself and allow the items to be killed > correctly at that time. > > Do you think there is benefit in having PK UPDATEs as a special plan > that links these things more closely together? I think that it might be worth hinting to the index scan that it is feeding a ModifyTable node, and that it should not drop its pin per the optimization added to avoid blocking VACUUM (in commit 2ed5b87f96d). We can just not do that if for whatever reason we don't think it's worth it - the really important cases for that optimization involve cursors, things like that. It's not like the code that deals with this (that notices LSN change) cannot just recheck by going to the heap. The chances of it really being VACUUM are generally extremely low. OTOH I wonder if the whole idea of holding a pin on a leaf page to block VACUUM is one that should be removed entirely. -- Peter Geoghegan
On Fri, Jun 25, 2021 at 4:44 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Fri, Jun 25, 2021 at 1:43 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > Seems a little bizarre to have _bt_check_unique() call back into the > > heap block we literally just unpinned. > > I suppose it is a little bizarre. > > > This is another case of the UPDATE scan and later heap/index > > insertions not working together very well. > > This makes this case even harder to solve: > > https://www.postgresql.org/message-id/CA%2BU5nMKzsjwcpSoqLsfqYQRwW6udPtgBdqXz34fUwaVfgXKWhA%40mail.gmail.com > > I wasn't aware of that thread, but I suspected that something like > that was going on in some cases myself. > > > If an UPDATE interferes with its own ability to kill_prior_tuple(), > > then we should fix it, not allow pointless work to be performed > > somewhere else instead just because it has some beneficial side > > effect. > > Definitely true. But the fact is that this is where we are today, and > that complicates this business with bypassing _bt_check_unique(). > > > If an UPDATE scans via a index and remembers the block in > > so->currPos.currPage then we could use that to optimize the > > re-insertion by starting the insertion scan at that block (since we > > know the live unique key is either there or somewhere to the right). > > By connecting those together, we would then be able to know that the > > change in LSN was caused by ourself and allow the items to be killed > > correctly at that time. > > > > Do you think there is benefit in having PK UPDATEs as a special plan > > that links these things more closely together? > > I think that it might be worth hinting to the index scan that it is > feeding a ModifyTable node, and that it should not drop its pin per > the optimization added to avoid blocking VACUUM (in commit > 2ed5b87f96d). We can just not do that if for whatever reason we don't > think it's worth it - the really important cases for that optimization > involve cursors, things like that. > > It's not like the code that deals with this (that notices LSN change) > cannot just recheck by going to the heap. The chances of it really > being VACUUM are generally extremely low. > > OTOH I wonder if the whole idea of holding a pin on a leaf page to > block VACUUM is one that should be removed entirely. Definitely some good ideas here. I'm out of time to do anything for this CF, so I've moved this back to later CF. I'm planning to work on this more, but I won't try to fold in all of your ideas above. Not cos they are bad ones, just there is enough room for 2-4 related patches here. -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jul 1, 2021 at 8:23 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > Definitely some good ideas here. I have been meaning to come up with some kind of solution to the problem of "self-blocking" LP_DEAD bit setting within the kill_prior_tuple mechanism. It's hard to argue against that. > I'm out of time to do anything for this CF, so I've moved this back to later CF. > > I'm planning to work on this more, but I won't try to fold in all of > your ideas above. Not cos they are bad ones, just there is enough room > for 2-4 related patches here. I'm a little concerned about relying on the indexUnchanged flag like this. It is currently just supposed to be a hint, but your proposal makes it truly critical. Currently the consequences are no worse than the risk that we'll maybe waste some cycles on the occasional useless bottom-up index deletion pass. With your patch it absolutely cannot be falsely set (though it should still be okay if it is falsely unset). Of course it should be correct (with or without this new optimization), but the difference still matters. And so I think that there ought to be a clear benefit to users from the new optimization, that justifies accepting the new risk. Some kind of benchmark showing an improvement in latency and/or throughput. Something like that. Doesn't have to be a huge improvement. -- Peter Geoghegan
On Thu, Jul 01, 2021 at 09:22:38AM -0700, Peter Geoghegan wrote: > On Thu, Jul 1, 2021 at 8:23 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > Definitely some good ideas here. > > I have been meaning to come up with some kind of solution to the > problem of "self-blocking" LP_DEAD bit setting within the > kill_prior_tuple mechanism. It's hard to argue against that. > > > I'm out of time to do anything for this CF, so I've moved this back to later CF. > > > > I'm planning to work on this more, but I won't try to fold in all of > > your ideas above. Not cos they are bad ones, just there is enough room > > for 2-4 related patches here. > > I'm a little concerned about relying on the indexUnchanged flag like > this. It is currently just supposed to be a hint, but your proposal > makes it truly critical. Currently the consequences are no worse than > the risk that we'll maybe waste some cycles on the occasional useless > bottom-up index deletion pass. With your patch it absolutely cannot be > falsely set (though it should still be okay if it is falsely unset). > > Of course it should be correct (with or without this new > optimization), but the difference still matters. And so I think that > there ought to be a clear benefit to users from the new optimization, > that justifies accepting the new risk. Some kind of benchmark showing > an improvement in latency and/or throughput. Something like that. > Doesn't have to be a huge improvement. > Hi Simon, This has been stalled since July, and based on Peter's comment i feel we should mark this as RwF. Which i'm doing now. Please feel free to resubmit for Next Commitfest. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On Fri, 1 Oct 2021 at 15:20, Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > This has been stalled since July, and based on Peter's comment i feel we > should mark this as RwF. Which i'm doing now. > > Please feel free to resubmit for Next Commitfest. Agreed, thank you Jaime. -- Simon Riggs http://www.EnterpriseDB.com/