Thread: Using indexUnchanged with nbtree

Using indexUnchanged with nbtree

From

Simon Riggs

Date:

21 June 2021, 12:31:07

Seems like we can skip the uniqueness check if indexUnchanged, which
will speed up non-HOT UPDATEs on tables with B-Trees.

Passes tests.

Comments?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

skip_nonHOT_btree_unique_check.v1.patch

Re: Using indexUnchanged with nbtree

From

Peter Geoghegan

Date:

23 June 2021, 16:17:30

On Mon, Jun 21, 2021 at 5:31 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> Seems like we can skip the uniqueness check if indexUnchanged, which
> will speed up non-HOT UPDATEs on tables with B-Trees.

I thought about doing this myself. Never got as far as thinking about
the correctness implications in detail.

One thing that I'm concerned about is LP_DEAD bit setting inside
_bt_check_unique(), which isn't going to take place when the
optimization from the patch is applied. That definitely used to be way
more important than kill_prior_tuple-based LP_DEAD bit setting, which
has real problems with non-HOT updates [1]. _bt_check_unique() clearly
makes up for that in the case of unique indexes, at least for many
years.

On the other hand my thinking here might well be outdated, because of
course bottom-up index deletion exists now. You're using
indexUnchanged here, which is used to trigger bottom-up index deletion
passes. Maybe that's enough for it to not matter now, meaning that the
LP_DEAD bit stuff is not a real blocker here. Offhand I'm quite
unsure.

[1]
https://www.postgresql.org/message-id/flat/CAH2-Wz%3DSfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA%40mail.gmail.com#b20ead9675225f12b6a80e53e19eed9d
-- 
Peter Geoghegan

Re: Using indexUnchanged with nbtree

From

Simon Riggs

Date:

23 June 2021, 16:31:02

On Wed, Jun 23, 2021 at 5:17 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Jun 21, 2021 at 5:31 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > Seems like we can skip the uniqueness check if indexUnchanged, which
> > will speed up non-HOT UPDATEs on tables with B-Trees.
>
> I thought about doing this myself. Never got as far as thinking about
> the correctness implications in detail.
>
> One thing that I'm concerned about is LP_DEAD bit setting inside
> _bt_check_unique(), which isn't going to take place when the
> optimization from the patch is applied. That definitely used to be way
> more important than kill_prior_tuple-based LP_DEAD bit setting, which
> has real problems with non-HOT updates [1]. _bt_check_unique() clearly
> makes up for that in the case of unique indexes, at least for many
> years.
>
> On the other hand my thinking here might well be outdated, because of
> course bottom-up index deletion exists now. You're using
> indexUnchanged here, which is used to trigger bottom-up index deletion
> passes. Maybe that's enough for it to not matter now, meaning that the
> LP_DEAD bit stuff is not a real blocker here. Offhand I'm quite
> unsure.

You're right that skipping the check might also skip killing a prior
row version, but it doesn't prevent later scans from killing them, so
there is no correctness aspect to that.

In the case of a non-HOT UPDATE the backend will see the index entry
for the old row version and then check it, pointlessly. Since that has
just been modified, that won't ever be killed, so skipping the check
makes sense in those cases.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Using indexUnchanged with nbtree

From

Peter Geoghegan

Date:

23 June 2021, 16:42:21

On Wed, Jun 23, 2021 at 9:31 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> You're right that skipping the check might also skip killing a prior
> row version, but it doesn't prevent later scans from killing them, so
> there is no correctness aspect to that.

Probably not, no. I'll assume for now that there is no correctness issue.

> In the case of a non-HOT UPDATE the backend will see the index entry
> for the old row version and then check it, pointlessly. Since that has
> just been modified, that won't ever be killed, so skipping the check
> makes sense in those cases.

I agree that the check itself is pointless here. But that in itself
doesn't make the call to _bt_check_unique() useless. It might still
manage to set LP_DEAD bits when nothing else will.

I realize that the original reason for setting LP_DEAD bits in
_bt_check_unique() was something like "well, might as well do this
here too". But I believe that LP_DEAD bit setting inside
_bt_check_unique() is nevertheless often more valuable than the better
known kill_prior_tuple mechanism. I have seen clear and convincing
examples of this in the past. Might not really be true anymore.

Another thing is _bt_findinsertloc() and
_bt_delete_or_dedup_one_page(), which themselves use the
checkingunique flag that you're changing the value of. There could
also be unintended side-effects there. OTOH they also use
indexUnchanged too, so even if there is a problem it might be fixable.

-- 
Peter Geoghegan

Re: Using indexUnchanged with nbtree

From

Simon Riggs

Date:

24 June 2021, 12:39:46

On Wed, Jun 23, 2021 at 5:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jun 23, 2021 at 9:31 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > You're right that skipping the check might also skip killing a prior
> > row version, but it doesn't prevent later scans from killing them, so
> > there is no correctness aspect to that.
>
> Probably not, no. I'll assume for now that there is no correctness issue.
>
> > In the case of a non-HOT UPDATE the backend will see the index entry
> > for the old row version and then check it, pointlessly. Since that has
> > just been modified, that won't ever be killed, so skipping the check
> > makes sense in those cases.
>
> I agree that the check itself is pointless here. But that in itself
> doesn't make the call to _bt_check_unique() useless. It might still
> manage to set LP_DEAD bits when nothing else will.

This case occurs when we are doing non-HOT UPDATEs. That command is
searched, so the scan will already have touched the heap and almost
certainly the index also, setting any LP_DEAD bits already in the most
frequent case.

So the check isn't going to do anything useful in the vast majority of
cases, which is why its OK to remove it.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Using indexUnchanged with nbtree

From

Peter Geoghegan

Date:

25 June 2021, 01:33:59

On Thu, Jun 24, 2021 at 5:39 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> This case occurs when we are doing non-HOT UPDATEs. That command is
> searched, so the scan will already have touched the heap and almost
> certainly the index also, setting any LP_DEAD bits already in the most
> frequent case.

But it won't, because the restriction that I described with non-HOT
updates in kill_prior_tuple in that old thread I linked to. This has
been the case since commit 2ed5b87f96d from Postgres 9.5. This
probably should probably be fixed, somehow, but for now I don't think
you can assume anything about LP_DEAD bits being set -- they're
clearly not set with a non-HOT update when the UPDATE's ModifyTable
node is fed by a scan of the same index (unless we reach
_bt_check_unique() because it's a unique index).

-- 
Peter Geoghegan

Re: Using indexUnchanged with nbtree

From

Simon Riggs

Date:

25 June 2021, 08:42:56

On Fri, Jun 25, 2021 at 2:34 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jun 24, 2021 at 5:39 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > This case occurs when we are doing non-HOT UPDATEs. That command is
> > searched, so the scan will already have touched the heap and almost
> > certainly the index also, setting any LP_DEAD bits already in the most
> > frequent case.
>
> But it won't, because the restriction that I described with non-HOT
> updates in kill_prior_tuple in that old thread I linked to. This has
> been the case since commit 2ed5b87f96d from Postgres 9.5. This
> probably should probably be fixed, somehow, but for now I don't think
> you can assume anything about LP_DEAD bits being set -- they're
> clearly not set with a non-HOT update when the UPDATE's ModifyTable
> node is fed by a scan of the same index (unless we reach
> _bt_check_unique() because it's a unique index).

Seems a little bizarre to have _bt_check_unique() call back into the
heap block we literally just unpinned.
This is another case of the UPDATE scan and later heap/index
insertions not working together very well.
This makes this case even harder to solve:
https://www.postgresql.org/message-id/CA%2BU5nMKzsjwcpSoqLsfqYQRwW6udPtgBdqXz34fUwaVfgXKWhA%40mail.gmail.com

If an UPDATE interferes with its own ability to kill_prior_tuple(),
then we should fix it, not allow pointless work to be performed
somewhere else instead just because it has some beneficial side
effect.

If an UPDATE scans via a index and remembers the block in
so->currPos.currPage then we could use that to optimize the
re-insertion by starting the insertion scan at that block (since we
know the live unique key is either there or somewhere to the right).
By connecting those together, we would then be able to know that the
change in LSN was caused by ourself and allow the items to be killed
correctly at that time.

Do you think there is benefit in having PK UPDATEs as a special plan
that links these things more closely together?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Using indexUnchanged with nbtree

From

Peter Geoghegan

Date:

25 June 2021, 15:43:55

On Fri, Jun 25, 2021 at 1:43 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> Seems a little bizarre to have _bt_check_unique() call back into the
> heap block we literally just unpinned.

I suppose it is a little bizarre.

> This is another case of the UPDATE scan and later heap/index
> insertions not working together very well.
> This makes this case even harder to solve:
> https://www.postgresql.org/message-id/CA%2BU5nMKzsjwcpSoqLsfqYQRwW6udPtgBdqXz34fUwaVfgXKWhA%40mail.gmail.com

I wasn't aware of that thread, but I suspected that something like
that was going on in some cases myself.

> If an UPDATE interferes with its own ability to kill_prior_tuple(),
> then we should fix it, not allow pointless work to be performed
> somewhere else instead just because it has some beneficial side
> effect.

Definitely true. But the fact is that this is where we are today, and
that complicates this business with bypassing _bt_check_unique().

> If an UPDATE scans via a index and remembers the block in
> so->currPos.currPage then we could use that to optimize the
> re-insertion by starting the insertion scan at that block (since we
> know the live unique key is either there or somewhere to the right).
> By connecting those together, we would then be able to know that the
> change in LSN was caused by ourself and allow the items to be killed
> correctly at that time.
>
> Do you think there is benefit in having PK UPDATEs as a special plan
> that links these things more closely together?

I think that it might be worth hinting to the index scan that it is
feeding a ModifyTable node, and that it should not drop its pin per
the optimization added to avoid blocking VACUUM (in commit
2ed5b87f96d). We can just not do that if for whatever reason we don't
think it's worth it - the really important cases for that optimization
involve cursors, things like that.

It's not like the code that deals with this (that notices LSN change)
cannot just recheck by going to the heap. The chances of it really
being VACUUM are generally extremely low.

OTOH I wonder if the whole idea of holding a pin on a leaf page to
block VACUUM is one that should be removed entirely.

-- 
Peter Geoghegan

Re: Using indexUnchanged with nbtree

From

Simon Riggs

Date:

01 July 2021, 15:23:02

On Fri, Jun 25, 2021 at 4:44 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Jun 25, 2021 at 1:43 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > Seems a little bizarre to have _bt_check_unique() call back into the
> > heap block we literally just unpinned.
>
> I suppose it is a little bizarre.
>
> > This is another case of the UPDATE scan and later heap/index
> > insertions not working together very well.
> > This makes this case even harder to solve:
> > https://www.postgresql.org/message-id/CA%2BU5nMKzsjwcpSoqLsfqYQRwW6udPtgBdqXz34fUwaVfgXKWhA%40mail.gmail.com
>
> I wasn't aware of that thread, but I suspected that something like
> that was going on in some cases myself.
>
> > If an UPDATE interferes with its own ability to kill_prior_tuple(),
> > then we should fix it, not allow pointless work to be performed
> > somewhere else instead just because it has some beneficial side
> > effect.
>
> Definitely true. But the fact is that this is where we are today, and
> that complicates this business with bypassing _bt_check_unique().
>
> > If an UPDATE scans via a index and remembers the block in
> > so->currPos.currPage then we could use that to optimize the
> > re-insertion by starting the insertion scan at that block (since we
> > know the live unique key is either there or somewhere to the right).
> > By connecting those together, we would then be able to know that the
> > change in LSN was caused by ourself and allow the items to be killed
> > correctly at that time.
> >
> > Do you think there is benefit in having PK UPDATEs as a special plan
> > that links these things more closely together?
>
> I think that it might be worth hinting to the index scan that it is
> feeding a ModifyTable node, and that it should not drop its pin per
> the optimization added to avoid blocking VACUUM (in commit
> 2ed5b87f96d). We can just not do that if for whatever reason we don't
> think it's worth it - the really important cases for that optimization
> involve cursors, things like that.
>
> It's not like the code that deals with this (that notices LSN change)
> cannot just recheck by going to the heap. The chances of it really
> being VACUUM are generally extremely low.
>
> OTOH I wonder if the whole idea of holding a pin on a leaf page to
> block VACUUM is one that should be removed entirely.

Definitely some good ideas here.

I'm out of time to do anything for this CF, so I've moved this back to later CF.

I'm planning to work on this more, but I won't try to fold in all of
your ideas above. Not cos they are bad ones, just there is enough room
for 2-4 related patches here.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Using indexUnchanged with nbtree

From

Peter Geoghegan

Date:

01 July 2021, 16:22:38

On Thu, Jul 1, 2021 at 8:23 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> Definitely some good ideas here.

I have been meaning to come up with some kind of solution to the
problem of "self-blocking" LP_DEAD bit setting within the
kill_prior_tuple mechanism. It's hard to argue against that.

> I'm out of time to do anything for this CF, so I've moved this back to later CF.
>
> I'm planning to work on this more, but I won't try to fold in all of
> your ideas above. Not cos they are bad ones, just there is enough room
> for 2-4 related patches here.

I'm a little concerned about relying on the indexUnchanged flag like
this. It is currently just supposed to be a hint, but your proposal
makes it truly critical. Currently the consequences are no worse than
the risk that we'll maybe waste some cycles on the occasional useless
bottom-up index deletion pass. With your patch it absolutely cannot be
falsely set (though it should still be okay if it is falsely unset).

Of course it should be correct (with or without this new
optimization), but the difference still matters. And so I think that
there ought to be a clear benefit to users from the new optimization,
that justifies accepting the new risk. Some kind of benchmark showing
an improvement in latency and/or throughput. Something like that.
Doesn't have to be a huge improvement.

-- 
Peter Geoghegan

Re: Using indexUnchanged with nbtree

From

Jaime Casanova

Date:

01 October 2021, 14:20:01

On Thu, Jul 01, 2021 at 09:22:38AM -0700, Peter Geoghegan wrote:
> On Thu, Jul 1, 2021 at 8:23 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > Definitely some good ideas here.
> 
> I have been meaning to come up with some kind of solution to the
> problem of "self-blocking" LP_DEAD bit setting within the
> kill_prior_tuple mechanism. It's hard to argue against that.
> 
> > I'm out of time to do anything for this CF, so I've moved this back to later CF.
> >
> > I'm planning to work on this more, but I won't try to fold in all of
> > your ideas above. Not cos they are bad ones, just there is enough room
> > for 2-4 related patches here.
> 
> I'm a little concerned about relying on the indexUnchanged flag like
> this. It is currently just supposed to be a hint, but your proposal
> makes it truly critical. Currently the consequences are no worse than
> the risk that we'll maybe waste some cycles on the occasional useless
> bottom-up index deletion pass. With your patch it absolutely cannot be
> falsely set (though it should still be okay if it is falsely unset).
> 
> Of course it should be correct (with or without this new
> optimization), but the difference still matters. And so I think that
> there ought to be a clear benefit to users from the new optimization,
> that justifies accepting the new risk. Some kind of benchmark showing
> an improvement in latency and/or throughput. Something like that.
> Doesn't have to be a huge improvement.
> 

Hi Simon,

This has been stalled since July, and based on Peter's comment i feel we
should mark this as RwF. Which i'm doing now.

Please feel free to resubmit for Next Commitfest.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Re: Using indexUnchanged with nbtree

From

Simon Riggs

Date:

01 October 2021, 14:21:40

On Fri, 1 Oct 2021 at 15:20, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

> This has been stalled since July, and based on Peter's comment i feel we
> should mark this as RwF. Which i'm doing now.
>
> Please feel free to resubmit for Next Commitfest.

Agreed, thank you Jaime.

-- 
Simon Riggs                http://www.EnterpriseDB.com/