Thread: POC: Lock updated tuples in tuple_update() and tuple_delete()

POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hackers,

When working in the read committed transaction isolation mode
(default), we have the following sequence of actions when
tuple_update() or tuple_delete() finds a concurrently updated tuple.

1. tuple_update()/tuple_delete() returns TM_Updated
2. tuple_lock()
3. Re-evaluate plan qual (recheck if we still need to update/delete
and calculate the new tuple for update)
4. tuple_update()/tuple_delete() (this time should be successful,
since we've previously locked the tuple).
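
For reference, the current executor-side flow looks roughly like this
(a simplified sketch with abbreviated variable names, not the actual
nodeModifyTable.c code):

```
ldelete:
    /* (1) try the delete; returns TM_Updated on a concurrent update */
    result = table_tuple_delete(rel, tupleid, cid, snapshot, crosscheck,
                                true /* wait */, &tmfd, changingPart);

    if (result == TM_Updated && !IsolationUsesXactSnapshot())
    {
        /* (2) lock the latest version of the updated tuple */
        result = table_tuple_lock(rel, tupleid, snapshot, inputslot, cid,
                                  LockTupleExclusive, LockWaitBlock,
                                  TUPLE_LOCK_FLAG_FIND_LAST_VERSION, &tmfd);
        if (result == TM_Ok)
        {
            /* (3) re-evaluate the plan qual against the locked version */
            epqslot = EvalPlanQual(epqstate, rel, rti, inputslot);
            if (!TupIsNull(epqslot))
                goto ldelete;   /* (4) retry; succeeds now that we hold the lock */
        }
    }
```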

I wonder if we should merge steps 1 and 2. We could save some of the
effort already spent during tuple_update()/tuple_delete() when locking
the tuple. In the heap table access method, we have to start
tuple_lock() with the first tuple in the chain, but
tuple_update()/tuple_delete() has already visited it. For undo-based
table access methods, tuple_update()/tuple_delete() should start from
the last version, so why not place the tuple lock immediately once a
concurrent update is detected? I think this patch should have some
performance benefits under high concurrency.

Also, the patch simplifies the code in nodeModifyTable.c by getting rid
of the nested case. I also got rid of the extra
table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
tuple when it should be exactly the same tuple we've just locked?

I'm going to check the performance impact. Thoughts and feedback are welcome.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Aleksander Alekseev
Date:
Hi Alexander,

> Thoughts and feedback are welcome.

I took a preliminary look at the patch. I'm going to need more time
to meditate on the proposed changes and to figure out the performance
impact.

So far I just wanted to let you know that the patch applied OK for me
and passed all the tests. The `else` branch here seems to be
redundant:

+        if (!updated)
+        {
+            /* Should not encounter speculative tuple on recheck */
+            Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
-             ReleaseBuffer(buffer);
+            ReleaseBuffer(buffer);
+        }
+        else
+        {
+            updated = false;
+        }

Also, I wish there were a few more comments, since some of the
proposed changes are not that straightforward.

-- 
Best regards,
Aleksander Alekseev



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Aleksander Alekseev
Date:
Hi again,

> +        if (!updated)
> +        {
> +            /* Should not encounter speculative tuple on recheck */
> +            Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
> -             ReleaseBuffer(buffer);
> +            ReleaseBuffer(buffer);
> +        }
> +        else
> +        {
> +            updated = false;
> +        }

OK, I got confused here. I suggest inverting the condition: change the
if (!...) { .. } else { .. } code to if (...) { .. } else { .. }, as
sketched below.
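
That is, based on the hunk above:

```
if (updated)
{
    updated = false;
}
else
{
    /* Should not encounter speculative tuple on recheck */
    Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
    ReleaseBuffer(buffer);
}
```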

-- 
Best regards,
Aleksander Alekseev



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Aleksander Alekseev
Date:
Hi Alexander,

> I'm going to need more time to meditate on the proposed changes and to figure out the performance impact.

OK, it turned out this patch is slightly more complicated than I
initially thought, but I think I've managed to get some vague
understanding of what's going on.

I tried to reproduce the case with concurrently updated tuples you
described on the current `master` branch. I created a new table:

```
CREATE TABLE phonebook(
  "id" SERIAL PRIMARY KEY NOT NULL,
  "name" NAME NOT NULL,
  "phone" INT NOT NULL);

INSERT INTO phonebook ("name", "phone")
VALUES ('Alice', 123), ('Bob', 456), ('Charlie', 789);
```

Then I opened two sessions and attached them with LLDB. I did:

```
(lldb) b heapam_tuple_update
(lldb) c
```

... in both sessions because I wanted to see the two calls to
heapam_tuple_update() (steps 1 and 4) and check the return values.

Then I did:

```
session1 =# BEGIN;
session2 =# BEGIN;
session1 =# UPDATE phonebook SET name = 'Alex' WHERE name = 'Alice';
```

This update succeeds and I see heapam_tuple_update() returning TM_Ok.

```
session2 =# UPDATE phonebook SET name = 'Alfred' WHERE name = 'Alice';
```

This update hangs on a lock.

```
session1 =# COMMIT;
```

Now session2 unfreezes and returns 'UPDATE 0'. table_tuple_update()
was called once and returned TM_Updated. Also session2 sees an updated
tuple now. So apparently the visibility check (step 3) didn't pass.

At this point I'm slightly confused. I don't see where a performance
improvement is expected, considering that session2 gets blocked until
session1 commits.

Could you please walk me through here? Am I using the right test case
or maybe you had another one in mind? Which steps do you consider
expensive and expect to be mitigated by the patch?

-- 
Best regards,
Aleksander Alekseev



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi Aleksander!

Thank you for your efforts reviewing this patch.

On Thu, Jul 7, 2022 at 12:43 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> > I'm going to need more time to meditate on the proposed changes and to figure out the performance impact.
>
> OK, turned out this patch is slightly more complicated than I
> initially thought, but I think I managed to get some vague
> understanding of what's going on.
>
> I tried to reproduce the case with concurrently updated tuples you
> described on the current `master` branch. I created a new table:
>
> ```
> CREATE TABLE phonebook(
>   "id" SERIAL PRIMARY KEY NOT NULL,
>   "name" NAME NOT NULL,
>   "phone" INT NOT NULL);
>
> INSERT INTO phonebook ("name", "phone")
> VALUES ('Alice', 123), ('Bob', 456), ('Charlie', 789);
> ```
>
> Then I opened two sessions and attached them with LLDB. I did:
>
> ```
> (lldb) b heapam_tuple_update
> (lldb) c
> ```
>
> ... in both cases because I wanted to see two calls (steps 2 and 4) to
> heapam_tuple_update() and check the return values.
>
> Then I did:
>
> ```
> session1 =# BEGIN;
> session2 =# BEGIN;
> session1 =# UPDATE phonebook SET name = 'Alex' WHERE name = 'Alice';
> ```
>
> This update succeeds and I see heapam_tuple_update() returning TM_Ok.
>
> ```
> session2 =# UPDATE phonebook SET name = 'Alfred' WHERE name = 'Alice';
> ```
>
> This update hangs on a lock.
>
> ```
> session1 =# COMMIT;
> ```
>
> Now session2 unfreezes and returns 'UPDATE 0'. table_tuple_update()
> was called once and returned TM_Updated. Also session2 sees an updated
> tuple now. So apparently the visibility check (step 3) didn't pass.

Yes.  But it's not exactly a visibility check.  Session2 re-evaluates
the WHERE condition on the most recent row version (bypassing the
snapshot).  The WHERE condition is not true anymore, thus the row is
not updated.

> At this point I'm slightly confused. I don't see where a performance
> improvement is expected, considering that session2 gets blocked until
> session1 commits.
>
> Could you please walk me through here? Am I using the right test case
> or maybe you had another one in mind? Which steps do you consider
> expensive and expect to be mitigated by the patch?

This patch is not intended to change the high-level logic. At the high
level, the transaction that updated the row still holds a lock on it
until it finishes. The possible positive performance impact I expect
comes from doing the work of the two calls tuple_update() and
tuple_lock() in a single call of tuple_update().  If we do this in one
call, we can save some effort, for instance locking the same buffer
once instead of twice.
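
Schematically, the change at the table AM boundary looks like this (a
rough sketch; the lockedSlot argument comes from the attached patch,
other details are simplified):

```
/* master: two separate calls, the second one locates the tuple again */
result = table_tuple_delete(rel, tid, cid, snapshot, crosscheck,
                            true /* wait */, &tmfd, changingPart);
if (result == TM_Updated)
    result = table_tuple_lock(rel, tid, snapshot, slot, cid,
                              LockTupleExclusive, LockWaitBlock,
                              TUPLE_LOCK_FLAG_FIND_LAST_VERSION, &tmfd);

/* patched: a single call both detects the concurrent update and, when
 * lockedSlot is non-NULL, locks the latest tuple version into it */
result = table_tuple_delete(rel, tid, cid, snapshot, crosscheck,
                            true /* wait */, &tmfd, changingPart,
                            lockedSlot);
```

Inside the table AM the lock can then be taken while the relevant
context (for instance the buffer) is already at hand, rather than being
located again by a separate tuple_lock() call.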

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, hackers!
I ran the following benchmark on master branch (15) vs patch (15-lock):

On a 36-vcore AWS server, I ran an UPDATE-only pgbench script with 50 connections on pgbench_tellers with 100 rows. The idea was to introduce as much concurrency for updates as possible while avoiding too many clients sitting in a wait state.
Indexes were not built, to avoid index-update-related delays.
I did 2 runs, each consisting of 6 series of updates (1st run: master-patch-master-patch-master-patch; 2nd run: patch-master-patch-master-patch-master).
Each series started a fresh server and did VACUUM FULL so that heap bloat left by the previous series would not affect the current one. Data was collected for 10 minutes, with the first-minute data being dropped.
Disk-related operations were suppressed where possible (WAL, fsync, etc.)

postgresql.conf:
fsync = off
autovacuum = off
full_page_writes = off
max_worker_processes = 99
max_parallel_workers = 99
max_connections = 100
shared_buffers = 4096MB
work_mem = 50MB

Attached are pictures of the 2 runs, the shell script, and the SQL script that were run.
According to htop, all 36 cores were loaded to ~94% in each series.

I'm not sure how to interpret the results. It seems the TPS difference between runs is significant, with the average performance of the lock patch (15-lock) appearing a little faster than master (15).

Could someone try to repeat this on another server? What do you think?

--
Best regards,
Pavel Borisov,
Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Pavel!

On Fri, Jul 29, 2022 at 11:12 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I ran the following benchmark on master branch (15) vs patch (15-lock):
>
> On the 36-vcore AWS server, I've run an UPDATE-only pgbench script with 50 connections on pgbench_tellers with 100 rows. The idea was to introduce as much as possible concurrency for updates but avoid much clients being in a wait state.
> Indexes were not built to avoid index-update-related delays.
> Done 2 runs each consisting of 6 series of updates (1st run: master-patch-master-patch-master-patch, 2nd run patch-master-patch-master-patch-master)
> Each series started a fresh server and did VACUUM FULL to avoid bloating heap relation after the previous series to affect the current. It collected data for 10 minutes with first-minute data being dropped.
> Disk-related operations were suppressed where possible (WAL, fsync etc.)
>
> postgresql.conf:
> fsync = off
> autovacuum = off
> full_page_writes = off
> max_worker_processes = 99
> max_parallel_workers = 99
> max_connections = 100
> shared_buffers = 4096MB
> work_mem = 50MB
>
> Attached are pictures of 2 runs, shell script, and SQL script that were running.
> According to htop all 36-cores were loaded to ~94% in each series
>
> I'm not sure how to interpret the results. Seems like a TPS difference between runs is significant, with average performance with lock-patch (15lock) seeming a little bit faster than the master (15).
>
> Could someone try to repeat this on another server? What do you think?

Thank you for your benchmarks.  The TPS variation is high, and the run
order heavily affects the result.  Nevertheless, I think there is a
small but noticeable positive effect from the patch.  I'll continue
working on the patch to bring it into a more acceptable shape.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
vignesh C
Date:
On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hackers,
>
> When working in the read committed transaction isolation mode
> (default), we have the following sequence of actions when
> tuple_update() or tuple_delete() find concurrently updated tuple.
>
> 1. tuple_update()/tuple_delete() returns TM_Updated
> 2. tuple_lock()
> 3. Re-evaluate plan qual (recheck if we still need to update/delete
> and calculate the new tuple for update)
> 4. tuple_update()/tuple_delete() (this time should be successful,
> since we've previously locked the tuple).
>
> I wonder if we should merge steps 1 and 2. We could save some efforts
> already done during tuple_update()/tuple_delete() for locking the
> tuple. In heap table access method, we've to start tuple_lock() with
> the first tuple in the chain, but tuple_update()/tuple_delete()
> already visited it. For undo-based table access methods,
> tuple_update()/tuple_delete() should start from the last version, why
> don't place the tuple lock immediately once a concurrent update is
> detected. I think this patch should have some performance benefits on
> high concurrency.
>
> Also, the patch simplifies code in nodeModifyTable.c getting rid of
> the nested case. I also get rid of extra
> table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> tuple, when it should be exactly the same tuple we've just locked.
>
> I'm going to check the performance impact. Thoughts and feedback are welcome.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
=== applying patch
./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
patching file src/backend/executor/nodeModifyTable.c
...
Hunk #3 FAILED at 1376.
...
1 out of 15 hunks FAILED -- saving rejects to file
src/backend/executor/nodeModifyTable.c.rej

[1] - http://cfbot.cputube.org/patch_41_4099.log

Regards,
Vignesh



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Vignesh!

On Wed, 4 Jan 2023 at 12:41, vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Hackers,
> >
> > When working in the read committed transaction isolation mode
> > (default), we have the following sequence of actions when
> > tuple_update() or tuple_delete() find concurrently updated tuple.
> >
> > 1. tuple_update()/tuple_delete() returns TM_Updated
> > 2. tuple_lock()
> > 3. Re-evaluate plan qual (recheck if we still need to update/delete
> > and calculate the new tuple for update)
> > 4. tuple_update()/tuple_delete() (this time should be successful,
> > since we've previously locked the tuple).
> >
> > I wonder if we should merge steps 1 and 2. We could save some efforts
> > already done during tuple_update()/tuple_delete() for locking the
> > tuple. In heap table access method, we've to start tuple_lock() with
> > the first tuple in the chain, but tuple_update()/tuple_delete()
> > already visited it. For undo-based table access methods,
> > tuple_update()/tuple_delete() should start from the last version, why
> > don't place the tuple lock immediately once a concurrent update is
> > detected. I think this patch should have some performance benefits on
> > high concurrency.
> >
> > Also, the patch simplifies code in nodeModifyTable.c getting rid of
> > the nested case. I also get rid of extra
> > table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> > tuple, when it should be exactly the same tuple we've just locked.
> >
> > I'm going to check the performance impact. Thoughts and feedback are welcome.
>
> The patch does not apply on top of HEAD as in [1], please post a rebased patch:
> === Applying patches on top of PostgreSQL commit ID
> eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
> === applying patch
> ./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
> patching file src/backend/executor/nodeModifyTable.c
> ...
> Hunk #3 FAILED at 1376.
> ...
> 1 out of 15 hunks FAILED -- saving rejects to file
> src/backend/executor/nodeModifyTable.c.rej
>
> [1] - http://cfbot.cputube.org/patch_41_4099.log

The rebased patch is attached. It's just a change in formatting, with
no changes to the code.

Regards,
Pavel Borisov,
Supabase.

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
On Wed, 4 Jan 2023 at 12:52, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> Hi, Vignesh!
>
> On Wed, 4 Jan 2023 at 12:41, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > Hackers,
> > >
> > > When working in the read committed transaction isolation mode
> > > (default), we have the following sequence of actions when
> > > tuple_update() or tuple_delete() find concurrently updated tuple.
> > >
> > > 1. tuple_update()/tuple_delete() returns TM_Updated
> > > 2. tuple_lock()
> > > 3. Re-evaluate plan qual (recheck if we still need to update/delete
> > > and calculate the new tuple for update)
> > > 4. tuple_update()/tuple_delete() (this time should be successful,
> > > since we've previously locked the tuple).
> > >
> > > I wonder if we should merge steps 1 and 2. We could save some efforts
> > > already done during tuple_update()/tuple_delete() for locking the
> > > tuple. In heap table access method, we've to start tuple_lock() with
> > > the first tuple in the chain, but tuple_update()/tuple_delete()
> > > already visited it. For undo-based table access methods,
> > > tuple_update()/tuple_delete() should start from the last version, why
> > > don't place the tuple lock immediately once a concurrent update is
> > > detected. I think this patch should have some performance benefits on
> > > high concurrency.
> > >
> > > Also, the patch simplifies code in nodeModifyTable.c getting rid of
> > > the nested case. I also get rid of extra
> > > table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> > > tuple, when it should be exactly the same tuple we've just locked.
> > >
> > > I'm going to check the performance impact. Thoughts and feedback are welcome.
> >
> > The patch does not apply on top of HEAD as in [1], please post a rebased patch:
> > === Applying patches on top of PostgreSQL commit ID
> > eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
> > === applying patch
> > ./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
> > patching file src/backend/executor/nodeModifyTable.c
> > ...
> > Hunk #3 FAILED at 1376.
> > ...
> > 1 out of 15 hunks FAILED -- saving rejects to file
> > src/backend/executor/nodeModifyTable.c.rej
> >
> > [1] - http://cfbot.cputube.org/patch_41_4099.log
>
> The rebased patch is attached. It's just a change in formatting, no
> changes in code though.

One more update of a patchset to avoid compiler warnings.

Regards,
Pavel Borisov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Pavel!

On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Wed, 4 Jan 2023 at 12:52, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > On Wed, 4 Jan 2023 at 12:41, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >
> > > > Hackers,
> > > >
> > > > When working in the read committed transaction isolation mode
> > > > (default), we have the following sequence of actions when
> > > > tuple_update() or tuple_delete() find concurrently updated tuple.
> > > >
> > > > 1. tuple_update()/tuple_delete() returns TM_Updated
> > > > 2. tuple_lock()
> > > > 3. Re-evaluate plan qual (recheck if we still need to update/delete
> > > > and calculate the new tuple for update)
> > > > 4. tuple_update()/tuple_delete() (this time should be successful,
> > > > since we've previously locked the tuple).
> > > >
> > > > I wonder if we should merge steps 1 and 2. We could save some efforts
> > > > already done during tuple_update()/tuple_delete() for locking the
> > > > tuple. In heap table access method, we've to start tuple_lock() with
> > > > the first tuple in the chain, but tuple_update()/tuple_delete()
> > > > already visited it. For undo-based table access methods,
> > > > tuple_update()/tuple_delete() should start from the last version, why
> > > > don't place the tuple lock immediately once a concurrent update is
> > > > detected. I think this patch should have some performance benefits on
> > > > high concurrency.
> > > >
> > > > Also, the patch simplifies code in nodeModifyTable.c getting rid of
> > > > the nested case. I also get rid of extra
> > > > table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> > > > tuple, when it should be exactly the same tuple we've just locked.
> > > >
> > > > I'm going to check the performance impact. Thoughts and feedback are welcome.
> > >
> > > The patch does not apply on top of HEAD as in [1], please post a rebased patch:
> > > === Applying patches on top of PostgreSQL commit ID
> > > eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
> > > === applying patch
> > > ./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
> > > patching file src/backend/executor/nodeModifyTable.c
> > > ...
> > > Hunk #3 FAILED at 1376.
> > > ...
> > > 1 out of 15 hunks FAILED -- saving rejects to file
> > > src/backend/executor/nodeModifyTable.c.rej
> > >
> > > [1] - http://cfbot.cputube.org/patch_41_4099.log
> >
> > The rebased patch is attached. It's just a change in formatting, no
> > changes in code though.
>
> One more update of a patchset to avoid compiler warnings.

Thank you for your help.  I'm going to provide a revised version of
the patch with comments and a commit message in the next couple of days.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > One more update of a patchset to avoid compiler warnings.
>
> Thank you for your help.  I'm going to provide the revised version of
> patch with comments and commit message in the next couple of days.

The revised patch is attached.  It contains a descriptive commit
message, comments, and some minor code improvements.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Alexander!

On Thu, 5 Jan 2023 at 15:11, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > One more update of a patchset to avoid compiler warnings.
> >
> > Thank you for your help.  I'm going to provide the revised version of
> > patch with comments and commit message in the next couple of days.
>
> The revised patch is attached.  It contains describing commit message,
> comments and some minor code improvements.

I've looked through the patch once again. It seems to be in a nice
state to be committed.
I also noticed that at the tableam level and in the nodeModifyTable.c
function calls we have a one-to-one correspondence between *lockedSlot
and bool lockUpdated, but no checks on this in case something changes
in the code in the future. I'd propose combining these variables to
remain free of such checks. See v5 of the patch. Tests pass
successfully.
Besides that, the new version has only some minor changes in the
comments and the commit message.

Kind regards,
Pavel Borisov,
Supabase.

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Mason Sharp
Date:


On Fri, Jan 6, 2023 at 4:46 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Alexander!

On Thu, 5 Jan 2023 at 15:11, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > One more update of a patchset to avoid compiler warnings.
> >
> > Thank you for your help.  I'm going to provide the revised version of
> > patch with comments and commit message in the next couple of days.
>
> The revised patch is attached.  It contains describing commit message,
> comments and some minor code improvements.

I've looked through the patch once again. It seems in a nice state to
be committed.
I also noticed that in tableam level and NodeModifyTable function
calls we have a one-to-one correspondence between *lockedSlot and bool
lockUpdated, but no checks on this in case something changes in the
code in the future. I'd propose combining these variables to remain
free from these checks. See v5 of a patch. Tests are successfully
passed.
Besides, the new version has only some minor changes in the comments
and the commit message.

Kind regards,
Pavel Borisov,
Supabase.

It looks good, and the greater the concurrency the greater the benefit will be. Just a few minor suggestions regarding comments.

"ExecDeleteAct() have already locked the old tuple for us", change "have" to "has".

The comments in heapam_tuple_delete() and heapam_tuple_update() might be a little clearer with something like:

"If the tuple has been concurrently updated, get lock already so that on
retry it will succeed, provided that the caller asked to do this by
providing a lockedSlot."

Also, not too important, but perhaps it would be better to clarify in the commit message that the repeated work is driven by ExecUpdate and ExecDelete and can happen multiple times depending on the concurrency.

Best Regards,

Mason
 

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Mason!

Thank you very much for your review.

On Sun, Jan 8, 2023 at 9:33 PM Mason Sharp <masonlists@gmail.com> wrote:
> On Fri, Jan 6, 2023 at 4:46 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> Besides, the new version has only some minor changes in the comments
>> and the commit message.
> It looks good, and the greater the concurrency the greater the benefit will be. Just a few minor suggestions regarding comments.
>
> "ExecDeleteAct() have already locked the old tuple for us", change "have" to "has".
>
> The comments in heapam_tuple_delete() and heapam_tuple_update() might be a little clearer with something like:
>
> "If the tuple has been concurrently updated, get lock already so that on
> retry it will succeed, provided that the caller asked to do this by
> providing a lockedSlot."

Thank you.  These changes are incorporated into v6 of the patch.

> Also, not too important, but perhaps better clarify in the commit message that the repeated work is driven by ExecUpdate and ExecDelete and can happen multiple times depending on the concurrency.

Hmm... It can't happen an arbitrary number of times.  If the tuple was
concurrently updated, then we lock it.  Once we lock it, nobody can
change it until we finish our work.  So, I think no changes are needed.

I'm going to push this if no objections.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Aleksander Alekseev
Date:
Hi Alexander,

> I'm going to push this if no objections.

I took a fresh look at the patch and it LGTM. I only did a few
cosmetic changes, PFA v7.

Changes since v6 are:

```
@@ -318,12 +318,12 @@ heapam_tuple_delete(Relation relation,
ItemPointer tid, CommandId cid,
     result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd,
changingPart);

     /*
-     * If the tuple has been concurrently updated, get lock already so that on
-     * retry it will succeed, provided that the caller asked to do this by
-     * providing a lockedSlot.
+     * If lockUpdated is true and the tuple has been concurrently updated, get
+     * the lock immediately so that on retry we will succeed.
      */
     if (result == TM_Updated && lockUpdated)
     {
+        Assert(lockedSlot != NULL);
```

... and the same for heapam_tuple_update().

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi Aleksander,

On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> > I'm going to push this if no objections.
>
> I took a fresh look at the patch and it LGTM. I only did a few
> cosmetic changes, PFA v7.
>
> Changes since v6 are:

Thank you for looking into this.  It appears that I applied the changes
proposed by Mason to v5, not v6.  That led to the comment mismatch with
the code that you noticed.  v8 should be correct.  Please recheck.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Mon, Jan 9, 2023 at 1:10 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> > > I'm going to push this if no objections.
> >
> > I took a fresh look at the patch and it LGTM. I only did a few
> > cosmetic changes, PFA v7.
> >
> > Changes since v6 are:
>
> Thank you for looking into this.  It appears that I've applied changes
> proposed by Mason to v5, not v6.  That lead to comment mismatch with
> the code that you've noticed.  v8 should be correct.  Please, recheck.

v9 also incorporates lost changes to the commit message by Pavel Borisov.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Alexander!

On Mon, 9 Jan 2023 at 13:29, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Mon, Jan 9, 2023 at 1:10 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
> > <aleksander@timescale.com> wrote:
> > > > I'm going to push this if no objections.
> > >
> > > I took a fresh look at the patch and it LGTM. I only did a few
> > > cosmetic changes, PFA v7.
> > >
> > > Changes since v6 are:
> >
> > Thank you for looking into this.  It appears that I've applied changes
> > proposed by Mason to v5, not v6.  That lead to comment mismatch with
> > the code that you've noticed.  v8 should be correct.  Please, recheck.
>
> v9 also incorporates lost changes to the commit message by Pavel Borisov.
I've looked through patch v9. It is essentially patch v5 plus the
comment clarifications by Mason plus the right discussion link in the
commit message from v8. Aleksander's proposal of an Assert in v7 was
due to the changes lost between v5 and v6; combining the connected
variables in v5 makes checks that they agree with each other
unnecessary. So the changes from v7 are not in v9.

Sorry for dwelling on small details. In my opinion the patch is now
ready to be committed.

Regards,
Pavel Borisov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Aleksander Alekseev
Date:
Alexander, Pavel,

> Sorry for being so detailed in small details. In my opinion the patch
> now is ready to be committed.

Agree.

Personally I liked the version with the (lockUpdated, lockedSlot) pair
a bit more since it is a bit more readable; however, the version
without lockUpdated is less error-prone and slightly more efficient.
So all in all I have no strong opinion on which is better.

-- 
Best regards,
Aleksander Alekseev



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Pavel!

On Mon, Jan 9, 2023 at 1:41 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Mon, 9 Jan 2023 at 13:29, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Mon, Jan 9, 2023 at 1:10 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
> > > <aleksander@timescale.com> wrote:
> > > > > I'm going to push this if no objections.
> > > >
> > > > I took a fresh look at the patch and it LGTM. I only did a few
> > > > cosmetic changes, PFA v7.
> > > >
> > > > Changes since v6 are:
> > >
> > > Thank you for looking into this.  It appears that I've applied changes
> > > proposed by Mason to v5, not v6.  That lead to comment mismatch with
> > > the code that you've noticed.  v8 should be correct.  Please, recheck.
> >
> > v9 also incorporates lost changes to the commit message by Pavel Borisov.
> I've looked through patch v9. It resembles patch v5 plus comments
> clarification by Mason plus the right discussion link in the commit
> message from v8. Aleksander's proposal of Assert in v7 was due to
> changes lost between v5 and v6, as combining connected variables in v5
> makes checks for them being in agreement one with the other
> unnecessary. So changes from v7 are not in v9.
>
> Sorry for being so detailed in small details. In my opinion the patch
> now is ready to be committed.

Sorry for creating this mess with lost changes.  And thank you for
confirming it's good now.  I'm going to push v9.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-01-09 13:46:50 +0300, Alexander Korotkov wrote:
> I'm going to push v9.

Could you hold off for a bit? I'd like to look at this, I'm not sure I like
the direction.

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

I'm a bit worried that this is optimizing the rare case while hurting the
common case. See e.g. my point below about creating additional slots in the
happy path.

It's also not clear that the change is right directionally. If we want
to avoid re-fetching the "original" row version, why don't we provide
that functionality via table_tuple_lock()?


On 2023-01-09 13:29:18 +0300, Alexander Korotkov wrote:
> @@ -53,6 +53,12 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
>                                     HeapTuple tuple,
>                                     OffsetNumber tupoffset);
>
> +static TM_Result heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> +                                            Snapshot snapshot, TupleTableSlot *slot,
> +                                            CommandId cid, LockTupleMode mode,
> +                                            LockWaitPolicy wait_policy, uint8 flags,
> +                                            TM_FailureData *tmfd, bool updated);
> +
>  static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
>
>  static const TableAmRoutine heapam_methods;
> @@ -299,14 +305,39 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
>  static TM_Result
>  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
>                      Snapshot snapshot, Snapshot crosscheck, bool wait,
> -                    TM_FailureData *tmfd, bool changingPart)
> +                    TM_FailureData *tmfd, bool changingPart,
> +                    TupleTableSlot *lockedSlot)
>  {
> +    TM_Result    result;
> +
>      /*
>       * Currently Deleting of index tuples are handled at vacuum, in case if
>       * the storage itself is cleaning the dead tuples by itself, it is the
>       * time to call the index tuple deletion also.
>       */
> -    return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +    result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +
> +    /*
> +     * If the tuple has been concurrently updated, get lock already so that on
> +     * retry it will succeed, provided that the caller asked to do this by
> +     * providing a lockedSlot.
> +     */
> +    if (result == TM_Updated && lockedSlot != NULL)
> +    {
> +        result = heapam_tuple_lock_internal(relation, tid, snapshot,
> +                                            lockedSlot, cid, LockTupleExclusive,
> +                                            LockWaitBlock,
> +                                            TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> +                                            tmfd, true);

You're ignoring the 'wait' parameter here, no? I think the modification to
heapam_tuple_update() has the same issue.


> +        if (result == TM_Ok)
> +        {
> +            tmfd->traversed = true;
> +            return TM_Updated;
> +        }
> +    }
> +
> +    return result;

Doesn't this mean that the caller can't easily distinguish between
heapam_tuple_delete() and heapam_tuple_lock_internal() returning a failure
state?


> @@ -350,213 +402,8 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
>                    LockWaitPolicy wait_policy, uint8 flags,
>                    TM_FailureData *tmfd)
>  {

Moving the entire body of the function around makes it harder to review
this change, because the code movement is intermingled with "actual" changes.


> +/*
> + * This routine does the work for heapam_tuple_lock(), but also support
> + * `updated` to re-use the work done by heapam_tuple_update() or
> + * heapam_tuple_delete() on fetching tuple and checking its visibility.
> + */
> +static TM_Result
> +heapam_tuple_lock_internal(Relation relation, ItemPointer tid, Snapshot snapshot,
> +                           TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
> +                           LockWaitPolicy wait_policy, uint8 flags,
> +                           TM_FailureData *tmfd, bool updated)
> +{
> +    BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> +    TM_Result    result;
> +    Buffer        buffer = InvalidBuffer;
> +    HeapTuple    tuple = &bslot->base.tupdata;
> +    bool        follow_updates;
> +
> +    follow_updates = (flags & TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS) != 0;
> +    tmfd->traversed = false;
> +
> +    Assert(TTS_IS_BUFFERTUPLE(slot));
> +
> +tuple_lock_retry:
> +    tuple->t_self = *tid;
> +    if (!updated)
> +        result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> +                                 follow_updates, &buffer, tmfd);
> +    else
> +        result = TM_Updated;

To make sure I understand: You're basically trying to have
heapam_tuple_lock_internal() work as before, except that you want to omit
fetching the first row version, assuming that the caller already tried to lock
it?

I think at the very least this needs an assert verifying that the slot
actually contains a tuple in the "updated" path.
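
For instance (an illustrative check only, not taken from the patch):

```
if (updated)
    Assert(!TTS_EMPTY(slot));   /* caller must already have stored the tuple here */
```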


> +    if (result == TM_Updated &&
> +        (flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
> +    {
> +        if (!updated)
> +        {
> +            /* Should not encounter speculative tuple on recheck */
> +            Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));

Why shouldn't this be checked in the updated case as well?


> @@ -1490,7 +1492,16 @@ ExecDelete(ModifyTableContext *context,
>           * transaction-snapshot mode transactions.
>           */
>  ldelete:
> -        result = ExecDeleteAct(context, resultRelInfo, tupleid, changingPart);
> +
> +        /*
> +         * Ask ExecDeleteAct() to immediately place the lock on the updated
> +         * tuple if we will need EvalPlanQual() in that case to handle it.
> +         */
> +        if (!IsolationUsesXactSnapshot())
> +            slot = ExecGetReturningSlot(estate, resultRelInfo);
> +
> +        result = ExecDeleteAct(context, resultRelInfo, tupleid, changingPart,
> +                               slot);

I don't like that 'slot' is now used for multiple things. I think this could
best be addressed by simply moving the slot variable inside the blocks using
it. And here it should be named more accurately.

Is there a potential conflict with other uses of the ExecGetReturningSlot()?


Given that we now always create the slot, doesn't this increase the overhead
for the very common case of not needing EPQ? We'll create unnecessary slots
all the time, no?


>                       */
>                      EvalPlanQualBegin(context->epqstate);
>                      inputslot = EvalPlanQualSlot(context->epqstate, resultRelationDesc,
>                                                   resultRelInfo->ri_RangeTableIndex);
> +                    ExecCopySlot(inputslot, slot);

> -                    result = table_tuple_lock(resultRelationDesc, tupleid,
> -                                              estate->es_snapshot,
> -                                              inputslot, estate->es_output_cid,
> -                                              LockTupleExclusive, LockWaitBlock,
> -                                              TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> -                                              &context->tmfd);
> +                    Assert(context->tmfd.traversed);
> +                    epqslot = EvalPlanQual(context->epqstate,
> +                                           resultRelationDesc,
> +                                           resultRelInfo->ri_RangeTableIndex,
> +                                           inputslot);

The only point of using EvalPlanQualSlot() is to avoid copying the tuple from
one slot to another. Given that we're not benefiting from that anymore (due to
your manual ExecCopySlot() call), it seems we could just pass 'slot' to
EvalPlanQual() and not bother with EvalPlanQualSlot().



> @@ -1449,6 +1451,8 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
>   *    tmfd - filled in failure cases (see below)
>   *    changingPart - true iff the tuple is being moved to another partition
>   *        table due to an update of the partition key. Otherwise, false.
> + *    lockedSlot - slot to save the locked tuple if should lock the last row
> + *        version during the concurrent update. NULL if not needed.

The grammar in the new comments is off ("if should lock").

I think this is also needs to mention that this *significantly* changes the
behaviour of table_tuple_delete(). That's not at all clear from the comment.


Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Gregory Stark
Date:
It looks like this patch received some feedback from Andres and hasn't
had any further work posted. I'm going to move it to "Waiting on
Author".

It doesn't sound like this is likely to get committed this release
cycle unless responding to Andres's points is simpler than I expect.



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Wed, Mar 1, 2023 at 12:03 AM Gregory Stark <stark@postgresql.org> wrote:
> It looks like this patch received some feedback from Andres and hasn't
> had any further work posted. I'm going to move it to "Waiting on
> Author".

I'll post the updated version in the next couple of days.

> It doesn't sound like this is likely to get committed this release
> cycle unless responding to Andres's points simpler than I expect.

I wouldn't think ahead that much.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Andres.

Thank you for your review.  Sorry for the late reply.  It took some
time for me to figure out how to revise the patch.

The revised patchset is attached.  I decided to split the patch into two:
1) Avoid re-fetching the "original" row version during update and delete.
2) Save effort by re-using the existing context of
tuple_update()/tuple_delete() for locking the tuple.
These are two separate optimizations, so let's evaluate their
performance separately.

On Tue, Jan 10, 2023 at 4:07 AM Andres Freund <andres@anarazel.de> wrote:
> I'm a bit worried that this is optimizing the rare case while hurting the
> common case. See e.g. my point below about creating additional slots in the
> happy path.

This makes sense.  It's worth allocating the slot only if we're going
to store a tuple there.  I implemented this by passing a callback for
slot allocation instead of the slot itself.
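
Schematically, the idea is something like this (the names below are
illustrative, not necessarily the ones used in the attached patchset):

```
/* the caller passes a callback instead of a pre-allocated slot */
typedef TupleTableSlot *(*LazySlotCallback) (void *arg);

    result = heap_delete(relation, tid, cid, crosscheck, wait,
                         tmfd, changingPart);

    if (result == TM_Updated && slotCallback != NULL)
    {
        /* the slot is materialized only on this (rare) conflict path */
        TupleTableSlot *lockedSlot = slotCallback(slotCallbackArg);

        /* ... lock the latest tuple version into lockedSlot, as before ... */
    }
```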

> It's also not clear that change is right directionally. If we want to avoid
> re-fetching the "original" row version, why don't we provide that
> functionality via table_tuple_lock()?

These are two distinct optimizations.  Now, they come as two distinct patches.

> On 2023-01-09 13:29:18 +0300, Alexander Korotkov wrote:
> > @@ -53,6 +53,12 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
> >                                                                  HeapTuple tuple,
> >                                                                  OffsetNumber tupoffset);
> >
> > +static TM_Result heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> > +                                            Snapshot snapshot, TupleTableSlot *slot,
> > +                                            CommandId cid, LockTupleMode mode,
> > +                                            LockWaitPolicy wait_policy, uint8 flags,
> > +                                            TM_FailureData *tmfd, bool updated);
> > +
> >  static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
> >
> >  static const TableAmRoutine heapam_methods;
> > @@ -299,14 +305,39 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
> >  static TM_Result
> >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> >                                       Snapshot snapshot, Snapshot crosscheck, bool wait,
> > -                                     TM_FailureData *tmfd, bool changingPart)
> > +                                     TM_FailureData *tmfd, bool changingPart,
> > +                                     TupleTableSlot *lockedSlot)
> >  {
> > +     TM_Result       result;
> > +
> >       /*
> >        * Currently Deleting of index tuples are handled at vacuum, in case if
> >        * the storage itself is cleaning the dead tuples by itself, it is the
> >        * time to call the index tuple deletion also.
> >        */
> > -     return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +     result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +
> > +     /*
> > +      * If the tuple has been concurrently updated, get lock already so that on
> > +      * retry it will succeed, provided that the caller asked to do this by
> > +      * providing a lockedSlot.
> > +      */
> > +     if (result == TM_Updated && lockedSlot != NULL)
> > +     {
> > +             result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > +                                         lockedSlot, cid, LockTupleExclusive,
> > +                                         LockWaitBlock,
> > +                                         TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > +                                         tmfd, true);
>
> You're ignoring the 'wait' parameter here, no? I think the modification to
> heapam_tuple_update() has the same issue.

Yep.  I didn't catch this, because currently we also call
tuple_update()/tuple_delete() with wait == true.  Fixed.

> > +             if (result == TM_Ok)
> > +             {
> > +                     tmfd->traversed = true;
> > +                     return TM_Updated;
> > +             }
> > +     }
> > +
> > +     return result;
>
> Doesn't this mean that the caller can't easily distinguish between
> heapam_tuple_delete() and heapam_tuple_lock_internal() returning a failure
> state?

Exactly.  But currently nodeModifyTable.c handles these failure states
in a similar way.  And I don't see why it should be different in the
future.

> > @@ -350,213 +402,8 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
> >                                 LockWaitPolicy wait_policy, uint8 flags,
> >                                 TM_FailureData *tmfd)
> >  {
>
> Moving the entire body of the function around, makes it harder to review
> this change, because the code movement is intermingled with "actual" changes.

OK, fixed.

> > +/*
> > + * This routine does the work for heapam_tuple_lock(), but also support
> > + * `updated` to re-use the work done by heapam_tuple_update() or
> > + * heapam_tuple_delete() on fetching tuple and checking its visibility.
> > + */
> > +static TM_Result
> > +heapam_tuple_lock_internal(Relation relation, ItemPointer tid, Snapshot snapshot,
> > +                                                TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
> > +                                                LockWaitPolicy wait_policy, uint8 flags,
> > +                                                TM_FailureData *tmfd, bool updated)
> > +{
> > +     BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> > +     TM_Result       result;
> > +     Buffer          buffer = InvalidBuffer;
> > +     HeapTuple       tuple = &bslot->base.tupdata;
> > +     bool            follow_updates;
> > +
> > +     follow_updates = (flags & TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS) != 0;
> > +     tmfd->traversed = false;
> > +
> > +     Assert(TTS_IS_BUFFERTUPLE(slot));
> > +
> > +tuple_lock_retry:
> > +     tuple->t_self = *tid;
> > +     if (!updated)
> > +             result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> > +                                                              follow_updates, &buffer, tmfd);
> > +     else
> > +             result = TM_Updated;
>
> To make sure I understand: You're basically trying to have
> heapam_tuple_lock_internal() work as before, except that you want to omit
> fetching the first row version, assuming that the caller already tried to lock
> it?
>
> I think at the very this needs an assert verifying that the slot actually
> contains a tuple in the "updated" path.

This part was re-written.

> > +     if (result == TM_Updated &&
> > +             (flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
> > +     {
> > +             if (!updated)
> > +             {
> > +                     /* Should not encounter speculative tuple on recheck */
> > +                     Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
>
> Why shouldn't this be checked in the updated case as well?
>
>
> > @@ -1490,7 +1492,16 @@ ExecDelete(ModifyTableContext *context,
> >                * transaction-snapshot mode transactions.
> >                */
> >  ldelete:
> > -             result = ExecDeleteAct(context, resultRelInfo, tupleid, changingPart);
> > +
> > +             /*
> > +              * Ask ExecDeleteAct() to immediately place the lock on the updated
> > +              * tuple if we will need EvalPlanQual() in that case to handle it.
> > +              */
> > +             if (!IsolationUsesXactSnapshot())
> > +                     slot = ExecGetReturningSlot(estate, resultRelInfo);
> > +
> > +             result = ExecDeleteAct(context, resultRelInfo, tupleid, changingPart,
> > +                                                        slot);
>
> I don't like that 'slot' is now used for multiple things. I think this could
> best be addressed by simply moving the slot variable inside the blocks using
> it. And here it should be named more accurately.

I didn't do that refactoring.  But now the edits introduced by the 1st
patch of the set are more granular and don't affect the usage of the
'slot' variable.

> Is there a potential conflict with other uses of the ExecGetReturningSlot()?

Yep.  The current revision avoids this arbitrary usage of slots.

> Given that we now always create the slot, doesn't this increase the overhead
> for the very common case of not needing EPQ? We'll create unnecessary slots
> all the time, no?

Yes, this is addressed by allocating the EPQ slot only once it is
needed, via a callback.  I'm thinking about wrapping this into some
abstraction called 'LazySlot'.

> >                                        */
> >                                       EvalPlanQualBegin(context->epqstate);
> >                                       inputslot = EvalPlanQualSlot(context->epqstate, resultRelationDesc,
> >                                                                    resultRelInfo->ri_RangeTableIndex);
> > +                                     ExecCopySlot(inputslot, slot);
>
> > -                                     result = table_tuple_lock(resultRelationDesc, tupleid,
> > -                                                               estate->es_snapshot,
> > -                                                               inputslot, estate->es_output_cid,
> > -                                                               LockTupleExclusive, LockWaitBlock,
> > -                                                               TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > -                                                               &context->tmfd);
> > +                                     Assert(context->tmfd.traversed);
> > +                                     epqslot = EvalPlanQual(context->epqstate,
> > +                                                            resultRelationDesc,
> > +                                                            resultRelInfo->ri_RangeTableIndex,
> > +                                                            inputslot);
>
> The only point of using EvalPlanQualSlot() is to avoid copying the tuple from
> one slot to another. Given that we're not benefiting from that anymore (due to
> your manual ExecCopySlot() call), it seems we could just pass 'slot' to
> EvalPlanQual() and not bother with EvalPlanQualSlot().

This makes sense.  Now the usage pattern of the slots is clearer.

> > @@ -1449,6 +1451,8 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> >   *   tmfd - filled in failure cases (see below)
> >   *   changingPart - true iff the tuple is being moved to another partition
> >   *           table due to an update of the partition key. Otherwise, false.
> > + *   lockedSlot - slot to save the locked tuple if should lock the last row
> > + *           version during the concurrent update. NULL if not needed.
>
> The grammar in the new comments is off ("if should lock").
>
> I think this is also needs to mention that this *significantly* changes the
> behaviour of table_tuple_delete(). That's not at all clear from the comment.

Let's see the performance results for the patchset.  I'll properly
revise the comments if the results are good.

Pavel, could you please re-run your tests over the revised patchset?

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Alexander!

> Let's see the performance results for the patchset.  I'll properly
> revise the comments if results will be good.
>
> Pavel, could you please re-run your tests over revised patchset?

Since last time I've improved the test to avoid the significant
differences between series due to the AWS storage access variation
seen in [1]. I.e. each series of tests is run on a tmpfs with freshly
initialized pgbench tables and a vacuum. Also, I've added a test for
low-concurrency updates, where the locking optimization isn't expected
to improve performance, just to make sure the patches don't make
things worse.
The tests are as follows:
1. Heap updates with high tuple concurrency:
Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 --unlogged-tables)
Update tellers 100 rows, 50 conns ( pgbench postgres -f
./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )

Result: The average of 5 series with both patches (0001+0002) is around
5% faster than both master and patch 0001 alone. Still, there are some
fluctuations between different series of measurements of the same
patch, but much less than in [1].

2. Heap updates with low tuple concurrency:
Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300 --unlogged-tables)
Update 3*10^7 rows, 50 conns (pgbench postgres -f
./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)

Result: Both patches and master are the same within a tolerance of
less than 0.7%.

Tests are run on the same 36-vcore AWS c5.9xlarge as [1]. The results
pictures are attached.

Pkeys are used in the low-concurrency case so that the tuple to be
updated is found via an index search. No pkeys are used in the
high-concurrency case so that concurrent index updates don't contribute
to the update performance.

Common settings:
shared_buffers = 20GB
max_worker_processes = 1024
max_parallel_workers = 1024
max_connections=10000
autovacuum_multixact_freeze_max_age=2000000000
autovacuum_freeze_max_age=2000000000
max_wal_senders=0
wal_level=minimal
max_wal_size = 10G
autovacuum = off
fsync = off
full_page_writes = off

Kind regards,
Pavel Borisov,
Supabase.

[1] https://www.postgresql.org/message-id/CALT9ZEGhxwh2_WOpOjdazW7CNkBzen17h7xMdLbBjfZb5aULgg%40mail.gmail.com

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Pavel!

On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > Let's see the performance results for the patchset.  I'll properly
> > revise the comments if results will be good.
> >
> > Pavel, could you please re-run your tests over revised patchset?
>
> Since last time I've improved the test to avoid significant series
> differences due to AWS storage access variation that is seen in [1].
> I.e. each series of tests is run on a tmpfs with newly inited pgbench
> tables and vacuum. Also, I've added a test for low-concurrency updates
> where the locking optimization isn't expected to improve performance,
> just to make sure the patches don't make things worse.
>
> The tests are as follows:
> 1. Heap updates with high tuple concurrency:
> Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 --unlogged-tables)
> Update tellers 100 rows, 50 conns ( pgbench postgres -f
> ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
>
> Result: Average of 5 series with patches (0001+0002) is around 5%
> faster than both master and patch 0001. Still, there are some
> fluctuations between different series of the measurements of the same
> patch, but much less than in [1]

Thank you for running this so fast!

So, it appears that the 0001 patch has no effect.  We should probably
consider dropping 0001 and keeping just the 0002 patch.

The attached patch v12 contains the v11 0002 patch extracted separately.
Please add it to the performance comparison.  Thanks.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Alexander!

On Thu, 2 Mar 2023 at 18:53, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Pavel!
>
> On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > Let's see the performance results for the patchset.  I'll properly
> > > revise the comments if results will be good.
> > >
> > > Pavel, could you please re-run your tests over revised patchset?
> >
> > Since last time I've improved the test to avoid significant series
> > differences due to AWS storage access variation that is seen in [1].
> > I.e. each series of tests is run on a tmpfs with newly inited pgbench
> > tables and vacuum. Also, I've added a test for low-concurrency updates
> > where the locking optimization isn't expected to improve performance,
> > just to make sure the patches don't make things worse.
> >
> > The tests are as follows:
> > 1. Heap updates with high tuple concurrency:
> > Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 --unlogged-tables)
> > Update tellers 100 rows, 50 conns ( pgbench postgres -f
> > ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
> >
> > Result: Average of 5 series with patches (0001+0002) is around 5%
> > faster than both master and patch 0001. Still, there are some
> > fluctuations between different series of the measurements of the same
> > patch, but much less than in [1]
>
> Thank you for running this that fast!
>
> So, it appears that 0001 patch has no effect.  So, we probably should
> consider to drop 0001 patch and consider just 0002 patch.
>
> The attached patch v12 contains v11 0002 patch extracted separately.
> Please, add it to the performance comparison.  Thanks.

I've benchmarked the full series of four variants: master vs
v11-0001 vs v11-0001+0002 vs v12, in the same configuration as in the
previous measurement. The results are as follows:

1. Heap updates with high tuple concurrency:
The average of 5 series for v11-0001+0002 is around 7% faster than master.
I need to note that while v11-0001+0002 shows a consistent performance
improvement over master, its value cannot be determined more precisely
than a couple of percent even with averaging. So I don't think we can
conclude from these results whether a more subtle difference between
v11-0001+0002 and v12 (or between master and v11-0001) really exists.

2. Heap updates with low tuple concurrency:
All patches and master are still the same within a tolerance of
less than 0.7%.

Overall, patchset v11-0001+0002 doesn't show any performance degradation,
so I don't see a reason to apply only patch 0002 and skip 0001.

Regards,
Pavel Borisov,
Supabase.

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Thu, Mar 2, 2023 at 9:17 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Thu, 2 Mar 2023 at 18:53, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > > Let's see the performance results for the patchset.  I'll properly
> > > > revise the comments if results will be good.
> > > >
> > > > Pavel, could you please re-run your tests over revised patchset?
> > >
> > > Since last time I've improved the test to avoid significant series
> > > differences due to AWS storage access variation that is seen in [1].
> > > I.e. each series of tests is run on a tmpfs with newly inited pgbench
> > > tables and vacuum. Also, I've added a test for low-concurrency updates
> > > where the locking optimization isn't expected to improve performance,
> > > just to make sure the patches don't make things worse.
> > >
> > > The tests are as follows:
> > > 1. Heap updates with high tuple concurrency:
> > > Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 --unlogged-tables)
> > > Update tellers 100 rows, 50 conns ( pgbench postgres -f
> > > ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
> > >
> > > Result: Average of 5 series with patches (0001+0002) is around 5%
> > > faster than both master and patch 0001. Still, there are some
> > > fluctuations between different series of the measurements of the same
> > > patch, but much less than in [1]
> >
> > Thank you for running this that fast!
> >
> > So, it appears that 0001 patch has no effect.  So, we probably should
> > consider to drop 0001 patch and consider just 0002 patch.
> >
> > The attached patch v12 contains v11 0002 patch extracted separately.
> > Please, add it to the performance comparison.  Thanks.
>
> I've done a benchmarking on a full series of four variants: master vs
> v11-0001 vs v11-0001+0002 vs v12 in the same configuration as in the
> previous measurement. The results are as follows:
>
> 1. Heap updates with high tuple concurrency:
> Average of 5 series v11-0001+0002 is around 7% faster than the master.
> I need to note that while v11-0001+0002 shows consistent performance
> improvement over the master, its value can not be determined more
> precisely than a couple of percents even with averaging. So I'd
> suppose we may not conclude from the results if a more subtle
> difference between v11-0001+0002 vs v12 (and master vs v11-0001)
> really exists.
>
> 2. Heap updates with low tuple concurrency:
> All patches and master are still the same within a tolerance of
> less than 0.7%.
>
> Overall patch v11-0001+0002 doesn't show performance degradation so I
> don't see why to apply only patch 0002 skipping 0001.

Thank you, Pavel.  So, it seems that we get a substantial benefit only
with the two patches together.  I'll continue working on both of them.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Thu, Mar 2, 2023 at 11:16 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Thu, Mar 2, 2023 at 9:17 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > On Thu, 2 Mar 2023 at 18:53, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > > > Let's see the performance results for the patchset.  I'll properly
> > > > > revise the comments if results will be good.
> > > > >
> > > > > Pavel, could you please re-run your tests over revised patchset?
> > > >
> > > > Since last time I've improved the test to avoid significant series
> > > > differences due to AWS storage access variation that is seen in [1].
> > > > I.e. each series of tests is run on a tmpfs with newly inited pgbench
> > > > tables and vacuum. Also, I've added a test for low-concurrency updates
> > > > where the locking optimization isn't expected to improve performance,
> > > > just to make sure the patches don't make things worse.
> > > >
> > > > The tests are as follows:
> > > > 1. Heap updates with high tuple concurrency:
> > > > Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 --unlogged-tables)
> > > > Update tellers 100 rows, 50 conns ( pgbench postgres -f
> > > > ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
> > > >
> > > > Result: Average of 5 series with patches (0001+0002) is around 5%
> > > > faster than both master and patch 0001. Still, there are some
> > > > fluctuations between different series of the measurements of the same
> > > > patch, but much less than in [1]
> > >
> > > Thank you for running this that fast!
> > >
> > > So, it appears that 0001 patch has no effect.  So, we probably should
> > > consider to drop 0001 patch and consider just 0002 patch.
> > >
> > > The attached patch v12 contains v11 0002 patch extracted separately.
> > > Please, add it to the performance comparison.  Thanks.
> >
> > I've done a benchmarking on a full series of four variants: master vs
> > v11-0001 vs v11-0001+0002 vs v12 in the same configuration as in the
> > previous measurement. The results are as follows:
> >
> > 1. Heap updates with high tuple concurrency:
> > Average of 5 series v11-0001+0002 is around 7% faster than the master.
> > I need to note that while v11-0001+0002 shows consistent performance
> > improvement over the master, its value can not be determined more
> > precisely than a couple of percents even with averaging. So I'd
> > suppose we may not conclude from the results if a more subtle
> > difference between v11-0001+0002 vs v12 (and master vs v11-0001)
> > really exists.
> >
> > 2. Heap updates with low tuple concurrency:
> > All patches and master are still the same within a tolerance of
> > less than 0.7%.
> >
> > Overall patch v11-0001+0002 doesn't show performance degradation so I
> > don't see why to apply only patch 0002 skipping 0001.
>
> Thank you, Pavel.  So, it seems that we have substantial benefit only
> with two patches.  So, I'll continue working on both of them.

The revised patchset is attached.  The patch removing the extra
table_tuple_fetch_row_version() is back.  The second patch now
introduces the concept of a LazyTupleTableSlot, a slot which gets
allocated only when needed.  Also, there is more minor refactoring and
there are more comments.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-03-02 14:28:56 +0400, Pavel Borisov wrote:
> 2. Heap updates with low tuple concurrency:
> Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300 --unlogged-tables)
> Update 3*10^7 rows, 50 conns (pgbench postgres -f
> ./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)
> 
> Result: Both patches and master are the same within a tolerance of
> less than 0.7%.

What exactly does that mean? I would definitely not want to accept a 0.7%
regression of the uncontended case to benefit the contended case here...

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Tue, Mar 7, 2023 at 4:50 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-02 14:28:56 +0400, Pavel Borisov wrote:
> > 2. Heap updates with low tuple concurrency:
> > Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300 --unlogged-tables)
> > Update 3*10^7 rows, 50 conns (pgbench postgres -f
> > ./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)
> >
> > Result: Both patches and master are the same within a tolerance of
> > less than 0.7%.
>
> What exactly does that mean? I would definitely not want to accept a 0.7%
> regression of the uncontended case to benefit the contended case here...

I don't know what exactly Pavel meant, but the overall average numbers for
low concurrency are:
master: 420401 (stddev of the average 233)
patchset v11: 420111 (stddev of the average 199)
The difference is less than 0.1%, which is safely within the error.
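
For reference: (420401 - 420111) / 420401 is about 0.07%, while the
combined stddev of the two averages is sqrt(233^2 + 199^2), about 306 tps,
so the 290 tps gap is within one standard deviation.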

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Andres and Alexander!

On Tue, 7 Mar 2023, 10:10 Alexander Korotkov, <aekorotkov@gmail.com> wrote:
On Tue, Mar 7, 2023 at 4:50 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-02 14:28:56 +0400, Pavel Borisov wrote:
> > 2. Heap updates with low tuple concurrency:
> > Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300 --unlogged-tables)
> > Update 3*10^7 rows, 50 conns (pgbench postgres -f
> > ./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)
> >
> > Result: Both patches and master are the same within a tolerance of
> > less than 0.7%.
>
> What exactly does that mean? I would definitely not want to accept a 0.7%
> regression of the uncontended case to benefit the contended case here...

I don't know what exactly Pavel meant, but average overall numbers for
low concurrency are.
master: 420401 (stddev of average 233)
patchset v11: 420111 (stddev of average 199)
The difference is less than 0.1% and that is very safely within the error.

Yes, the only thing that I meant is that for the low-concurrency case the results between patch and master are within the difference between repeated series of measurements. So I concluded that the test cannot prove any difference between patch and master.

I didn't mean or write that there is any performance degradation.

Alexander, I suppose, did an extra step and calculated the overall average and stddev from the raw data provided. Thanks!

Regards,
Pavel.

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Tue, Mar 7, 2023 at 7:26 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 7 Mar 2023, 10:10 Alexander Korotkov, <aekorotkov@gmail.com> wrote:
>> I don't know what exactly Pavel meant, but average overall numbers for
>> low concurrency are.
>> master: 420401 (stddev of average 233)
>> patchset v11: 420111 (stddev of average 199)
>> The difference is less than 0.1% and that is very safely within the error.
>
>
> Yes, the only thing that I meant is that for the low-concurrency case the results between patch and master are within the
> difference between repeated series of measurements. So I concluded that the test cannot prove any difference between
> patch and master.
>
> I haven't meant or written there is some performance degradation.
>
> Alexander, I suppose did an extra step and calculated overall average and stddev, from raw data provided. Thanks!

Pavel, thank you for verifying this.

Could you please rerun the performance benchmarks for v13?  It
introduces LazyTupleTableSlot, which shouldn't have any measurable
impact on performance.  But still.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-03-07 04:45:32 +0300, Alexander Korotkov wrote:
> The second patch now implements a concept of LazyTupleTableSlot, a slot
> which gets allocated only when needed.  Also, there is more minor
> refactoring and more comments.

This patch already is pretty big for what it actually improves. Introducing
even more infrastructure to get a not-that-big win, in a not particularly
interesting, extreme workload...

What is motivating this?

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Wed, Mar 8, 2023 at 4:22 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-07 04:45:32 +0300, Alexander Korotkov wrote:
> > The second patch now implements a concept of LazyTupleTableSlot, a slot
> > which gets allocated only when needed.  Also, there is more minor
> > refactoring and more comments.
>
> This patch already is pretty big for what it actually improves. Introducing
> even infrastructure to get a not that big win, in a not particularly
> interesting, extreme, workload...

It's true that the win isn't dramatic.  But I can't agree that the workload
isn't interesting.  In my experience, high contention over a limited set
of rows is something that frequently happens in production.  I have
personally taken part in multiple investigations of such workloads.

> What is motivating this?

Right, the improvement this patch gives to heap is not the full
motivation.  Another motivation is the improvement it gives to the TableAM
API.  Our current API implies that the effort of locating the tuple by
tid is small.  This is more or less true for heap, where we just need
to pin and lock the buffer.  But imagine other TableAM
implementations, where locating a tuple is more expensive.  The current
API insists that we do that work twice: once in the update attempt and
once in the lock.  Doing it in a single call could give such TableAMs
significant savings (and even for heap it's something).  I'm working on
such a TableAM: it's OrioleDB, which implements index-organized tables.
And I know there are other examples (for instance, zedstore) where the
TID lookup includes some indirection.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Chris Travers
Date:
"Right, the improvement this patch gives to the heap is not the full motivation. Another motivation is the improvement
itgives to TableAM API. Our current API implies that the effort on locating the tuple by tid is small. This is more or
lesstrue for the heap, where we just need to pin and lock the buffer. But imagine other TableAM implementations, where
locatinga tuple is more expensive."
 

Yeah. Our TableAM API is a very nice start to getting pluggable storage, but we still have a long way to go to have an
ability to really provide a wide variety of pluggable storage engines.
 

In particular, the following approaches are likely to have much more expensive tid lookups:
 - columnar storage (may require a lot of random IO to reconstruct a tuple)
 - index oriented storage (tid no longer physically locatable in the file via seek)
 - compressed cold storage like pg_cryogen (again, seek may be problematic).

To my mind, the performance benefits are a nice side benefit, but the main interest I have in this is the
improvements in the TableAM capabilities.  I cannot see how to do this without a lot more infrastructure.

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Fri, Mar 10, 2023 at 8:17 PM Chris Travers <chris.travers@gmail.com> wrote:
> "Right, the improvement this patch gives to the heap is not the full motivation. Another motivation is the
improvementit gives to TableAM API. Our current API implies that the effort on locating the tuple by tid is small. This
ismore or less true for the heap, where we just need to pin and lock the buffer. But imagine other TableAM
implementations,where locating a tuple is more expensive." 
>
> Yeah. Our TableAM API is a very nice start to getting pluggable storage, but we still have a long way to go to have
> an ability to really provide a wide variety of pluggable storage engines.
>
> In particular, the following approaches are likely to have much more expensive tid lookups:
>  - columnar storage (may require a lot of random IO to reconstruct a tuple)
>  - index oriented storage (tid no longer physically locatable in the file via seek)
>  - compressed cold storage like pg_cryogen (again, seek may be problematic).
>
> To my mind, the performance benefits are a nice side benefit, but the main interest I have in this is the
> improvements in the TableAM capabilities.  I cannot see how to do this without a lot more infrastructure.

Chris, thank you for your feedback.

The revised patch set is attached.  Some comments are improved.  Also,
we implicitly skip the new facility for the MERGE case.  As I understand,
Dean Rasheed is going to revise the locking for MERGE soon [1].

Pavel, could you please re-run your test case on the revised patch?

1. https://www.postgresql.org/message-id/CAEZATCU9e9Ccbi70yNbCcF7xvZ+zrjiD0_6eEq2zEZA1p+707A@mail.gmail.com

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi!

On Sun, Mar 12, 2023 at 7:05 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> The revised patch set is attached.  Some comments are improved.  Also,
> we implicitly skip the new facility for the MERGE case.  As I get Dean
> Rasheed is going to revise the locking for MERGE soon [1].
>
> Pavel, could you please re-run your test case on the revised patch?

I found the experiments made by Pavel [1] hard to reproduce due to the
high variation of the TPS.  Instead, I constructed a different
benchmark, which includes multiple updates (40 rows) in one query, and
ran it on c5d.18xlarge.  That produces stable performance results as
well as a measurable performance benefit from the patch.
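
The scripts are in the attached archive; roughly, each transaction updates
40 random rows of the small table in a single statement, along these lines
(a sketch under that assumption, not the attached script):

-- high-concurrency case (sketch): touch 40 random pgbench_tellers rows,
-- possibly with duplicates, in one UPDATE
\set nrows 40
UPDATE pgbench_tellers
   SET tbalance = tbalance + 1
 WHERE tid IN (SELECT 1 + floor(random() * 10 * :scale)::int
                 FROM generate_series(1, :nrows));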

I found that patchsets v11 and v14 don't show any performance
improvement over v10.  v10 is also much less invasive in the
heap-related code.  This is why I made v15 using the v10 approach,
porting LazyTupleTableSlot and the improved comments to it.  I think this
should address some of Andres's complaints regarding introducing too
much infrastructure [2].

The average results for the high-concurrency case (errors are given for a
95% confidence level) are below.  We can see that v15 gives a
measurable performance improvement.

master = 40084 +- 447 tps
patchset v10 = 41761 +- 1117 tps
patchset v11 = 41473 +- 773 tps
patchset v14 = 40966 +- 1008 tps
patchset v15 = 42855 +- 977 tps
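
(For reference: v15 vs. master is (42855 - 40084) / 40084, about 6.9%, and
the 95% confidence intervals of the two averages, 40084 +- 447 and
42855 +- 977, do not overlap.)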

The average results for the low-concurrency case (errors are given for a
95% confidence level) are below. They verify that the patch
introduces no overhead in the low-concurrency case.

master = 50626 +- 784 tps
patchset v15 = 51297 +- 876 tps

See attachments for raw experiment data and scripts.

So, as we can see, the patch gives a small performance improvement for
heap in the extreme high-concurrency case.  But it also improves the table
AM API for future use cases [3][4].

I'm going to push patchset v15 if no objections.

Links
1. https://www.postgresql.org/message-id/CALT9ZEHKdCF_jCoK2ErUuUtCuYPf82%2BZr1XE5URzneSFxz3zqA%40mail.gmail.com
2. https://www.postgresql.org/message-id/20230308012157.wo73y22ll2cojpvk%40awork3.anarazel.de
3. https://www.postgresql.org/message-id/CAPpHfdu1dqqcTz9V9iG-ZRewYAFL2VhizwfiN5SW%3DZ%2B1rj99-g%40mail.gmail.com
4. https://www.postgresql.org/message-id/167846860062.628976.2440696515718158538.pgcf%40coridan.postgresql.org

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> I'm going to push patchset v15 if no objections.

Just saw that this went in - didn't catch up with the thread before,
unfortunately. At the very least I'd like to see some more work on cleaning up
the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
hazards - I realize that there's some of those already, but I don't think we
should go further down that route. As far as I can tell there's no need for
any of this to be macros.
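
To illustrate the point with hypothetical names and definitions (this is a
generic sketch, not the code from the committed patch): a function-like
macro that mentions its argument more than once silently breaks for
arguments with side effects, while a static inline function evaluates the
argument exactly once at no extra cost.

#include <stddef.h>

typedef struct LazySlot
{
	void	   *slot;			/* allocated on first use */
	void	   *(*alloc) (void);	/* callback that allocates the slot */
} LazySlot;

/*
 * Macro version: "ls" is evaluated two or three times, so an argument
 * with side effects, e.g. LAZY_SLOT_GET(slots[i++]), misbehaves silently.
 */
#define LAZY_SLOT_GET(ls) \
	((ls)->slot != NULL ? (ls)->slot : ((ls)->slot = (ls)->alloc()))

/* Inline version: the argument is evaluated exactly once. */
static inline void *
lazy_slot_get(LazySlot *ls)
{
	if (ls->slot == NULL)
		ls->slot = ls->alloc();
	return ls->slot;
}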


> From a8b4e8a7b27815e013ea07b8cc9ac68541a9ac07 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Tue, 21 Mar 2023 00:34:15 +0300
> Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
>  ExecUpdate()/ExecDelete()
>
> When we lock tuple using table_tuple_lock() then we at the same time fetch
> the locked tuple to the slot.  In this case we can skip extra
> table_tuple_fetch_row_version() thank to we've already fetched the 'old' tuple
> and nobody can change it concurrently since it's locked.
>
> Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
> Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
> Reviewed-by: Andres Freund, Chris Travers
> ---
>  src/backend/executor/nodeModifyTable.c | 48 +++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
> index 3a673895082..93ebfdbb0d8 100644
> --- a/src/backend/executor/nodeModifyTable.c
> +++ b/src/backend/executor/nodeModifyTable.c
> @@ -1559,6 +1559,22 @@ ldelete:
>                      {
>                          case TM_Ok:
>                              Assert(context->tmfd.traversed);
> +
> +                            /*
> +                             * Save locked tuple for further processing of
> +                             * RETURNING clause.
> +                             */
> +                            if (processReturning &&
> +                                resultRelInfo->ri_projectReturning &&
> +                                !resultRelInfo->ri_FdwRoutine)
> +                            {
> +                                TupleTableSlot *returningSlot;
> +
> +                                returningSlot = ExecGetReturningSlot(estate, resultRelInfo);
> +                                ExecCopySlot(returningSlot, inputslot);
> +                                ExecMaterializeSlot(returningSlot);
> +                            }
> +
>                              epqslot = EvalPlanQual(context->epqstate,
>                                                     resultRelationDesc,
>                                                     resultRelInfo->ri_RangeTableIndex,

This seems a bit byzantine. We use inputslot = EvalPlanQualSlot(...) to make
EvalPlanQual() a bit cheaper, because that avoids a slot copy inside
EvalPlanQual().  But now we copy and materialize that slot anyway - and we do
so even if EPQ fails. And we afaics also do it when epqreturnslot is set, in
which case we'll afaics never use the copied slot.

Read the next paragraph below before replying to the above - I don't think
this is right for other reasons:

> @@ -1673,12 +1689,17 @@ ldelete:
>          }
>          else
>          {
> +            /*
> +             * Tuple can be already fetched to the returning slot in case
> +             * we've previously locked it.  Fetch the tuple only if the slot
> +             * is empty.
> +             */
>              slot = ExecGetReturningSlot(estate, resultRelInfo);
>              if (oldtuple != NULL)
>              {
>                  ExecForceStoreHeapTuple(oldtuple, slot, false);
>              }
> -            else
> +            else if (TupIsNull(slot))
>              {
>                  if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
>                                                     SnapshotAny, slot))


I don't think this is correct as-is - what if ExecDelete() is called with some
older tuple in the returning slot? If we don't enter the TM_Updated path, it
won't get updated, and we'll return the wrong tuple.  It certainly looks
possible to me - consider what happens if a first tuple enters the TM_Updated
path but then fails EvalPlanQual(). If a second tuple is deleted without
entering the TM_Updated path, the wrong tuple will be used for RETURNING.

<plays around with isolationtester>

Yes, indeed. The attached isolationtest breaks with 764da7710bf.


I think it's entirely sensible to avoid the tuple fetching in ExecDelete(),
but it needs a bit less localized work. Instead of using the presence of a
tuple in the returning slot, ExecDelete() should track whether it already has
fetched the deleted tuple.

Or alternatively, do the work to avoid refetching the tuple for the much more
common case of not needing EPQ at all.

I guess this really is part of my issue with this change - it optimizes the
rare case, while not addressing the same inefficiency in the common case.



> @@ -299,14 +305,46 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
>  static TM_Result
>  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
>                      Snapshot snapshot, Snapshot crosscheck, bool wait,
> -                    TM_FailureData *tmfd, bool changingPart)
> +                    TM_FailureData *tmfd, bool changingPart,
> +                    LazyTupleTableSlot *lockedSlot)
>  {
> +    TM_Result    result;
> +
>      /*
>       * Currently Deleting of index tuples are handled at vacuum, in case if
>       * the storage itself is cleaning the dead tuples by itself, it is the
>       * time to call the index tuple deletion also.
>       */
> -    return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +    result = heap_delete(relation, tid, cid, crosscheck, wait,
> +                         tmfd, changingPart);
> +
> +    /*
> +     * If the tuple has been concurrently updated, then get the lock on it.
> +     * (Do this if caller asked for tat by providing a 'lockedSlot'.) With the
> +     * lock held retry of delete should succeed even if there are more
> +     * concurrent update attempts.
> +     */
> +    if (result == TM_Updated && lockedSlot)
> +    {
> +        TupleTableSlot *evalSlot;
> +
> +        Assert(wait);
> +
> +        evalSlot = LAZY_TTS_EVAL(lockedSlot);
> +        result = heapam_tuple_lock_internal(relation, tid, snapshot,
> +                                            evalSlot, cid, LockTupleExclusive,
> +                                            LockWaitBlock,
> +                                            TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> +                                            tmfd, true);

Oh, huh? As I mentioned before, the unconditional use of LockWaitBlock means
the wait parameter is ignored.

I'm frankly getting annoyed here.


> +/*
> + * This routine does the work for heapam_tuple_lock(), but also support
> + * `updated` argument to re-use the work done by heapam_tuple_update() or
> + * heapam_tuple_delete() on figuring out that tuple was concurrently updated.
> + */
> +static TM_Result
> +heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> +                           Snapshot snapshot, TupleTableSlot *slot,
> +                           CommandId cid, LockTupleMode mode,
> +                           LockWaitPolicy wait_policy, uint8 flags,
> +                           TM_FailureData *tmfd, bool updated)

Why is the new parameter named 'updated'?


>  {
>      BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
>      TM_Result    result;
> -    Buffer        buffer;
> +    Buffer        buffer = InvalidBuffer;
>      HeapTuple    tuple = &bslot->base.tupdata;
>      bool        follow_updates;
>
> @@ -374,16 +455,26 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
>
>  tuple_lock_retry:
>      tuple->t_self = *tid;
> -    result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> -                             follow_updates, &buffer, tmfd);
> +    if (!updated)
> +        result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> +                                 follow_updates, &buffer, tmfd);
> +    else
> +        result = TM_Updated;
>
>      if (result == TM_Updated &&
>          (flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
>      {
> -        /* Should not encounter speculative tuple on recheck */
> -        Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
> +        if (!updated)
> +        {
> +            /* Should not encounter speculative tuple on recheck */
> +            Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));

Hm, why is it ok to encounter speculative tuples in the updated case? Oh, I
guess you got failures because slot doesn't point anywhere at this point.


> -        ReleaseBuffer(buffer);
> +            ReleaseBuffer(buffer);
> +        }
> +        else
> +        {
> +            updated = false;
> +        }
>
>          if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
>          {

Which means this is completely bogus now?

    HeapTuple    tuple = &bslot->base.tupdata;

In the first iteration this just points to the newly created slot. Which
doesn't have a tuple stored in it. So the above checks some uninitialized
memory.


Giving up at this point.


This doesn't seem ready to have been committed.

Greetings,

Andres Freund

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi!

On Thu, Mar 23, 2023 at 3:30 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> > I'm going to push patchset v15 if no objections.
>
> Just saw that this went in - didn't catch up with the thread before,
> unfortunately. At the very least I'd like to see some more work on cleaning up
> the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
> hazards - I realize that there's some of those already, but I don't think we
> should go further down that route. As far as I can tell there's no need for
> any of this to be macros.

Thank you for taking a look at this, even post-commit.  Regarding
macros, do you think inline functions would be good instead?

> > From a8b4e8a7b27815e013ea07b8cc9ac68541a9ac07 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Tue, 21 Mar 2023 00:34:15 +0300
> > Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
> >  ExecUpdate()/ExecDelete()
> >
> > When we lock tuple using table_tuple_lock() then we at the same time fetch
> > the locked tuple to the slot.  In this case we can skip extra
> > table_tuple_fetch_row_version() thank to we've already fetched the 'old' tuple
> > and nobody can change it concurrently since it's locked.
> >
> > Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
> > Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
> > Reviewed-by: Andres Freund, Chris Travers
> > ---
> >  src/backend/executor/nodeModifyTable.c | 48 +++++++++++++++++++-------
> >  1 file changed, 35 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
> > index 3a673895082..93ebfdbb0d8 100644
> > --- a/src/backend/executor/nodeModifyTable.c
> > +++ b/src/backend/executor/nodeModifyTable.c
> > @@ -1559,6 +1559,22 @@ ldelete:
> >                                       {
> >                                               case TM_Ok:
> >                                                       Assert(context->tmfd.traversed);
> > +
> > +                                                     /*
> > +                                                      * Save locked tuple for further processing of
> > +                                                      * RETURNING clause.
> > +                                                      */
> > +                                                     if (processReturning &&
> > +                                                             resultRelInfo->ri_projectReturning &&
> > +                                                             !resultRelInfo->ri_FdwRoutine)
> > +                                                     {
> > +                                                             TupleTableSlot *returningSlot;
> > +
> > +                                                             returningSlot = ExecGetReturningSlot(estate,
resultRelInfo);
> > +                                                             ExecCopySlot(returningSlot, inputslot);
> > +                                                             ExecMaterializeSlot(returningSlot);
> > +                                                     }
> > +
> >                                                       epqslot = EvalPlanQual(context->epqstate,
> >
resultRelationDesc,
> >
resultRelInfo->ri_RangeTableIndex,
>
> This seems a bit byzantine. We use inputslot = EvalPlanQualSlot(...) to make
> EvalPlanQual() a bit cheaper, because that avoids a slot copy inside
> EvalPlanQual().  But now we copy and materialize that slot anyway - and we do
> so even if EPQ fails. And we afaics also do it when epqreturnslot is set, in
> which case we'll afaics never use the copied slot.

Yes, I agree that there is a redundancy we could avoid.

> Read the next paragraph below before replying to the above - I don't think
> this is right for other reasons:
>
> > @@ -1673,12 +1689,17 @@ ldelete:
> >               }
> >               else
> >               {
> > +                     /*
> > +                      * Tuple can be already fetched to the returning slot in case
> > +                      * we've previously locked it.  Fetch the tuple only if the slot
> > +                      * is empty.
> > +                      */
> >                       slot = ExecGetReturningSlot(estate, resultRelInfo);
> >                       if (oldtuple != NULL)
> >                       {
> >                               ExecForceStoreHeapTuple(oldtuple, slot, false);
> >                       }
> > -                     else
> > +                     else if (TupIsNull(slot))
> >                       {
> >                               if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
> >                                                                                                  SnapshotAny,
slot))
>
>
> I don't think this is correct as-is - what if ExecDelete() is called with some
> older tuple in the returning slot? If we don't enter the TM_Updated path, it
> won't get updated, and we'll return the wrong tuple.  It certainly looks
> possible to me - consider what happens if a first tuple enter the TM_Updated
> path but then fails EvalPlanQual(). If a second tuple is deleted without
> entering the TM_Updated path, the wrong tuple will be used for RETURNING.
>
> <plays around with isolationtester>
>
> Yes, indeed. The attached isolationtest breaks with 764da7710bf.

Thank you for catching this!  This is definitely a bug.

> I think it's entirely sensible to avoid the tuple fetching in ExecDelete(),
> but it needs a bit less localized work. Instead of using the presence of a
> tuple in the returning slot, ExecDelete() should track whether it already has
> fetched the deleted tuple.
>
> Or alternatively, do the work to avoid refetching the tuple for the much more
> common case of not needing EPQ at all.
>
> I guess this really is part of my issue with this change - it optimizes the
> rare case, while not addressing the same inefficiency in the common case.

I'm going to fix this for ExecDelete().  Avoiding refetching the tuple
in the more common case is something I'm definitely very interested in.
But I would leave it for the future.

> > @@ -299,14 +305,46 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
> >  static TM_Result
> >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> >                                       Snapshot snapshot, Snapshot crosscheck, bool wait,
> > -                                     TM_FailureData *tmfd, bool changingPart)
> > +                                     TM_FailureData *tmfd, bool changingPart,
> > +                                     LazyTupleTableSlot *lockedSlot)
> >  {
> > +     TM_Result       result;
> > +
> >       /*
> >        * Currently Deleting of index tuples are handled at vacuum, in case if
> >        * the storage itself is cleaning the dead tuples by itself, it is the
> >        * time to call the index tuple deletion also.
> >        */
> > -     return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +     result = heap_delete(relation, tid, cid, crosscheck, wait,
> > +                                              tmfd, changingPart);
> > +
> > +     /*
> > +      * If the tuple has been concurrently updated, then get the lock on it.
> > +      * (Do this if caller asked for tat by providing a 'lockedSlot'.) With the
> > +      * lock held retry of delete should succeed even if there are more
> > +      * concurrent update attempts.
> > +      */
> > +     if (result == TM_Updated && lockedSlot)
> > +     {
> > +             TupleTableSlot *evalSlot;
> > +
> > +             Assert(wait);
> > +
> > +             evalSlot = LAZY_TTS_EVAL(lockedSlot);
> > +             result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > +                                                                                     evalSlot, cid,
LockTupleExclusive,
> > +                                                                                     LockWaitBlock,
> > +
TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > +                                                                                     tmfd, true);
>
> Oh, huh? As I mentioned before, the unconditional use of LockWaitBlock means
> the wait parameter is ignored.
>
> I'm frankly getting annoyed here.

lockedSlot shouldn't be provided when wait == false.  The assertion
above expresses this intention.  However, the code lacks a comment
directly expressing this idea.

And sorry for getting you annoyed.  The relevant comment should have
been there already.

> > +/*
> > + * This routine does the work for heapam_tuple_lock(), but also support
> > + * `updated` argument to re-use the work done by heapam_tuple_update() or
> > + * heapam_tuple_delete() on figuring out that tuple was concurrently updated.
> > + */
> > +static TM_Result
> > +heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> > +                                                Snapshot snapshot, TupleTableSlot *slot,
> > +                                                CommandId cid, LockTupleMode mode,
> > +                                                LockWaitPolicy wait_policy, uint8 flags,
> > +                                                TM_FailureData *tmfd, bool updated)
>
> Why is the new parameter named 'updated'?

To indicate that we know that we're locking the updated tuple.
Probably not descriptive enough.

> >  {
> >       BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> >       TM_Result       result;
> > -     Buffer          buffer;
> > +     Buffer          buffer = InvalidBuffer;
> >       HeapTuple       tuple = &bslot->base.tupdata;
> >       bool            follow_updates;
> >
> > @@ -374,16 +455,26 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
> >
> >  tuple_lock_retry:
> >       tuple->t_self = *tid;
> > -     result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> > -                                                      follow_updates, &buffer, tmfd);
> > +     if (!updated)
> > +             result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> > +                                                              follow_updates, &buffer, tmfd);
> > +     else
> > +             result = TM_Updated;
> >
> >       if (result == TM_Updated &&
> >               (flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
> >       {
> > -             /* Should not encounter speculative tuple on recheck */
> > -             Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
> > +             if (!updated)
> > +             {
> > +                     /* Should not encounter speculative tuple on recheck */
> > +                     Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
>
> Hm, why is it ok to encounter speculative tuples in the updated case? Oh, I
> guess you got failures because slot doesn't point anywhere at this point.

Yes, given that the tuple is not accessible, this assert can't work.  I
have a couple of ideas on how to replace it.
1) As I understand it, the primary point of this assertion is to be sure
that tmfd->ctid really points us to a correct tuple (while a speculative
token doesn't).  So, for the 'updated' case we can check tmfd->ctid
directly (or even for both cases?).  However, I can't find the relevant
macro for this.  Or, probably,
2) we can check that we don't return TM_Updated from heap_update() and
heap_delete() when the old tuple is a speculative token.

> > -             ReleaseBuffer(buffer);
> > +                     ReleaseBuffer(buffer);
> > +             }
> > +             else
> > +             {
> > +                     updated = false;
> > +             }
> >
> >               if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
> >               {
>
> Which means this is completely bogus now?
>
>         HeapTuple       tuple = &bslot->base.tupdata;
>
> In the first iteration this just points to the newly created slot. Which
> doesn't have a tuple stored in it. So the above checks some uninitialized
> memory.

No, this is not so.  tuple->t_self is unconditionally assigned a few
lines before.  So, it can't be uninitialized memory given that *tid is
initialized.

I seriously doubt this patch could pass the tests if that comparison
would use uninitialized memory.

> Giving up at this point.
>
>
> This doesn't seem ready to have been committed.

Yep, that could be better.  Given that the item pointer comparison
doesn't really use uninitialized memory, it's probably not as bad as you
thought at first glance.

I'm going to post a patch to address the issues you've raised within the
next 24 hours.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-03-23 18:08:36 +0300, Alexander Korotkov wrote:
> On Thu, Mar 23, 2023 at 3:30 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> > > I'm going to push patchset v15 if no objections.
> >
> > Just saw that this went in - didn't catch up with the thread before,
> > unfortunately. At the very least I'd like to see some more work on cleaning up
> > the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
> > hazards - I realize that there's some of those already, but I don't think we
> > should go further down that route. As far as I can tell there's no need for
> > any of this to be macros.
> 
> Thank you for taking a look at this, even post-commit.  Regarding
> marcos, do you think inline functions would be good instead?

Yes.


> > I think it's entirely sensible to avoid the tuple fetching in ExecDelete(),
> > but it needs a bit less localized work. Instead of using the presence of a
> > tuple in the returning slot, ExecDelete() should track whether it already has
> > fetched the deleted tuple.
> >
> > Or alternatively, do the work to avoid refetching the tuple for the much more
> > common case of not needing EPQ at all.
> >
> > I guess this really is part of my issue with this change - it optimizes the
> > rare case, while not addressing the same inefficiency in the common case.
> 
> I'm going to fix this for ExecDelete().  Avoiding refetching the tuple
> in more common case is something I'm definitely very interested in.
> But I would leave it for the future.

It doesn't seem like a good plan to start with the rare and then address the
common case. The solution for the common case might solve the rare case as
well. One way to fix the common case would be to return a tuple
suitable for returning computation as part of the input plan - which would
also fix the EPQ case, since we could just use the EPQ output. Of course there
are complications like triggers, but they seem like they could be dealt with.


> > > @@ -299,14 +305,46 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
> > >  static TM_Result
> > >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> > >                                       Snapshot snapshot, Snapshot crosscheck, bool wait,
> > > -                                     TM_FailureData *tmfd, bool changingPart)
> > > +                                     TM_FailureData *tmfd, bool changingPart,
> > > +                                     LazyTupleTableSlot *lockedSlot)
> > >  {
> > > +     TM_Result       result;
> > > +
> > >       /*
> > >        * Currently Deleting of index tuples are handled at vacuum, in case if
> > >        * the storage itself is cleaning the dead tuples by itself, it is the
> > >        * time to call the index tuple deletion also.
> > >        */
> > > -     return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > > +     result = heap_delete(relation, tid, cid, crosscheck, wait,
> > > +                                              tmfd, changingPart);
> > > +
> > > +     /*
> > > +      * If the tuple has been concurrently updated, then get the lock on it.
> > > +      * (Do this if caller asked for tat by providing a 'lockedSlot'.) With the
> > > +      * lock held retry of delete should succeed even if there are more
> > > +      * concurrent update attempts.
> > > +      */
> > > +     if (result == TM_Updated && lockedSlot)
> > > +     {
> > > +             TupleTableSlot *evalSlot;
> > > +
> > > +             Assert(wait);
> > > +
> > > +             evalSlot = LAZY_TTS_EVAL(lockedSlot);
> > > +             result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > > +                                                                                     evalSlot, cid,
LockTupleExclusive,
> > > +                                                                                     LockWaitBlock,
> > > +
TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > > +                                                                                     tmfd, true);
> >
> > Oh, huh? As I mentioned before, the unconditional use of LockWaitBlock means
> > the wait parameter is ignored.
> >
> > I'm frankly getting annoyed here.
> 
> lockedSlot shoudln't be provided when wait == false.  The assertion
> above expresses this intention.  However, the code lacking of comment
> directly expressing this idea.

I don't think a comment here is going to fix things. You can't expect somebody
trying to use tableam to look into the guts of a heapam function to understand
the API constraints to this degree. And there's afaict no comments in tableam
that indicate any of this.

I also just don't see why this is a sensible constraint? Why should this only
work if wait == false?


> > > +/*
> > > + * This routine does the work for heapam_tuple_lock(), but also support
> > > + * `updated` argument to re-use the work done by heapam_tuple_update() or
> > > + * heapam_tuple_delete() on figuring out that tuple was concurrently updated.
> > > + */
> > > +static TM_Result
> > > +heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> > > +                                                Snapshot snapshot, TupleTableSlot *slot,
> > > +                                                CommandId cid, LockTupleMode mode,
> > > +                                                LockWaitPolicy wait_policy, uint8 flags,
> > > +                                                TM_FailureData *tmfd, bool updated)
> >
> > Why is the new parameter named 'updated'?
> 
> To indicate that we know that we're locking the updated tuple.
> Probably not descriptive enough.

Given it's used for deletions, I'd say so.


> > > -             ReleaseBuffer(buffer);
> > > +                     ReleaseBuffer(buffer);
> > > +             }
> > > +             else
> > > +             {
> > > +                     updated = false;
> > > +             }
> > >
> > >               if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
> > >               {
> >
> > Which means this is completely bogus now?
> >
> >         HeapTuple       tuple = &bslot->base.tupdata;
> >
> > In the first iteration this just points to the newly created slot. Which
> > doesn't have a tuple stored in it. So the above checks some uninitialized
> > memory.
> 
> No, this is not so.  tuple->t_self is unconditionally assigned few
> lines before.  So, it can't be uninitialized memory given that *tid is
> initialized.

Ugh. This means you're basically leaving uninitialized / not initialized state
in the other portions of the tuple/slot, without even documenting that. The
code was ugly starting out, but this certainly makes it worse.

There's also no comment explaining that tmfd suddenly is load-bearing *input*
into heapam_tuple_lock_internal(), whereas previously it was purely an output
parameter - and is documented as such:
 * Output parameters:
 *    *slot: contains the target tuple
 *    *tmfd: filled in failure cases (see below)

This is an *awful* API.


> I seriously doubt this patch could pass the tests if that comparison
> would use uninitialized memory.

IDK about that - it's hard to exercise this code in the regression tests, and
plenty of things are zero-initialized, which often makes things appear to work in
the first iteration.


> > Giving up at this point.
> >
> >
> > This doesn't seem ready to have been committed.
> 
> Yep, that could be better.  Given, that item pointer comparison
> doesn't really use uninitialized memory, probably not as bad as you
> thought at the first glance.

The details of how it's bad maybe differ slightly, but looking at it a bit
longer I also found new things. So I don't really think it's better than what
I thought it was.

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

An off-list conversation veered on-topic again. Reposting for posterity:

On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> On Thu, Mar 23, 2023 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
> > I seriously doubt that solving this at the tuple locking level is the right
> > thing. If we want to avoid refetching tuples, why don't we add a parameter to
> > delete/update to generally put the old tuple version into a slot, not just as
> > an optimization for a subsequent lock_tuple()? Then we could remove all
> > refetching tuples for triggers. It'd also provide the basis for adding support
> > for referencing the OLD version in RETURNING, which'd be quite powerful.
>
> I spent some time thinking on this.  Does our attempt to update/delete
> tuple imply that we've already fetched the old tuple version?

Yes, but somewhat "far away", below the ExecProcNode() in ExecModifyTable(). I
don't think we can rely on that. The old tuple is just identified via a junk
attribute (c.f. "For UPDATE/DELETE/MERGE, fetch the row identity info for the
tuple..."). The NEW tuple is computed in the target list of the source query.
It's possible that for some simpler cases we could figure out that the
returned slot is the "old" tuple, but it'd be hard to make that work.

Alternatively we could evaluate returning as part of the source query
plan. While that'd work nicely for the EPQ cases (the EPQ evaluation would
compute the new values), it could not be relied upon for before triggers.

It might or might not be a win to try to do so - if you have a selective
query, ferrying around the entire source tuple might cost more than it
saves.


> We needed that at least to do initial qual check and calculation of the new
> tuple (for update case).

The NEW tuple is computed in the source query, as I mentioned, I don't think
we easily can get access to the source row in the general case.


> We currently may not have the old tuple at hand at the time we do
> table_tuple_update()/table_tuple_delete().  But that seems to be just and
> issue of our executor code.  Do it worth to make table AM fetch the old
> *unmodified* tuple given that we've already fetched it for sure?

Not unconditionally (e.g. if you neither have triggers, nor RETURNING, there's
not much point, unless the query is simple enough that we could make it
free). But in the other cases it seems beneficial. The caller would reliably
know whether they want the source tuple to be fetched, or not.

We could make it so that iff we already have the "old" tuple in the slot,
it'll not be put in there "again", but if it's not the right row version, it
is.

We could use the same approach to make the "happy path" in update/delete
cheaper. If the source tuple is provided, heap_delete(), heap_update() won't
need to do a ReadBuffer(), they could just IncrBufferRefCount(). That'd be a
quite substantial win.

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi!

On Fri, Mar 24, 2023 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
> > > I seriously doubt that solving this at the tuple locking level is the right
> > > thing. If we want to avoid refetching tuples, why don't we add a parameter to
> > > delete/update to generally put the old tuple version into a slot, not just as
> > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > refetching tuples for triggers. It'd also provide the basis for adding support
> > > for referencing the OLD version in RETURNING, which'd be quite powerful.

After some thought, I think I like the idea of fetching the old tuple
version in update/delete.  Anything that avoids extra tuple fetching
and does more of the related work in a single table AM call makes the
table AM API more flexible.

I'm working on a patch implementing this.  I'm going to post it later today.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Andres,

On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Fri, Mar 24, 2023 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
> > > > I seriously doubt that solving this at the tuple locking level is the right
> > > > thing. If we want to avoid refetching tuples, why don't we add a parameter to
> > > > delete/update to generally put the old tuple version into a slot, not just as
> > > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > > refetching tuples for triggers. It'd also provide the basis for adding support
> > > > for referencing the OLD version in RETURNING, which'd be quite powerful.
>
> After some thoughts, I think I like idea of fetching old tuple version
> in update/delete.  Everything that evades extra tuple fetching and do
> more of related work in a single table AM call, makes table AM API
> more flexible.
>
> I'm working on patch implementing this.  I'm going to post it later today.

Here is the patchset.  I'm continuing to work on the comments and refactoring.

My quick question is: why do we need ri_TrigOldSlot for triggers?
Can't we just pass the old tuple for the after-row trigger in
ri_oldTupleSlot?

Also, I wonder if we really need a LazyTupleSlot.  It allows us to
avoid an extra tuple slot allocation.  But as I understand it, in the
end the tuple slot allocation is just a single palloc.  I bet the
effect would be invisible in benchmarks.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Wed, Mar 29, 2023 at 8:34 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Fri, Mar 24, 2023 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > I seriously doubt that solving this at the tuple locking level is the right
> > > > > thing. If we want to avoid refetching tuples, why don't we add a parameter to
> > > > > delete/update to generally put the old tuple version into a slot, not just as
> > > > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > > > refetching tuples for triggers. It'd also provide the basis for adding support
> > > > > for referencing the OLD version in RETURNING, which'd be quite powerful.
> >
> > After some thoughts, I think I like idea of fetching old tuple version
> > in update/delete.  Everything that evades extra tuple fetching and do
> > more of related work in a single table AM call, makes table AM API
> > more flexible.
> >
> > I'm working on patch implementing this.  I'm going to post it later today.
>
> Here is the patchset.  I'm continue to work on comments and refactoring.
>
> My quick question is why do we need ri_TrigOldSlot for triggers?
> Can't we just pass the old tuple for after row trigger in
> ri_oldTupleSlot?
>
> Also, I wonder if we really need a LazyTupleSlot.  It allows to evade
> extra tuple slot allocation.  But as I get in the end the tuple slot
> allocation is just a single palloc.  I bet the effect would be
> invisible in the benchmarks.

Sorry, the previous patches don't even compile.  A fixed version is attached.
I'm going to post a significantly revised patchset soon.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-03-31 16:57:41 +0300, Alexander Korotkov wrote:
> On Wed, Mar 29, 2023 at 8:34 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Fri, Mar 24, 2023 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > > I seriously doubt that solving this at the tuple locking level is the right
> > > > > > thing. If we want to avoid refetching tuples, why don't we add a parameter to
> > > > > > delete/update to generally put the old tuple version into a slot, not just as
> > > > > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > > > > refetching tuples for triggers. It'd also provide the basis for adding support
> > > > > > for referencing the OLD version in RETURNING, which'd be quite powerful.
> > >
> > > After some thoughts, I think I like idea of fetching old tuple version
> > > in update/delete.  Everything that evades extra tuple fetching and do
> > > more of related work in a single table AM call, makes table AM API
> > > more flexible.
> > >
> > > I'm working on patch implementing this.  I'm going to post it later today.
> >
> > Here is the patchset.  I'm continue to work on comments and refactoring.
> >
> > My quick question is why do we need ri_TrigOldSlot for triggers?
> > Can't we just pass the old tuple for after row trigger in
> > ri_oldTupleSlot?
> >
> > Also, I wonder if we really need a LazyTupleSlot.  It allows to evade
> > extra tuple slot allocation.  But as I get in the end the tuple slot
> > allocation is just a single palloc.  I bet the effect would be
> > invisible in the benchmarks.
> 
> Sorry, previous patches don't even compile.  The fixed version is attached.
> I'm going to post significantly revised patchset soon.

Given that the in-tree state has been broken for a week, I think it probably
is time to revert the commits that already went in.

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Andres!

On Sat, 1 Apr 2023, 09:21 Andres Freund, <andres@anarazel.de> wrote:
Given that the in-tree state has been broken for a week, I think it probably
is time to revert the commits that already went in.

It seems that although the patch addressing the issues is not a quick fix, there has already been substantial progress on it. I propose checking its status a week from now and, if it is not ready by then, reverting the existing commits. I hope no other patches in the tree are reported to suffer from this.

Kind regards,
Pavel Borisov,
Supabase

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi!

On Sat, Apr 1, 2023 at 8:21 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-31 16:57:41 +0300, Alexander Korotkov wrote:
> > On Wed, Mar 29, 2023 at 8:34 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > On Fri, Mar 24, 2023 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> > > > > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > > > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > > > I seriously doubt that solving this at the tuple locking level is the right
> > > > > > > thing. If we want to avoid refetching tuples, why don't we add a parameter to
> > > > > > > delete/update to generally put the old tuple version into a slot, not just as
> > > > > > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > > > > > refetching tuples for triggers. It'd also provide the basis for adding support
> > > > > > > for referencing the OLD version in RETURNING, which'd be quite powerful.
> > > >
> > > > After some thoughts, I think I like idea of fetching old tuple version
> > > > in update/delete.  Everything that evades extra tuple fetching and do
> > > > more of related work in a single table AM call, makes table AM API
> > > > more flexible.
> > > >
> > > > I'm working on patch implementing this.  I'm going to post it later today.
> > >
> > > Here is the patchset.  I'm continue to work on comments and refactoring.
> > >
> > > My quick question is why do we need ri_TrigOldSlot for triggers?
> > > Can't we just pass the old tuple for after row trigger in
> > > ri_oldTupleSlot?
> > >
> > > Also, I wonder if we really need a LazyTupleSlot.  It allows to evade
> > > extra tuple slot allocation.  But as I get in the end the tuple slot
> > > allocation is just a single palloc.  I bet the effect would be
> > > invisible in the benchmarks.
> >
> > Sorry, previous patches don't even compile.  The fixed version is attached.
> > I'm going to post significantly revised patchset soon.
>
> Given that the in-tree state has been broken for a week, I think it probably
> is time to revert the commits that already went in.

The revised patch is attached.  The most notable change is getting rid
of LazyTupleTableSlot, along with the complex computations used to
decide how to initialize it.  Instead, the oldSlot is simply passed as
an argument to ExecUpdate() and ExecDelete().  The price for this is
just preallocating ri_oldTupleSlot before calling ExecDelete().  The
slot allocation is quite cheap: after all the wrappers it boils down to
table_slot_callbacks(), a single palloc(), and the initialization of a
few fields.  It doesn't seem reasonable to introduce infrastructure
just to avoid that.
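
For illustration, the preallocation in question amounts to roughly this
(a sketch of the idea, not the exact hunk from the patch):

    /* Make sure there is a slot to receive the old tuple version. */
    if (resultRelInfo->ri_oldTupleSlot == NULL)
        resultRelInfo->ri_oldTupleSlot =
            table_slot_create(resultRelInfo->ri_RelationDesc,
                              &estate->es_tupleTable);

table_slot_create() just asks table_slot_callbacks() for the slot type
and builds a single slot, so the cost is essentially one palloc() plus
a few field initializations.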

I think the patch resolves all the major issues you've highlighted.
Even if some minor things are still missed, I'd prefer to push this
rather than revert the whole work.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Andres Freund
Date:
Hi,

On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote:
> On Sat, Apr 1, 2023 at 8:21 AM Andres Freund <andres@anarazel.de> wrote:
> > Given that the in-tree state has been broken for a week, I think it probably
> > is time to revert the commits that already went in.
> 
> The revised patch is attached.  The most notable change is getting rid
> of LazyTupleTableSlot.  Also get rid of complex computations to detect
> how to initialize LazyTupleTableSlot.  Instead just pass the oldSlot
> as an argument of ExecUpdate() and ExecDelete().  The price for this
> is just preallocation of ri_oldTupleSlot before calling ExecDelete().
> The slot allocation is quite cheap.  After all wrappers it's
> table_slot_callbacks(), which is very cheap, single palloc() and few
> fields initialization.  It doesn't seem reasonable to introduce an
> infrastructure to evade this.
> 
> I think patch resolves all the major issues you've highlighted.  Even
> if there are some minor things missed, I'd prefer to push this rather
> than reverting the whole work.

Shrug. You're designing new APIs, days before the feature freeze. This just
doesn't seem ready in time for 16. I certainly won't have time to look at it
sufficiently in the next 5 days.

Greetings,

Andres Freund



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Sun, Apr 2, 2023 at 3:47 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote:
> > On Sat, Apr 1, 2023 at 8:21 AM Andres Freund <andres@anarazel.de> wrote:
> > > Given that the in-tree state has been broken for a week, I think it probably
> > > is time to revert the commits that already went in.
> >
> > The revised patch is attached.  The most notable change is getting rid
> > of LazyTupleTableSlot.  Also get rid of complex computations to detect
> > how to initialize LazyTupleTableSlot.  Instead just pass the oldSlot
> > as an argument of ExecUpdate() and ExecDelete().  The price for this
> > is just preallocation of ri_oldTupleSlot before calling ExecDelete().
> > The slot allocation is quite cheap.  After all wrappers it's
> > table_slot_callbacks(), which is very cheap, single palloc() and few
> > fields initialization.  It doesn't seem reasonable to introduce an
> > infrastructure to evade this.
> >
> > I think patch resolves all the major issues you've highlighted.  Even
> > if there are some minor things missed, I'd prefer to push this rather
> > than reverting the whole work.
>
> Shrug. You're designing new APIs, days before the feature freeze. This just
> doesn't seem ready in time for 16. I certainly won't have time to look at it
> sufficiently in the next 5 days.

OK.  Reverted.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, Alexander!

On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote:
> On Sat, Apr 1, 2023 at 8:21 AM Andres Freund <andres@anarazel.de> wrote:
> > Given that the in-tree state has been broken for a week, I think it probably
> > is time to revert the commits that already went in.
>
> The revised patch is attached.  The most notable change is getting rid
> of LazyTupleTableSlot.  Also get rid of complex computations to detect
> how to initialize LazyTupleTableSlot.  Instead just pass the oldSlot
> as an argument of ExecUpdate() and ExecDelete().  The price for this
> is just preallocation of ri_oldTupleSlot before calling ExecDelete().
> The slot allocation is quite cheap.  After all wrappers it's
> table_slot_callbacks(), which is very cheap, single palloc() and few
> fields initialization.  It doesn't seem reasonable to introduce an
> infrastructure to evade this.
>
> I think patch resolves all the major issues you've highlighted.  Even
> if there are some minor things missed, I'd prefer to push this rather
> than reverting the whole work.

I looked into the latest patch v3.
In my view, it addresses all the issues discussed in [1].  Also, with
the oldSlot logic pushed outside, the code becomes more transparent.
I've made some very minor modifications to the code and comments in
patch v4-0001.  I'm also in favor of committing Andres' isolation test;
I've added some minor revisions so that it runs routinely among the
other isolation tests.  The test could also be made part of the
existing eval-plan-qual.spec, but I have left it separate for now.

Also, I think the signatures of the ExecUpdate() and ExecDelete()
functions, especially the latter, are somewhat overloaded with
different status bool variables added by different authors on
different occasions.  It would be nice if they were combined into some
kind of status variable.  But as this doesn't touch the API and is not
related to the current update/delete optimization, it could just as
well be done at any time in the future.

The changes that do touch the API are the addition of a TupleTableSlot
argument and the conversion of the bool wait flag into a four-state
options variable for tuple_update(), tuple_delete(), heap_update(),
heap_delete(), and heap_lock_tuple(), plus a couple of
Exec*DeleteTriggers().  I think they are justified.
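
To make the shape of that change concrete, here is a hedged sketch of
what the extended table_tuple_delete() looks like in my reading of the
patch; the option names below are illustrative (taken from the
patchset, not from committed code), and tuple_update()/heap_* get the
analogous treatment:

    /* Illustrative option bits replacing the old "bool wait" argument. */
    #define TABLE_MODIFY_WAIT             0x01  /* wait for concurrent updaters */
    #define TABLE_MODIFY_FETCH_OLD_TUPLE  0x02  /* return the old tuple version in oldSlot */
    #define TABLE_MODIFY_LOCK_UPDATED     0x04  /* on conflict, lock the latest row version */

    static inline TM_Result
    table_tuple_delete(Relation rel, ItemPointer tid, CommandId cid,
                       Snapshot snapshot, Snapshot crosscheck,
                       int options,               /* was: bool wait */
                       TM_FailureData *tmfd, bool changingPart,
                       TupleTableSlot *oldSlot);  /* new: receives the old tuple */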

One thing that is not clear to me is that we pass oldSlot into
simple_table_tuple_update(), whereas the comment on that function says
"concurrent updates of the target tuple is not expected (for example,
because we have a lock on the relation associated with the tuple)".
It doesn't seem to break anything, but maybe this could be simplified.

Overall I think the patch is good enough.

Regards,
Pavel Borisov,
Supabase.

[1] https://www.postgresql.org/message-id/CAPpHfdtwKb5UVXKkDQZYW8nQCODy0fY_S7mV5Z%2Bcg7urL%3DzDEA%40mail.gmail.com

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
After Alexander reverted the v15 patches from master, I've rebased the
v4 correction patches from the message above onto a fresh master
(together with the v15 patches).  The resulting patch v16 is attached.

Pavel.

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Mon, Apr 3, 2023 at 5:12 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> Upon Alexander reverting patches v15 from master, I've rebased what
> was correction patches v4 in a message above on a fresh master
> (together with patches v15). The resulting patch v16 is attached.

Pavel, thank you for your review, revisions, and rebase.
We'll reconsider this once v17 is branched.

------
Regards,
Alexander Korotkov



Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Michael Paquier
Date:
On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> Pavel, thank you for you review, revisions and rebase.
> We'll reconsider this once v17 is branched.

The patch was still in the current CF, so I have moved it to the next
one based on the latest updates.
--
Michael

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Pavel Borisov
Date:
Hi, hackers!

> You're designing new APIs, days before the feature freeze.
On Wed, 5 Apr 2023 at 06:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> > Pavel, thank you for you review, revisions and rebase.
> > We'll reconsider this once v17 is branched.

I've looked through the v16 patches once more and think they're good
enough, and the previously raised issues are all addressed.  I see
nothing that blocks them from being committed, except that the last
iteration came only days before the v16 feature freeze.

Recently, in another thread [1], Alexander posted a new version of the
v16 patches (as 0001 and 0002).  In 0001 only indentation, comments,
and commit messages changed relative to v16 in this thread.  In 0002
the new test eval-plan-qual-2 was integrated into the existing
eval-plan-qual test.  To keep the most recent versions in this thread,
I'm attaching them here as v17.  I suppose we can commit these patches
to v17 if there are no objections or additional reviews.

[1] https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com

Kind regards,
Pavel Borisov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
Hi, Pavel!

On Tue, Nov 28, 2023 at 11:00 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > You're designing new APIs, days before the feature freeze.
> On Wed, 5 Apr 2023 at 06:54, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> > > Pavel, thank you for you review, revisions and rebase.
> > > We'll reconsider this once v17 is branched.
>
> I've looked through patches v16 once more and think they're good
> enough, and previous issues are all addressed. I see that there is
> nothing that blocks it from being committed except the last iteration
> was days before v16 feature freeze.
>
> Recently in another thread [1] Alexander posted a new version of
> patches v16 (as 0001 and 0002) In 0001 only indenation, comments, and
> commit messages changed from v16 in this thread. In 0002 new test
> eval-plan-qual-2 was integrated into the existing eval-plan-qual test.
> For maintaining the most recent versions in this thread I'm attaching
> them under v17. I suppose that we can commit these patches to v17 if
> there are no objections or additional reviews.
>
> [1] https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com

The new revision of the patches is attached.

It has updated commit messages and new comments, and some variables
were renamed to be more consistent with their surroundings.

I also think that all the design issues discussed before are now resolved.
It would be nice to hear from Andres about this.

I'll continue rechecking these patches myself.

------
Regards,
Alexander Korotkov

Attachment

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

From
Alexander Korotkov
Date:
On Tue, Mar 19, 2024 at 5:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Tue, Nov 28, 2023 at 11:00 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > You're designing new APIs, days before the feature freeze.
> > On Wed, 5 Apr 2023 at 06:54, Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> > > > Pavel, thank you for you review, revisions and rebase.
> > > > We'll reconsider this once v17 is branched.
> >
> > I've looked through patches v16 once more and think they're good
> > enough, and previous issues are all addressed. I see that there is
> > nothing that blocks it from being committed except the last iteration
> > was days before v16 feature freeze.
> >
> > Recently in another thread [1] Alexander posted a new version of
> > patches v16 (as 0001 and 0002) In 0001 only indenation, comments, and
> > commit messages changed from v16 in this thread. In 0002 new test
> > eval-plan-qual-2 was integrated into the existing eval-plan-qual test.
> > For maintaining the most recent versions in this thread I'm attaching
> > them under v17. I suppose that we can commit these patches to v17 if
> > there are no objections or additional reviews.
> >
> > [1] https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
>
> The new revision of patches is attached.
>
> It has updated commit messages, new comments, and some variables were
> renamed to be more consistent with surroundings.
>
> I also think that all the design issues spoken before are resolved.
> It would be nice to hear from Andres about this.
>
> I'll continue rechecking these patches myself.

I've re-read this thread.  It still seems to me that the issues raised
before are addressed now.  Fingers crossed, I'm going to push this if
there are no objections.

------
Regards,
Alexander Korotkov