Re: More guidance on ctid - Mailing list pgsql-docs

From Bernice Southey
Subject Re: More guidance on ctid
Date
Msg-id CAEDh4nzDMoRDZMDGGxKHzJ5hZscmvT2FWioTzQ6A0aJVtCO8NQ@mail.gmail.com
Whole thread Raw
In response to Re: More guidance on ctid  (Bruce Momjian <bruce@momjian.us>)
List pgsql-docs
Bruce Momjian <bruce@momjian.us> wrote:
> I think the reporter above extrapolated this example to use ctid in
> other, non-appropriate cases.  Should we add a warning to the docs to
> explain that the general use of ctid is discouraged?

I've been explaining this badly. Here's an example that I hope will
make clear why I'm so concerned.

(Read Committed default isolation level)
drop table if exists b1, b2;
create table b1 as select 1 id, 100 balance;
create table b2 as select 1 id, 100 balance;

With an immutable column self-join, it works as expected and
serializes the balance changes.

--session1
begin;
with x as (select id from b1 order by id for update)
update b1 set balance = balance + 1 from x where b1.id = x.id;

--session2
with x as (select id from b1 order by id for update)
update b1 set balance = balance - 100 from x where b1.id = x.id;
select * from b1;

--session1
commit;

--session2
UPDATE 1
 id | balance
----+---------
  1 |       1
(1 row)

But with ctid, the second update is lost.

--session1
begin;
with x as (select ctid from b2 order by id for update)
update b2 set balance = balance + 1 from x where b2.ctid = x.ctid;

--session2
with x as (select ctid from b2 order by id for update)
update b2 set balance = balance - 100 from x where b2.ctid = x.ctid;
select * from b2;

--session1
commit;

--session2
UPDATE 0
 id | balance
----+---------
  1 |     101
(1 row)

The session2 CTE blocks until it gets the updated version of the row
from session 1 commit. It gets b2.ctid = (0,2) giving x.ctid = (0,2).
But the UPDATE gets b2.ctid = (0,1) at the start of the transaction.
This doesn't change even after session 1 commits. The join fails. This
is correct, but not obvious.

This is why I think the docs should say something different to what
they currently do. The UPDATE example is fine only because it's run
repeatedly until there are no more rows to find.

I learned the primary key self-join pattern for order by/limit in
delete/update soon after I began with postgres. I'm pretty sure ctid
would have confused me. So I think the doc examples with an id column
will be very helpful, and safer.

If users discover ctid on internet forums or LLMs and then check the
docs, I think they should get some notice of the concurrency risk.

Here's a patch with an attempt to do the above.

Forum examples:
https://stackoverflow.com/questions/10245560/deadlocks-in-postgresql-when-running-update/71163671#71163671
https://www.reddit.com/r/PostgreSQL/comments/1cn0q1c/comment/l36ppve/

Thanks, Bernice

Attachment

pgsql-docs by date:

Previous
From: Andrew Jackson
Date:
Subject: Add Restart=on-failure To Example Systemd File
Next
From: Daniel Gustafsson
Date:
Subject: restart point vs restartpoint in the docs