Thread: existing row not found by SELECT ... WHERE CTID = ?

existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

24 May 2022, 14:44:15

We have a C-written program, written in ESQL/C, of our LMS where the logic
crawls with FETCH through a hit list and does UPDATE on some rows which
match certain condition. This works fine for thousands of rows every night,
but magically sometimes it fails. I have here the part of ESQL/C logs
when I was able to catch such a case:

The hit list for the FETCHes is built with:

[29858] [23.05.2022 23:11:06:419]: prepare_common on line 1825: name sid_d01buch; query: "SELECT ctid, * from d01buch
WHEREd01status = 4 "
 
[29858] [23.05.2022 23:11:06:419]: ecpg_execute on line 2026: query: declare d01buch_scr scroll cursor with hold for
SELECTctid, * from d01buch WHERE d01status = 4 ; with 0 parameter(s) on connection sisis
 

then the CURSOR d01buch_scr is FETCHed some 59537 times, some of the
rows were updated and here is the failing situation with a good FETCH of
the row:

...
[29858] [23.05.2022 23:21:21:842]: ecpg_process_output on line 2655: correctly got 1 tuples with 79 fields
[29858] [23.05.2022 23:21:21:842]: ecpg_get_data on line 2655: RESULT: (668486,20) offset: 19; array: no
[29858] [23.05.2022 23:21:21:842]: ecpg_get_data on line 2655: RESULT: 03904016                        offset: 1152;
array:no
 
...


but when a new CURSOR should be build to UPDATE the row based
on its CTID = (668486,20), the row could not be found:

[29858] [23.05.2022 23:21:21:843]: deallocate_one on line 2494: name hs_d01buch
[29858] [23.05.2022 23:21:21:843]: prepare_common on line 2494: name hs_d01buch; query    : "SELECT * FROM d01buch
WHEREctid = $1 FOR UPDATE"
 
[29858] [23.05.2022 23:21:21:843]: ecpg_execute on line 2526: query: declare hc_d01buch cursor for SELECT * FROM
d01buchWHERE ctid = $1 FOR UPDATE; with 1 parameter(s) on     connection sisis
 
[29858] [23.05.2022 23:21:21:844]: ecpg_execute on line 2526: using PQexecParams
[29858] [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)

...

and the first FETCH in the CURSOR hc_d01buch could not see any row:


104 [29858] [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no data found on  line 2531

Why is this? Thanks

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Tom Lane

Date:

24 May 2022, 14:47:11

Matthias Apitz <guru@unixarea.de> writes:
> We have a C-written program, written in ESQL/C, of our LMS where the logic
> crawls with FETCH through a hit list and does UPDATE on some rows which
> match certain condition. This works fine for thousands of rows every night,
> but magically sometimes it fails.

Maybe something else already updated the row since the cursor was opened?
That would change its CTID.

            regards, tom lane

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

24 May 2022, 15:42:26

El día martes, mayo 24, 2022 a las 10:47:11 -0400, Tom Lane escribió:

> Matthias Apitz <guru@unixarea.de> writes:
> > We have a C-written program, written in ESQL/C, of our LMS where the logic
> > crawls with FETCH through a hit list and does UPDATE on some rows which
> > match certain condition. This works fine for thousands of rows every night,
> > but magically sometimes it fails.
> 
> Maybe something else already updated the row since the cursor was opened?
> That would change its CTID.

If you compare the time when the CTID was read:

[29858] [23.05.2022 23:21:21:842]: ecpg_get_data on line 2655: RESULT: (668486,20) offset:
19; array: no

with the time when it was not found:

[29858] [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no data found on
line 2531

it's nearly imposible that our software could have done this. Maybe
VACUUM hits exactly this moment in time and row, but this sounds also like 5
good numbers in the lotery jackpot :-)

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Laurenz Albe

Date:

24 May 2022, 15:50:14

On Tue, 2022-05-24 at 16:44 +0200, Matthias Apitz wrote:
> We have a C-written program, written in ESQL/C, of our LMS where the logic
> crawls with FETCH through a hit list and does UPDATE on some rows which
> match certain condition. This works fine for thousands of rows every night,
> but magically sometimes it fails. I have here the part of ESQL/C logs
> when I was able to catch such a case:
> 
> The hit list for the FETCHes is built with:
> 
> [29858] [23.05.2022 23:11:06:419]: prepare_common on line 1825: name sid_d01buch; query: "SELECT ctid, * from d01buch
WHEREd01status = 4 "
 
> [29858] [23.05.2022 23:11:06:419]: ecpg_execute on line 2026: query: declare d01buch_scr scroll cursor with hold for
SELECTctid, * from d01buch WHERE d01status = 4 ; with 0 parameter(s) on
 
> connection sisis
> 
> then the CURSOR d01buch_scr is FETCHed some 59537 times, some of the
> rows were updated and here is the failing situation with a good FETCH of
> the row:

I cannot understand many of your log messages, but it seems quite clear that
you are declaring a WITH HOLD cursor.

Such cursors are materialized when the transactoin that creates the cursor
commits, so the result set is "frozen" and does no longer reflect the current
state of the table.

It may well be that somebody deleted or updated a few rows between the time
the cursor was materialized and the time the 50000th row was fetched.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Tom Lane

Date:

24 May 2022, 16:11:49

Laurenz Albe <laurenz.albe@cybertec.at> writes:
> It may well be that somebody deleted or updated a few rows between the time
> the cursor was materialized and the time the 50000th row was fetched.

Even without HOLD, a cursor will return a view of the data as it stood
when the cursor was opened, just as a plain SELECT does.  There is
*plenty* of time for another session to get in there if you've been
groveling through 50K records one at a time.

            regards, tom lane

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

25 May 2022, 09:21:41

El día martes, mayo 24, 2022 a las 12:11:49 -0400, Tom Lane escribió:

> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > It may well be that somebody deleted or updated a few rows between the time
> > the cursor was materialized and the time the 50000th row was fetched.
> 
> Even without HOLD, a cursor will return a view of the data as it stood
> when the cursor was opened, just as a plain SELECT does.  There is
> *plenty* of time for another session to get in there if you've been
> groveling through 50K records one at a time.

Tom, Thanks for pointing us in the right direction where to look for a
solution. The CURSOR was opened around 23:11 pm and the CTID not found
at 23:21 pm, i.e. ten minutes later. This piece of software does every
night some housekeeping work in the circulation area of our LMS (Library
Management System) and is meant to run as a standalone job (only one
process after the other). We're trying to figure out with the customer if something
else was started/running at this time between 23:11 and 23:21, to shut this
off in the future. Is it possible that the PostgreSQL 13.1 server does
something by its own to invalidate the rowid?

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Laurenz Albe

Date:

25 May 2022, 10:51:02

On Wed, 2022-05-25 at 11:21 +0200, Matthias Apitz wrote:
> Is it possible that the PostgreSQL 13.1 server does something by its own to invalidate the rowid?

No.  PostgreSQL may remove a dead row, but a dead row is by definition
no longer visible, so it wouldn't be found by a query.

Yours,
Laurenz Albe

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

25 May 2022, 12:01:12

El día Mittwoch, Mai 25, 2022 a las 12:51:02 +0200, Laurenz Albe escribió:

> On Wed, 2022-05-25 at 11:21 +0200, Matthias Apitz wrote:
> > Is it possible that the PostgreSQL 13.1 server does something by its own to invalidate the rowid?
> 
> No.  PostgreSQL may remove a dead row, but a dead row is by definition
> no longer visible, so it wouldn't be found by a query.

We will solve the problem now with setting the session after connect to

   SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

(with an appropriate ESQL/C call). Any comments?

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Christoph Moench-Tegeder

Date:

25 May 2022, 12:27:50

## Matthias Apitz (guru@unixarea.de):

> We will solve the problem now with setting the session after connect to
> 
>    SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> 
> (with an appropriate ESQL/C call). Any comments?

Maybe the real question is whether it is wise to use an implementation
artifact (ctid) to identify rows?
The textbook approach could be row locks (SELECT ... FOR SHARE/UPDATE and
variants) to prevent concurrent changes or optimistic locking  (and a
primary key in any case) - but maybe you already investigated those options?

Regards,
Christoph

-- 
Spare Space

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Laurenz Albe

Date:

25 May 2022, 13:22:39

On Wed, 2022-05-25 at 14:27 +0200, Christoph Moench-Tegeder wrote:
> ## Matthias Apitz (guru@unixarea.de):
> 
> > We will solve the problem now with setting the session after connect to
> > 
> >    SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> > 
> > (with an appropriate ESQL/C call). Any comments?
> 
> Maybe the real question is whether it is wise to use an implementation
> artifact (ctid) to identify rows?
> The textbook approach could be row locks (SELECT ... FOR SHARE/UPDATE and
> variants) to prevent concurrent changes or optimistic locking  (and a
> primary key in any case) - but maybe you already investigated those options?

Right.

REPEATABLE READ won't help you there.  True, you will see a stable snapshot
of the database inside a single transaction, but if a concurrent session has
modified the row, you will get a serialization error.
So that is not a solution.

Yours,
Laurenz Albe

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Ravi Krishna

Date:

25 May 2022, 14:30:16

> No.  PostgreSQL may remove a dead row, but a dead row is by definition
> no longer visible, so it wouldn't be found by a query.

I am wondering whether it is a good practice to use CTID in a where 
clause.  years ago when I use to code in Informix, using ROWID as a 
generic substitute for primary key was discouraged precisely for the 
same reason as described here for CTID. Sometimes rowid can change under 
concurrent updates.

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Tom Lane

Date:

25 May 2022, 14:38:24

Ravi Krishna <srkrishna@vivaldi.net> writes:
>> No.  PostgreSQL may remove a dead row, but a dead row is by definition
>> no longer visible, so it wouldn't be found by a query.

> I am wondering whether it is a good practice to use CTID in a where 
> clause.

It's fine if part of your business logic is that you don't want to allow
concurrent updates.  In this case, the OP seems to want to prevent rather
than tolerate the concurrent update, so I don't think he needs to revisit
the app's use of CTID.

If you do need to support concurrent updates, then yeah relying on CTID
is likely to be problematic.

            regards, tom lane

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

26 May 2022, 15:12:44

El día Wednesday, May 25, 2022 a las 11:21:44AM +0200, Matthias Apitz escribió:

> El día martes, mayo 24, 2022 a las 12:11:49 -0400, Tom Lane escribió:
> 
> > Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > > It may well be that somebody deleted or updated a few rows between the time
> > > the cursor was materialized and the time the 50000th row was fetched.
> > 
> > Even without HOLD, a cursor will return a view of the data as it stood
> > when the cursor was opened, just as a plain SELECT does.  There is
> > *plenty* of time for another session to get in there if you've been
> > groveling through 50K records one at a time.
> 
> Tom, Thanks for pointing us in the right direction where to look for a
> solution. The CURSOR was opened around 23:11 pm and the CTID not found
> at 23:21 pm, i.e. ten minutes later. This piece of software does every
> night some housekeeping work in the circulation area of our LMS (Library
> Management System) and is meant to run as a standalone job (only one
> process after the other). We're trying to figure out with the customer if something
> else was started/running at this time between 23:11 and 23:21, to shut this
> off in the future. ...

Is there any way to get with the old CTID to the row, for example with
the old CTID to the new one which the row now has after the update of the row?

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

"David G. Johnston"

Date:

26 May 2022, 15:24:12

On Thursday, May 26, 2022, Matthias Apitz <guru@unixarea.de> wrote:

Is there any way to get with the old CTID to the row, for example with
the old CTID to the new one which the row now has after the update of the row?

No, there is no link between old and new in the main table.

David J.

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Tom Lane

Date:

26 May 2022, 16:02:35

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thursday, May 26, 2022, Matthias Apitz <guru@unixarea.de> wrote:
>> Is there any way to get with the old CTID to the row, for example with
>> the old CTID to the new one which the row now has after the update of the
>> row?

> No, there is no link between old and new in the main table.

There is a forward link from the old CTID to the new, but we don't
provide any user-accessible way to use it.  I wonder though if the
OP should be considering using SELECT FOR UPDATE in his cursor,
so that it'd automatically chain up to the newest row version.

            regards, tom lane

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

27 May 2022, 16:36:36

El día jueves, mayo 26, 2022 a las 12:02:35 -0400, Tom Lane escribió:

> I wonder though if the
> OP should be considering using SELECT FOR UPDATE in his cursor,
> so that it'd automatically chain up to the newest row version.

I diged into this but the CURSOR are all declared WITH HOLD and this
conflicts with FOR UPDATE. And this is not easy to change in our generic
generated DB-layer for all the ~400 tables.

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

31 May 2022, 06:36:55

El día miércoles, mayo 25, 2022 a las 10:38:24a. m. -0400, Tom Lane escribió:

> Ravi Krishna <srkrishna@vivaldi.net> writes:
> >> No.  PostgreSQL may remove a dead row, but a dead row is by definition
> >> no longer visible, so it wouldn't be found by a query.
> 
> > I am wondering whether it is a good practice to use CTID in a where 
> > clause.
> 
> It's fine if part of your business logic is that you don't want to allow
> concurrent updates.  In this case, the OP seems to want to prevent rather
> than tolerate the concurrent update, so I don't think he needs to revisit
> the app's use of CTID.
> 
> If you do need to support concurrent updates, then yeah relying on CTID
> is likely to be problematic.

Tom, we detected another issue of missing a row by its CTID in another table
where fees are stored which one(!) process cumulates in the night. The time
window between creating the CURSOR and missing the CTID is only 42
seconds and I can not imagine that any other concurrent process is updating
such fee rows at midnight. Could exist any other reason why a row changes
its CTID? Full VACUUM is not used either.

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

Re: existing row not found by SELECT ... WHERE CTID = ?

From

Matthias Apitz

Date:

31 May 2022, 18:03:25

Analizing our logs in more detail, we got to know that there are cases
where, after updating a row with something like:

     EXEC SQL UPDATE d01buch SET
                d01gsi =:d01gsi,
                d01ex  =:d01ex,
                ...
                d01vldate =:d01vldate
     WHERE CURRENT OF hc_d01buch;

we look-up the same row later again with its old CTID in :rowid; one solution
of it could be right after the UPDATE look-up the new CTID with:

     EXEC SQL SELECT ctid INTO :rowid FROM d01buch WHERE CURRENT OF hc_d01buch;

i.e. using the same CURSOR which was used for the UPDATE again for a
SELECT for the CTID. Can this work?

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub