Thread: cache lookup failed when \d t concurrent with DML change column data type

hi. I think I found a bug.
PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by
gcc-14.1.0, 64-bit
commit at 45188c2ea2.
Ubuntu 22.04.4 LTS


setup:
drop table t cascade;
create table t(a int PRIMARY key);

IN session1:
step "change data type" {begin; alter table t alter column a set data
type int4;}
step "s1" {commit;}

IN session2:
step "psql_another_session" {\d t}

permutation "change data type" "psql_another_session" "s1"


 \set ECHO_HIDDEN on
/******** QUERY *********/
SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered,
i.indisvalid, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
  pg_catalog.pg_get_constraintdef(con.oid, true), contype,
condeferrable, condeferred, i.indisreplident, c2.reltablespace,
con.conperiod
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
  LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND
conindid = i.indexrelid AND contype IN ('p','u','x'))
WHERE c.oid = '34405' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
ORDER BY i.indisprimary DESC, c2.relname;
/************************/

ERROR:  cache lookup failed for attribute 1 of relation 34418



On 10/24/24 22:30, jian he wrote:
> hi. I think I found a bug.
> PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by
> gcc-14.1.0, 64-bit
> commit at 45188c2ea2.
> Ubuntu 22.04.4 LTS
> 
> 
> setup:
> drop table t cascade;
> create table t(a int PRIMARY key);
> 
> IN session1:
> step "change data type" {begin; alter table t alter column a set data
> type int4;}
> step "s1" {commit;}
> 
> IN session2:
> step "psql_another_session" {\d t}
> 
> permutation "change data type" "psql_another_session" "s1"

> ERROR:  cache lookup failed for attribute 1 of relation 34418
Yes, it looks like a bug existing for a long time, at least since PG11 
(I didn't trace further down).
It seems that the backend didn't apply invalidation messages before 
touching system caches. Backtrace:

in get_attoptions (relid=16388, attnum=1) at lsyscache.c:982
in pg_get_indexdef_worker (indexrelid=16388, colno=0, excludeOps=0x0, 
attrsOnly=false, keysOnly=false, showTblSpc=false, inherits=false, 
prettyFlags=7, missing_ok=true) at ruleutils.c:1458
in pg_get_indexdef_ext (fcinfo=0x55a15acc1c18) at ruleutils.c:1202
in ExecInterpExpr (state=0x55a15acc2a10, econtext=0x55a15ac62930, 
isnull=0x7fffd66a5bcf) at execExprInterp.c:770
in ExecInterpExprStillValid (state=0x55a15acc2a10, 
econtext=0x55a15ac62930, isNull=0x7fffd66a5bcf) at execExprInterp.c:2035
in ExecEvalExprSwitchContext (state=0x55a15acc2a10, 
econtext=0x55a15ac62930, isNull=0x7fffd66a5bcf) at 
../../../src/include/executor/executor.h:367
in ExecProject (projInfo=0x55a15acc2a08) at 
../../../src/include/executor/executor.h:401

-- 
regards, Andrei Lepikhov




On 10/25/24 10:05, Andrei Lepikhov wrote:
> On 10/24/24 22:30, jian he wrote:
>> hi. I think I found a bug.
>> PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by
>> gcc-14.1.0, 64-bit
>> commit at 45188c2ea2.
>> Ubuntu 22.04.4 LTS
>>
>>
>> setup:
>> drop table t cascade;
>> create table t(a int PRIMARY key);
>>
>> IN session1:
>> step "change data type" {begin; alter table t alter column a set data
>> type int4;}
>> step "s1" {commit;}
>>
>> IN session2:
>> step "psql_another_session" {\d t}
>>
>> permutation "change data type" "psql_another_session" "s1"
> 
>> ERROR:  cache lookup failed for attribute 1 of relation 34418
> Yes, it looks like a bug existing for a long time, at least since PG11 
> (I didn't trace further down).
> It seems that the backend didn't apply invalidation messages before 
> touching system caches. Backtrace:
After a short discovery, I found the origins:
The pg_get_indexdef has an incoming index oid and gets all the stuff 
needed just by looking up sys-caches. But it wants to build a list of 
relation column names at a specific moment and opens the heap relation. 
After that operation, we already have syscaches updated and the old 
index oid replaced with the new one.
It may be have made sense to lock the row of replaced index in  pg_class 
and pg_index until the transaction, altered it will be  commmitted. But, 
because ALTER TABLE is not fully MVCC-safe, it may be  expected (or 
acceptable) behaviour.

-- 
regards, Andrei Lepikhov




On Fri, 25 Oct 2024 at 09:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> It may be have made sense to lock the row of replaced index in  pg_class
> and pg_index until the transaction, altered it will be  commmitted. But,
> because ALTER TABLE is not fully MVCC-safe, it may be  expected (or
> acceptable) behaviour.

I suspect this is the case. If that is, should be reflect it in the doc?


-- 
Best regards,
Kirill Reshke



On 10/25/24 14:15, Kirill Reshke wrote:
> On Fri, 25 Oct 2024 at 09:51, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>
>> It may be have made sense to lock the row of replaced index in  pg_class
>> and pg_index until the transaction, altered it will be  commmitted. But,
>> because ALTER TABLE is not fully MVCC-safe, it may be  expected (or
>> acceptable) behaviour.
> 
> I suspect this is the case. If that is, should be reflect it in the doc?
We already have the doc entry on such cases [1]. This is a suitable 
place to change if someone wants to detail this 'failed cache lookup' case.

[1] https://www.postgresql.org/docs/16/mvcc-caveats.html

-- 
regards, Andrei Lepikhov