Thread: cache lookup failed when \d t concurrent with DML change column data type
hi. I think I found a bug. PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by gcc-14.1.0, 64-bit commit at 45188c2ea2. Ubuntu 22.04.4 LTS setup: drop table t cascade; create table t(a int PRIMARY key); IN session1: step "change data type" {begin; alter table t alter column a set data type int4;} step "s1" {commit;} IN session2: step "psql_another_session" {\d t} permutation "change data type" "psql_another_session" "s1" \set ECHO_HIDDEN on /******** QUERY *********/ SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered, i.indisvalid, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true), pg_catalog.pg_get_constraintdef(con.oid, true), contype, condeferrable, condeferred, i.indisreplident, c2.reltablespace, con.conperiod FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND conindid = i.indexrelid AND contype IN ('p','u','x')) WHERE c.oid = '34405' AND c.oid = i.indrelid AND i.indexrelid = c2.oid ORDER BY i.indisprimary DESC, c2.relname; /************************/ ERROR: cache lookup failed for attribute 1 of relation 34418
Re: cache lookup failed when \d t concurrent with DML change column data type
From
Andrei Lepikhov
Date:
On 10/24/24 22:30, jian he wrote: > hi. I think I found a bug. > PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by > gcc-14.1.0, 64-bit > commit at 45188c2ea2. > Ubuntu 22.04.4 LTS > > > setup: > drop table t cascade; > create table t(a int PRIMARY key); > > IN session1: > step "change data type" {begin; alter table t alter column a set data > type int4;} > step "s1" {commit;} > > IN session2: > step "psql_another_session" {\d t} > > permutation "change data type" "psql_another_session" "s1" > ERROR: cache lookup failed for attribute 1 of relation 34418 Yes, it looks like a bug existing for a long time, at least since PG11 (I didn't trace further down). It seems that the backend didn't apply invalidation messages before touching system caches. Backtrace: in get_attoptions (relid=16388, attnum=1) at lsyscache.c:982 in pg_get_indexdef_worker (indexrelid=16388, colno=0, excludeOps=0x0, attrsOnly=false, keysOnly=false, showTblSpc=false, inherits=false, prettyFlags=7, missing_ok=true) at ruleutils.c:1458 in pg_get_indexdef_ext (fcinfo=0x55a15acc1c18) at ruleutils.c:1202 in ExecInterpExpr (state=0x55a15acc2a10, econtext=0x55a15ac62930, isnull=0x7fffd66a5bcf) at execExprInterp.c:770 in ExecInterpExprStillValid (state=0x55a15acc2a10, econtext=0x55a15ac62930, isNull=0x7fffd66a5bcf) at execExprInterp.c:2035 in ExecEvalExprSwitchContext (state=0x55a15acc2a10, econtext=0x55a15ac62930, isNull=0x7fffd66a5bcf) at ../../../src/include/executor/executor.h:367 in ExecProject (projInfo=0x55a15acc2a08) at ../../../src/include/executor/executor.h:401 -- regards, Andrei Lepikhov
Re: cache lookup failed when \d t concurrent with DML change column data type
From
Andrei Lepikhov
Date:
On 10/25/24 10:05, Andrei Lepikhov wrote: > On 10/24/24 22:30, jian he wrote: >> hi. I think I found a bug. >> PostgreSQL 18devel_debug_build_45188c2ea2 on x86_64-linux, compiled by >> gcc-14.1.0, 64-bit >> commit at 45188c2ea2. >> Ubuntu 22.04.4 LTS >> >> >> setup: >> drop table t cascade; >> create table t(a int PRIMARY key); >> >> IN session1: >> step "change data type" {begin; alter table t alter column a set data >> type int4;} >> step "s1" {commit;} >> >> IN session2: >> step "psql_another_session" {\d t} >> >> permutation "change data type" "psql_another_session" "s1" > >> ERROR: cache lookup failed for attribute 1 of relation 34418 > Yes, it looks like a bug existing for a long time, at least since PG11 > (I didn't trace further down). > It seems that the backend didn't apply invalidation messages before > touching system caches. Backtrace: After a short discovery, I found the origins: The pg_get_indexdef has an incoming index oid and gets all the stuff needed just by looking up sys-caches. But it wants to build a list of relation column names at a specific moment and opens the heap relation. After that operation, we already have syscaches updated and the old index oid replaced with the new one. It may be have made sense to lock the row of replaced index in pg_class and pg_index until the transaction, altered it will be commmitted. But, because ALTER TABLE is not fully MVCC-safe, it may be expected (or acceptable) behaviour. -- regards, Andrei Lepikhov
Re: cache lookup failed when \d t concurrent with DML change column data type
From
Kirill Reshke
Date:
On Fri, 25 Oct 2024 at 09:51, Andrei Lepikhov <lepihov@gmail.com> wrote: > > It may be have made sense to lock the row of replaced index in pg_class > and pg_index until the transaction, altered it will be commmitted. But, > because ALTER TABLE is not fully MVCC-safe, it may be expected (or > acceptable) behaviour. I suspect this is the case. If that is, should be reflect it in the doc? -- Best regards, Kirill Reshke
Re: cache lookup failed when \d t concurrent with DML change column data type
From
Andrei Lepikhov
Date:
On 10/25/24 14:15, Kirill Reshke wrote: > On Fri, 25 Oct 2024 at 09:51, Andrei Lepikhov <lepihov@gmail.com> wrote: >> >> It may be have made sense to lock the row of replaced index in pg_class >> and pg_index until the transaction, altered it will be commmitted. But, >> because ALTER TABLE is not fully MVCC-safe, it may be expected (or >> acceptable) behaviour. > > I suspect this is the case. If that is, should be reflect it in the doc? We already have the doc entry on such cases [1]. This is a suitable place to change if someone wants to detail this 'failed cache lookup' case. [1] https://www.postgresql.org/docs/16/mvcc-caveats.html -- regards, Andrei Lepikhov