Re: BUG #17821: Assertion failed in heap_update() due to heap pruning - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #17821: Assertion failed in heap_update() due to heap pruning
Date
Msg-id 25c3399b-58a3-d727-7f97-93394fd1afa9@gmail.com
Whole thread Raw
In response to Re: BUG #17821: Assertion failed in heap_update() due to heap pruning  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17821: Assertion failed in heap_update() due to heap pruning
List pgsql-bugs
Hi,

04.03.2023 19:48, Andres Freund wrote:
> My first suspicion would be that we aren't holding a lock in the right moment, to enforce cache invalidation
processing.
>

I've encountered another case of that assertion failure (this time with pg_statistic):
numclients=10
for ((c=1;c<=numclients;c++)); do
   createdb db$c
done

for ((i=1;i<=50;i++)); do
   echo "iteration $i"

   for ((c=1;c<=numclients;c++)); do
     echo "
CREATE TABLE t(i int4, t1 text[], t2 text[]);
INSERT INTO t SELECT g,  ARRAY[repeat('x', g)], ARRAY[repeat('x', g)] FROM generate_series(1, 200) g;
ANALYZE t;
     " | psql -d db$c >/dev/null
   done

   for ((c=1;c<=numclients;c++)); do
     echo "
ANALYZE t;
     " | psql -d db$c >/dev/null &

     echo "
SELECT * FROM t WHERE i = 1;
ANALYZE t;
     " | psql -d db$c >/dev/null &
   done
   wait
   grep 'TRAP:' server.log && { break; }

   for ((c=1;c<=numclients;c++)); do
     echo "
DROP TABLE t; VACUUM pg_statistic;
     " | psql -d db$c >/dev/null
   done
done

...
iteration 13
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
connection to server was lost
TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3074, PID: 89697

Reproduced on REL_12_STABLE (starting from 5e0928005) .. master.
(Modifying pruneheap.c as in [1] not needed for this case.)

Here we have the following sequence of events:
Session 1
ANALYZE t; // needed just to produce material for pruning
     do_analyze_rel() -> update_attstats() -> CatalogTupleUpdateWithInfo() -> heap_update(ctid(18,16)); 
PrepareToInvalidateCacheTuple(hashvalue: {hash1})
                 Session 2:
                 SELECT * FROM t WHERE i = 1;
                     SearchCatCacheInternal(hashValue: {hash1})
                         SearchCatCacheMiss(hashValue: {hash1})
     ProcessInvalidationMessages()
     CatCacheInvalidate(hashValue: {hash1})
     CatCacheInvalidate(hashValue: {hash2})
                             CatCacheInvalidate(hashValue: {hash1})
                         SearchCatCacheMiss returns tid(18,16)

                 ANALYZE t;
                     do_analyze_rel() -> update_attstats()
                         attno=0 SearchSysCache3() -> SearchCatCacheInternal(hashValue: {hash2}) ->
SearchCatCacheMiss()
 
-> heap_page_prune(block: 18)
                         attno=1 SearchSysCache3() -> SearchCatCacheInternal(hashValue: {hash1}) -> oldtup =
tid(18,16)
                         CatalogTupleUpdateWithInfo() -> heap_update(tid(18, 16))

(A server.log with additional tracing messages for one of the failed runs
is attached.)

In other words, a backend can get some tuple id from catcache and then prune
the underlying page while searching for another (missing) entry in the cache
(any other reading of the page that may trigger pruning will do as well).
After that, an attempt to update that tuple might trigger the Assert().

[1] https://www.postgresql.org/message-id/17821-dd8c334263399284%40postgresql.org

Best regards,
Alexander
Attachment

pgsql-bugs by date:

Previous
From: Peter Smith
Date:
Subject: Re: [16+] subscription can end up in inconsistent state
Next
From: Laurenz Albe
Date:
Subject: Re: BUG #18149: Incorrect lexeme for english token "proxy"