intra-grant-inplace-db.spec got a novel failure today:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sarus&dt=2024-10-26%2014%3A08%3A58
The isolationtester_waiting query is supposed to detect that step vac2 is
blocked. vac2 didn't finish within the timeout, but isolationtester_waiting
never considered it blocked. Most-relevant postmaster.log lines:
===
2024-10-26 14:22:51.791 UTC [2444667:6] isolation/intra-grant-inplace-db/s1 LOG: statement: BEGIN;
2024-10-26 14:22:51.791 UTC [2444667:7] isolation/intra-grant-inplace-db/s1 LOG: statement:
GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee;
2024-10-26 14:22:51.791 UTC [2444668:6] isolation/intra-grant-inplace-db/s2 LOG: statement: VACUUM (FREEZE);
2024-10-26 14:22:51.813 UTC [2444663:1] LOG: skipping analyze of "pg_attribute" --- lock not available
2024-10-26 14:22:51.815 UTC [2444666:7] isolation/intra-grant-inplace-db/control connection LOG: execute
isolationtester_waiting:SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{2444667,2444668,2444669}')
2024-10-26 14:22:51.815 UTC [2444666:8] isolation/intra-grant-inplace-db/control connection DETAIL: Parameters: $1 =
'2444668'
[... omitting most other lines from PID 2444666: >37000 isolationtester_waiting calls ...]
2024-10-26 14:22:51.851 UTC [2444663:2] LOG: skipping analyze of "pg_class" --- lock not available
2024-10-26 14:22:51.892 UTC [2444663:3] LOG: skipping analyze of "pg_depend" --- lock not available
2024-10-26 14:23:52.833 UTC [2444663:4] ERROR: canceling autovacuum task
2024-10-26 14:23:52.833 UTC [2444663:5] CONTEXT: while updating tuple (0,20) in relation "pg_database"
2024-10-26 14:24:52.768 UTC [2447791:1] ERROR: canceling autovacuum task
2024-10-26 14:24:52.768 UTC [2447791:2] CONTEXT: while updating tuple (0,20) in relation "pg_database"
2024-10-26 14:25:52.777 UTC [2451018:1] ERROR: canceling autovacuum task
2024-10-26 14:25:52.777 UTC [2451018:2] CONTEXT: while updating tuple (0,20) in relation "pg_database"
2024-10-26 14:26:52.789 UTC [2454071:1] ERROR: canceling autovacuum task
2024-10-26 14:26:52.789 UTC [2454071:2] CONTEXT: while updating tuple (0,20) in relation "pg_database"
2024-10-26 14:27:16.322 UTC [2442306:1] LOG: checkpoint starting: time
2024-10-26 14:27:52.798 UTC [2457311:1] ERROR: canceling autovacuum task
2024-10-26 14:27:52.798 UTC [2457311:2] CONTEXT: while updating tuple (0,20) in relation "pg_database"
2024-10-26 14:28:51.755 UTC [2444666:37889] isolation/intra-grant-inplace-db/control connection LOG: execute
isolationtester_waiting:SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{2444667,2444668,2444669}')
2024-10-26 14:28:51.755 UTC [2444666:37890] isolation/intra-grant-inplace-db/control connection DETAIL: Parameters: $1
='2444668'
2024-10-26 14:28:51.810 UTC [2463672:1] [unknown] LOG: connection received: host=[local]
2024-10-26 14:28:51.812 UTC [2444668:7] isolation/intra-grant-inplace-db/s2 ERROR: canceling statement due to user
request
2024-10-26 14:28:51.812 UTC [2444668:8] isolation/intra-grant-inplace-db/s2 CONTEXT: while scanning block 0 of
relation"pg_catalog.pg_database"
2024-10-26 14:28:51.812 UTC [2444668:9] isolation/intra-grant-inplace-db/s2 STATEMENT: VACUUM (FREEZE);
===
I would have expected these locking-relevant events:
- GRANT stamped a pg_database tuple with an xmax.
- auto-analyze sought to update the same tuple, so it acquired LOCKTAG_TUPLE
and entered XactLockTableWait(xmax-from-GRANT).
- Non-auto VACUUM from step vac2 blocked on acquiring LOCKTAG_TUPLE.
pg_isolation_test_session_is_blocked() considers this not blocked, since
blocking on autovacuum doesn't count for test output purposes.
- Deadlock detector notices auto-analyze is blocking vac2, hence "canceling
autovacuum task" after 1s.
- vac2 finishes its LOCKTAG_TUPLE acquisition and enters
XactLockTableWait(xmax-from-GRANT).
- pg_isolation_test_session_is_blocked() now considers vac2 blocked.
The timeout suggests some subset of that didn't happen. It's odd that a new
auto-analyze starts every minute, each of which exits due to blocking vac2. I
likely got something wrong in commit aac2c9b or its parent (2024-09-24). I'll
work on reproducing this.
Thanks,
nm