heap_inplace_lock vs. autovacuum w/ LOCKTAG_TUPLE - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | heap_inplace_lock vs. autovacuum w/ LOCKTAG_TUPLE |
Date | |
Msg-id | 20241026184936.ae.nmisch@google.com Whole thread Raw |
List | pgsql-hackers |
intra-grant-inplace-db.spec got a novel failure today: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sarus&dt=2024-10-26%2014%3A08%3A58 The isolationtester_waiting query is supposed to detect that step vac2 is blocked. vac2 didn't finish within the timeout, but isolationtester_waiting never considered it blocked. Most-relevant postmaster.log lines: === 2024-10-26 14:22:51.791 UTC [2444667:6] isolation/intra-grant-inplace-db/s1 LOG: statement: BEGIN; 2024-10-26 14:22:51.791 UTC [2444667:7] isolation/intra-grant-inplace-db/s1 LOG: statement: GRANT TEMP ON DATABASE isolation_regression TO regress_temp_grantee; 2024-10-26 14:22:51.791 UTC [2444668:6] isolation/intra-grant-inplace-db/s2 LOG: statement: VACUUM (FREEZE); 2024-10-26 14:22:51.813 UTC [2444663:1] LOG: skipping analyze of "pg_attribute" --- lock not available 2024-10-26 14:22:51.815 UTC [2444666:7] isolation/intra-grant-inplace-db/control connection LOG: execute isolationtester_waiting:SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{2444667,2444668,2444669}') 2024-10-26 14:22:51.815 UTC [2444666:8] isolation/intra-grant-inplace-db/control connection DETAIL: Parameters: $1 = '2444668' [... omitting most other lines from PID 2444666: >37000 isolationtester_waiting calls ...] 2024-10-26 14:22:51.851 UTC [2444663:2] LOG: skipping analyze of "pg_class" --- lock not available 2024-10-26 14:22:51.892 UTC [2444663:3] LOG: skipping analyze of "pg_depend" --- lock not available 2024-10-26 14:23:52.833 UTC [2444663:4] ERROR: canceling autovacuum task 2024-10-26 14:23:52.833 UTC [2444663:5] CONTEXT: while updating tuple (0,20) in relation "pg_database" 2024-10-26 14:24:52.768 UTC [2447791:1] ERROR: canceling autovacuum task 2024-10-26 14:24:52.768 UTC [2447791:2] CONTEXT: while updating tuple (0,20) in relation "pg_database" 2024-10-26 14:25:52.777 UTC [2451018:1] ERROR: canceling autovacuum task 2024-10-26 14:25:52.777 UTC [2451018:2] CONTEXT: while updating tuple (0,20) in relation "pg_database" 2024-10-26 14:26:52.789 UTC [2454071:1] ERROR: canceling autovacuum task 2024-10-26 14:26:52.789 UTC [2454071:2] CONTEXT: while updating tuple (0,20) in relation "pg_database" 2024-10-26 14:27:16.322 UTC [2442306:1] LOG: checkpoint starting: time 2024-10-26 14:27:52.798 UTC [2457311:1] ERROR: canceling autovacuum task 2024-10-26 14:27:52.798 UTC [2457311:2] CONTEXT: while updating tuple (0,20) in relation "pg_database" 2024-10-26 14:28:51.755 UTC [2444666:37889] isolation/intra-grant-inplace-db/control connection LOG: execute isolationtester_waiting:SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{2444667,2444668,2444669}') 2024-10-26 14:28:51.755 UTC [2444666:37890] isolation/intra-grant-inplace-db/control connection DETAIL: Parameters: $1 ='2444668' 2024-10-26 14:28:51.810 UTC [2463672:1] [unknown] LOG: connection received: host=[local] 2024-10-26 14:28:51.812 UTC [2444668:7] isolation/intra-grant-inplace-db/s2 ERROR: canceling statement due to user request 2024-10-26 14:28:51.812 UTC [2444668:8] isolation/intra-grant-inplace-db/s2 CONTEXT: while scanning block 0 of relation"pg_catalog.pg_database" 2024-10-26 14:28:51.812 UTC [2444668:9] isolation/intra-grant-inplace-db/s2 STATEMENT: VACUUM (FREEZE); === I would have expected these locking-relevant events: - GRANT stamped a pg_database tuple with an xmax. - auto-analyze sought to update the same tuple, so it acquired LOCKTAG_TUPLE and entered XactLockTableWait(xmax-from-GRANT). - Non-auto VACUUM from step vac2 blocked on acquiring LOCKTAG_TUPLE. pg_isolation_test_session_is_blocked() considers this not blocked, since blocking on autovacuum doesn't count for test output purposes. - Deadlock detector notices auto-analyze is blocking vac2, hence "canceling autovacuum task" after 1s. - vac2 finishes its LOCKTAG_TUPLE acquisition and enters XactLockTableWait(xmax-from-GRANT). - pg_isolation_test_session_is_blocked() now considers vac2 blocked. The timeout suggests some subset of that didn't happen. It's odd that a new auto-analyze starts every minute, each of which exits due to blocking vac2. I likely got something wrong in commit aac2c9b or its parent (2024-09-24). I'll work on reproducing this. Thanks, nm
pgsql-hackers by date: