Re: WAL record CRC calculated incorrectly because of underlying buffer modification - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: WAL record CRC calculated incorrectly because of underlying buffer modification |
Date | |
Msg-id | 0a2e14d3-fbc8-9e86-a968-e0c4a465e5b0@gmail.com Whole thread Raw |
In response to | Re: WAL record CRC calculated incorrectly because of underlying buffer modification (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
|
List | pgsql-hackers |
Hello Thomas and Andres, Thank you for looking at this! 11.05.2024 06:26, Thomas Munro wrote: > On Sat, May 11, 2024 at 3:57 AM Andres Freund <andres@anarazel.de> wrote: >> On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote: >>> and discovered that XLogRecordAssemble() calculates CRC over a buffer, >>> that might be modified by another process. >> If, with "might", you mean that it's legitimate for that buffer to be >> modified, I don't think so. The bug is that something is modifying the buffer >> despite it being exclusively locked. >> >> I.e. what we likely have here is a bug somewhere in the hash index code. > I don't have a good grip on the higher level locking protocols of > hash.c, but one microscopic thing jumps out: > > /* > * bucket buffer was not changed, but still needs to be > * registered to ensure that we can acquire a cleanup lock on > * it during replay. > */ > if (!xlrec.is_primary_bucket_page) > { > uint8 flags = REGBUF_STANDARD | > REGBUF_NO_IMAGE | REGBUF_NO_CHANGE; > > XLogRegisterBuffer(0, bucket_buf, flags); > } > > That registers a buffer that is pinned but not content-locked, and we > tell xloginsert.c not to copy its image into the WAL, but it does it > anyway because: > > /* > * If needs_backup is true or WAL checking is enabled for current > * resource manager, log a full-page write for the current block. > */ > include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0; > > So I guess it copies the image on dodo, which has: > > 'PG_TEST_EXTRA' => 'ssl ldap > kerberos wal_consistency_checking libpq_encryption xid_wraparound' > > Perhaps a no-image, no-change registered buffer should not be > including an image, even for XLR_CHECK_CONSISTENCY? It's actually > useless for consistency checking too I guess, this issue aside, > because it doesn't change anything so there is nothing to check. Yes, I think something wrong is here. I've reduced the reproducer to: cat << 'EOF' | psql CREATE TABLE hash_cleanup_heap(keycol INT); CREATE INDEX hash_cleanup_index on hash_cleanup_heap USING HASH (keycol); BEGIN; INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; ROLLBACK; EOF cat << 'EOF' | psql & INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; DROP TABLE hash_cleanup_heap; EOF cat << 'EOF' | psql & SELECT pg_sleep(random() / 20); VACUUM hash_cleanup_heap; EOF wait grep 'TRAP:' server.log ("wal_consistency_checking = all" and the xloginsert patch are still required) and with additional logging I see: 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| scan page buf: 1832 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of relation "public.hash_cleanup_heap" 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT: VACUUM hash_cleanup_heap; 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: !!!_hash_doinsert| _hash_getbucketbuf_from_hashkey: 1822 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT: INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| xlrec.is_primary_bucket_page: 0, buf: 1832, bucket_buf: 1822 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of relation "public.hash_cleanup_heap" 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT: VACUUM hash_cleanup_heap; 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: !!!_hash_doinsert| _hash_relbuf(rel, 1822) 2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT: INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i; TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: "xloginsert.c", Line: 949, PID: 2070237 Best regards, Alexander
pgsql-hackers by date: