BUG #18942: walsender memory allocation failure adding snapshot and invalidations to logical replica w/PG 16.9 - Mailing list pgsql-bugs
From:    PG Bug reporting form
Subject: BUG #18942: walsender memory allocation failure adding snapshot and invalidations to logical replica w/PG 16.9
Date:
Msg-id:  18942-0ab1e5ae156613ad@postgresql.org
List:    pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18942
Logged by:          Laurence Parry
Email address:      greenreaper@hotmail.com
PostgreSQL version: 16.9
Operating system:   Debian Linux bookworm (customised 6.11.10 kernel)

Description:

We run a logical replication chain for the purpose of offloading certain expensive queries, backup generation, and disaster recovery. Initially PG 16.8 from the PostgreSQL project's DEB repo was used for both the primary and the replica. The replica was upgraded to PG 16.9 without incident. Performance of our primary was degraded following its own upgrade a few days later. Logs showed the walsender cycling due to a failure to allocate 1.5GB of memory:

ERROR,XX000,"invalid memory alloc request size 1627040720",,,,,,"START_REPLICATION SLOT ""replica"" LOGICAL 298B/CB420338 (proto_version '4', streaming 'parallel', origin 'any', publication_names '""replica""', binary 'true')",,"repalloc, mcxt.c:1487","replica","walsender",,0"

This occurred a few seconds after:

"logical decoding found consistent point at 298B/CABD75E0","Logical decoding will begin using saved snapshot.",,,,,"START_REPLICATION SLOT ""replica"" LOGICAL 298B/CB420338 (proto_version '4', streaming 'parallel', origin 'any', publication_names '""replica""', binary 'true')",,"SnapBuildRestore, snapbuild.c:2043","replica","walsender",,0

At DEBUG3 log level, these messages appeared immediately prior to the memory allocation failure (same microsecond):

"UpdateDecodingStats: updating stats 0x5610b1380120 0 0 0 1 1 264 1 621710620",,,,,,,,"UpdateDecodingStats, logical.c:1926"
"found top level transaction 2650000463, with catalog changes",,,,,,,,"SnapBuildCommitTxn, snapbuild.c:1136"
"adding a new snapshot and invalidations to 2650000421 at 298B/CB45EAA8",,,,,,,,"SnapBuildDistributeSnapshotAndInval, snapbuild.c:901"

[MemoryContextAllocExtended, seemingly used via palloc/repalloc in e.g. ReorderBufferAddInvalidations, normally only allows allocations of 1GB; this does not appear to be end-user tuneable.
<https://github.com/postgres/postgres/blob/6aa33afe6da1ce5e6a9c68900727db4621f899cd/src/backend/utils/mmgr/mcxt.c#L1251>]

While debugging I ran out of time (the SSD was filling up) and had to drop the replica subscription and VACUUM FULL the catalog tables (bloated 20-60x, likely due to the retained temp/unlogged tables). I saved the log messages and the first two WAL files containing the highlighted transactions.

Today, after reindexing the larger tables concurrently on the same 16.9 primary to free up space, I tried to rebuild the replica on 17.5 using the command originally used to build it:

create subscription replica connection 'postgresql://app:pass@hostname:port/dbname?sslmode=verify-ca&connect_timeout=5&application_name=replica' publication replica with (binary, streaming = parallel)

together with a copy of the schema from the 16.9 replica. Almost immediately I ran into the same memory allocation issue, but for 1.4GB instead ("invalid memory alloc request size 1512643104"), suggesting that the problem relates to our workload/schema rather than some transitory error.

I wonder if this issue relates to "Fix data loss in logical replication."
<https://github.com/postgres/postgres/commit/4909b38af034fa4d2c67c5c71fd8509f870c1695>
from April, which was back-patched to 16
<https://github.com/postgres/postgres/commit/9a2f8b4f01d5a40b8b080f6b108ae314a6730cec>
and triggers the "adding a new snapshot and invalidations" debug message immediately prior to the memory allocation failure. The WAL dump contains many long and repetitive catcache mentions.
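Those repeated catcache entries fit the failure mode above: if the accumulated invalidation messages of each catalog-changing transaction keep being appended to a concurrent transaction's array, the array grows until a single repalloc request exceeds the ~1GB cap. Below is a minimal, self-contained C sketch of that growth pattern only; it is not PostgreSQL source, and the cap value, the 16-byte per-message size, and all names in it are assumptions for illustration.

/*
 * Illustration only -- not PostgreSQL source.  The cap and the 16-byte
 * message size are assumptions based on the mcxt.c code linked above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ALLOC_SIZE  ((size_t) 0x3fffffff)   /* ~1 GB request cap (assumed) */
#define INVAL_MSG_SIZE  16                      /* assumed per-message size */

typedef struct
{
    char   *msgs;   /* packed invalidation messages for one transaction */
    size_t  nmsgs;  /* number of messages accumulated so far */
} fake_txn;

/* Append nnew messages, mimicking repalloc-based array growth. */
static int
add_invalidations(fake_txn *txn, size_t nnew)
{
    size_t  newsize = (txn->nmsgs + nnew) * INVAL_MSG_SIZE;
    char   *grown;

    if (newsize > MAX_ALLOC_SIZE)
    {
        /* roughly where "invalid memory alloc request size" would be raised */
        fprintf(stderr, "invalid memory alloc request size %zu\n", newsize);
        return -1;
    }
    grown = realloc(txn->msgs, newsize);
    if (grown == NULL)
        return -1;
    memset(grown + txn->nmsgs * INVAL_MSG_SIZE, 0, nnew * INVAL_MSG_SIZE);
    txn->msgs = grown;
    txn->nmsgs += nnew;
    return 0;
}

int
main(void)
{
    fake_txn txn = {NULL, 0};

    /*
     * Each round stands in for one batch of catalog invalidations being
     * added to a long-lived concurrent transaction; with enough catalog
     * churn the array keeps growing until one request exceeds the cap.
     */
    for (int round = 0; ; round++)
    {
        if (add_invalidations(&txn, 8 * 1024 * 1024) != 0)
        {
            printf("gave up after %d rounds, %zu messages (~%zu bytes)\n",
                   round, txn.nmsgs, txn.nmsgs * INVAL_MSG_SIZE);
            break;
        }
    }
    free(txn.msgs);
    return 0;
}

Assuming roughly 16 bytes per message, the reported 1627040720-byte request would correspond to on the order of a hundred million accumulated invalidation messages, which seems plausible given the catalog churn from the unlogged-table workload described below.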
The workload includes both temporary and unlogged tables being created and destroyed on a regular basis (with a lifetime of ~15-30 minutes for the unlogged tables, and roughly ~5,000 present at any time), used for generated gallery listings and search results. Probably not an ideal architecture, but it's not something we can quickly change. The unlogged tables are not included in the publication, though a table registering them is. Exclusion of these cache tables, and the reduction of the resulting disk load on the HDD-based replica, is one of the reasons logical replication is preferable to physical replication for our use case, although we have used the latter before.

A log of the walsender from an attempt to re-enable replication is available at:
https://www.dropbox.com/scl/fi/bhhots6haujnqv74y88ut/logical-walsender.csv.xz?rlkey=jqd5ezihuuc0oyzigz2z1gb7a&st=lvrlknb7&dl=1

XID 2650000463 had 92 changes spilt to disk, and then one other change spilt to disk with a different stats number:

"adding a new snapshot and invalidations to 2650000463 at 298B/CB45DEB0"
"spill 92 changes in XID 2650000463 to disk"
"UpdateDecodingStats: updating stats 0x5610b1380120 1 1 621710356 0 0 0 0 0"
[next two with temporary position 4/4693920]
"spill 1 changes in XID 2650000463 to disk"
"UpdateDecodingStats: updating stats 0x5610b1380120 0 1 264 0 0 0 0 0"
"UpdateDecodingStats: updating stats 0x5610b1380120 0 0 0 1 1 264 1 621710620"

The output of pg_waldump from 298B/CB3868B0 / 2650000420 to 298B/CB46C470 / 2650000466 is available at:
https://www.dropbox.com/scl/fi/88zbmtikbwrkjoqtg5y4c/logical-waldump.txt.xz?rlkey=r1fry9gfzlzntyc84hip6dl6y&e=1&st=ztm97lu4&dl=1

Modified server variables from the primary:
https://www.dropbox.com/scl/fi/0b3b00vr1hgm577t9sihj/logical-variables.txt?rlkey=7hxmbjzh5sk9d2ko6633mgkcj&st=n73e4y3p&dl=0

If any other information is desired, please let me know.

Regards,
--
Laurence Parry
https://inkbunny.net