Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: logical decoding : exceeded maxAllocatedDescs for .spill files |
| Date | |
| Msg-id | 20200207193434.wkwne7d4wr5kebye@alap3.anarazel.de |
| In response to | Re: logical decoding : exceeded maxAllocatedDescs for .spill files (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
Hi,

On 2020-01-11 23:20:56 -0500, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Nah, don't think I believe that: the test inserts a bunch of tuples,
> but they look like they will all be *exactly* the same size.
>
> CREATE TABLE decoding_test(x integer, y text);
> ...
>
> FOR i IN 1..10 LOOP
>   BEGIN
>     INSERT INTO decoding_test(x) SELECT generate_series(1,5000);
>   EXCEPTION
>     when division_by_zero then perform 'dummy';
>   END;

I think the issue that triggers higher memory usage in older versions is
that before

commit cec2edfa7859279f36d2374770ca920c59c73dd8
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-11-16 17:49:33 +0530

    Add logical_decoding_work_mem to limit ReorderBuffer memory usage.

we enforced how many changes to keep in memory (vs on disk)

/*
 * Maximum number of changes kept in memory, per transaction. After that,
 * changes are spooled to disk.
 *
 * The current value should be sufficient to decode the entire transaction
 * without hitting disk in OLTP workloads, while starting to spool to disk in
 * other workloads reasonably fast.
 *
 * At some point in the future it probably makes sense to have a more elaborate
 * resource management here, but it's not entirely clear what that would look
 * like.
 */
static const Size max_changes_in_memory = 4096;

on a per-transaction basis. And subtransactions are *different*
transactions for that purpose (as they can be rolled back separately).

As the test generates loads of records in different subtransactions, each
of them ends up with quite a few changes (including the tuples pointed
to!) in memory at the same time.

Due to the way the limit of 4096 interacts with the 5000 rows inserted
above, we only hit the out-of-memory error when loading the changes back.
That's because when decoding (before the commit has been seen), we spill
after 4096 changes:

2020-02-07 11:18:22.399 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 585 to disk
2020-02-07 11:18:22.419 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 586 to disk
2020-02-07 11:18:22.431 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 587 to disk
2020-02-07 11:18:22.443 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 588 to disk
2020-02-07 11:18:22.454 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 589 to disk
2020-02-07 11:18:22.465 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 590 to disk
2020-02-07 11:18:22.477 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 591 to disk
2020-02-07 11:18:22.488 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 592 to disk
2020-02-07 11:18:22.499 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 593 to disk
2020-02-07 11:18:22.511 PST [1136134][3/2] DEBUG: spill 4096 changes in XID 594 to disk

so each subtransaction keeps 5000 - 4096 changes in memory, times 10. But
when actually calling the output plugin (at the commit record), we start
by loading changes back into memory from the start of each subtransaction.
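To put numbers on that, here's a toy calculation (explicitly not
ReorderBuffer code, just the arithmetic for 10 subtransactions of 5000
changes each and a 4096-change spill threshold):

/*
 * Toy calculation, not ReorderBuffer code: how many changes are held in
 * memory in each phase, assuming 10 subtransactions of 5000 changes each
 * and a per-(sub)transaction spill threshold of 4096 changes.
 */
#include <stdio.h>

int
main(void)
{
	const int	nsubxacts = 10;
	const int	nchanges = 5000;
	const int	max_in_mem = 4096;	/* max_changes_in_memory */

	/* decoding: each subxact spills a full 4096-change batch, keeping the rest */
	int			decode_per_xact = nchanges % max_in_mem;	/* 904 */

	/* replay at commit: up to a full batch is restored per subxact, all at once */
	int			replay_per_xact = nchanges < max_in_mem ? nchanges : max_in_mem;	/* 4096 */

	printf("decoding: %d changes in memory (%d per subxact)\n",
		   nsubxacts * decode_per_xact, decode_per_xact);
	printf("replay:   %d changes in memory (%d per subxact)\n",
		   nsubxacts * replay_per_xact, replay_per_xact);
	return 0;
}

That's roughly 9k changes' worth of tuples in memory while decoding, but
up to ~41k while replaying at the commit record, which is why we only hit
the OOM at that point.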
Loading each subtransaction back first entails spilling the tail of that
transaction to disk, and then loading the start:

2020-02-07 11:18:22.515 PST [1136134][3/2] DEBUG: StartSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
2020-02-07 11:18:22.515 PST [1136134][3/2] DEBUG: StartSubTransaction(2) name: replay; blockState: SUB BEGIN; state: INPROGR, xid/subid/cid: 0/2/0
2020-02-07 11:18:22.515 PST [1136134][3/2] DEBUG: spill 904 changes in XID 585 to disk
2020-02-07 11:18:22.524 PST [1136134][3/2] DEBUG: restored 4096 changes in XID 585 into memory
2020-02-07 11:18:22.524 PST [1136134][3/2] DEBUG: spill 904 changes in XID 586 to disk
2020-02-07 11:18:22.534 PST [1136134][3/2] DEBUG: restored 4096 changes in XID 586 into memory
2020-02-07 11:18:22.534 PST [1136134][3/2] DEBUG: spill 904 changes in XID 587 to disk
2020-02-07 11:18:22.544 PST [1136134][3/2] DEBUG: restored 4096 changes in XID 587 into memory
2020-02-07 11:18:22.544 PST [1136134][3/2] DEBUG: spill 904 changes in XID 588 to disk
2020-02-07 11:18:22.554 PST [1136134][3/2] DEBUG: restored 4096 changes in XID 588 into memory
2020-02-07 11:18:22.554 PST [1136134][3/2] DEBUG: spill 904 changes in XID 589 to disk
TopMemoryContext: 161440 total in 7 blocks; 80240 free (68 chunks); 81200 used
...

Because each transaction has 4096 changes in memory, we actually need more
memory here than we did during the decoding phase, where all but the
"current" subtransaction only have 5000 - 4096 changes in memory.

If we instead change the test to insert 4096*2 - 1 tuples each, we run out
of memory earlier:

2020-02-07 11:23:20.540 PST [1136134][3/12] DEBUG: spill 4096 changes in XID 610 to disk
2020-02-07 11:23:20.565 PST [1136134][3/12] DEBUG: spill 4096 changes in XID 611 to disk
2020-02-07 11:23:20.587 PST [1136134][3/12] DEBUG: spill 4096 changes in XID 612 to disk
2020-02-07 11:23:20.608 PST [1136134][3/12] DEBUG: spill 4096 changes in XID 613 to disk
2020-02-07 11:23:20.630 PST [1136134][3/12] DEBUG: spill 4096 changes in XID 614 to disk
TopMemoryContext: 161440 total in 7 blocks; 79264 free (82 chunks); 82176 used
...
2020-02-07 11:23:20.655 PST [1136134][3/12] ERROR: out of memory
2020-02-07 11:23:20.655 PST [1136134][3/12] DETAIL: Failed on request of size 8208.
2020-02-07 11:23:20.655 PST [1136134][3/12] STATEMENT: SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);

The reason the per-subxact enforcement of max_changes_in_memory isn't as
noticeable in 11 is that there we have the generation memory context,
which means that each of the 4096*10 tuples we have in memory doesn't
allocate MaxHeapTupleSize, but instead something like ~30 bytes.

I wonder if we, in the back branches that don't have the generation
context, should just reduce the size of slab-allocated tuples to ~1024
bytes instead of MaxHeapTupleSize. That's an almost trivial change, as we
already have to support tuples above that limit (in case the oldtuple in
an update/delete contains toasted columns that we "inlined"). POC for that
attached.

Greetings,

Andres Freund