Re: Using per-transaction memory contexts for storing decoded tuples - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Using per-transaction memory contexts for storing decoded tuples
Date:
Msg-id: CAD21AoCXcU=3bcn0zRypokFm8EcMq0tnA67irt33KtkW3ApaAg@mail.gmail.com
In response to: Re: Using per-transaction memory contexts for storing decoded tuples (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Tue, Sep 17, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 10:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Sep 13, 2024 at 3:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Sep 12, 2024 at 4:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > We have several reports that logical decoding uses memory much more
> > > > than logical_decoding_work_mem[1][2][3]. For instance in one of the
> > > > reports[1], even though users set logical_decoding_work_mem to
> > > > '256MB', a walsender process was killed by OOM because of using more
> > > > than 4GB memory.
> > > >
> > > > As per the discussion in these threads so far, what happened was that
> > > > there was huge memory fragmentation in rb->tup_context.
> > > > rb->tup_context uses GenerationContext with 8MB memory blocks. We
> > > > cannot free memory blocks until all memory chunks in the block are
> > > > freed. If there is a long-running transaction making changes, its
> > > > changes could be spread across many memory blocks and we end up not
> > > > being able to free memory blocks unless the long-running transaction
> > > > is evicted or completed. Since we don't account fragmentation, block
> > > > header size, nor chunk header size into per-transaction memory usage
> > > > (i.e. txn->size), rb->size could be less than
> > > > logical_decoding_work_mem but the actual allocated memory in the
> > > > context is much higher than logical_decoding_work_mem.
> > > >
> > > It is not clear to me how the fragmentation happens. Is it because of
> > > some interleaving transactions which are even ended but the memory
> > > corresponding to them is not released?
> >
> > In a generation context, we can free a memory block only when all
> > memory chunks there are freed. Therefore, individual tuple buffers are
> > already pfree()'ed but the underlying memory blocks are not freed.
> >
>
> I understood this part but didn't understand the cases leading to this
> problem. For example, if there is a large (and only) transaction in
> the system that allocates many buffers for change records during
> decoding, in the end, it should free memory for all the buffers
> allocated in the transaction. So, wouldn't that free all the memory
> chunks corresponding to the memory blocks allocated? My guess was that
> we couldn't free all the chunks because there were small interleaving
> transactions that allocated memory but didn't free it before the large
> transaction ended.

We haven't actually checked with the person who reported the problem, so
this is just a guess, but I think there were concurrent transactions,
including long-running INSERT transactions. For example, suppose a
transaction that inserts 10 million rows and many OLTP-like (short)
transactions are running at the same time. The scenario I thought of was
that one 8MB generation context block contains 1MB of the large insert
transaction's changes, and the other 7MB contains short OLTP transactions'
changes. If there are just 256 such blocks, then even after all the short
transactions have completed, the generation context will keep 2GB of
memory allocated until we decode the commit record of the large
transaction, while rb->size will say only 256MB.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
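A minimal standalone sketch of the scenario described above. It is
illustrative only: the function name, the 8kB change size, and the
scaled-down block count are assumptions, and it presumes a backend build
where GenerationContextCreate() takes min/init/max block sizes, as
rb->tup_context does on recent branches.

/*
 * Illustrative sketch only (names and sizes are made up for this example):
 * interleave chunks "belonging" to one long-running transaction with chunks
 * from short transactions in a generation context that uses 8MB blocks, then
 * free only the short-transaction chunks.  Because every block still contains
 * at least one live chunk, no block can be given back, and the allocated
 * memory stays close to the peak even though most chunks were pfree()'d.
 */
#include "postgres.h"

#include "utils/memutils.h"

#define DEMO_CHUNK_SIZE		(8 * 1024)	/* pretend each decoded change is ~8kB */
#define DEMO_NUM_BLOCKS		16			/* scaled down; the reported case needs ~256 */
#define CHUNKS_PER_BLOCK	(SLAB_LARGE_BLOCK_SIZE / DEMO_CHUNK_SIZE)

static void
demo_generation_fragmentation(void)
{
	MemoryContext tup_context;
	Size		live = 0;

	/* same block sizes as rb->tup_context in reorderbuffer.c */
	tup_context = GenerationContextCreate(CurrentMemoryContext,
										  "demo tuples",
										  SLAB_LARGE_BLOCK_SIZE,
										  SLAB_LARGE_BLOCK_SIZE,
										  SLAB_LARGE_BLOCK_SIZE);

	for (int blk = 0; blk < DEMO_NUM_BLOCKS; blk++)
	{
		for (int i = 0; i < CHUNKS_PER_BLOCK; i++)
		{
			char	   *chunk = MemoryContextAlloc(tup_context, DEMO_CHUNK_SIZE);

			if (i % 8 == 0)
			{
				/* every 8th chunk belongs to the long-running transaction */
				live += DEMO_CHUNK_SIZE;
			}
			else
			{
				/* a short transaction completed, so its change is freed ... */
				pfree(chunk);
			}
		}
	}

	/*
	 * ... yet no block becomes empty, so the context still holds roughly
	 * DEMO_NUM_BLOCKS * 8MB while only about 1/8 of that is live.  With 256
	 * such blocks this is the 2GB-allocated vs. 256MB-accounted gap above.
	 */
	elog(LOG, "live: %zu bytes, allocated in context: %zu bytes",
		 live, MemoryContextMemAllocated(tup_context, true));

	/* decoding the large transaction's commit would finally free its chunks */
	MemoryContextDelete(tup_context);
}

The sketch deliberately uses small chunks: on recent branches an allocation
larger than the generation context's chunk limit is placed on its own
dedicated block, which is returned as soon as the chunk is pfree()'d, so very
large chunks would not show the fragmentation effect.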