Re: Using per-transaction memory contexts for storing decoded tuples - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Using per-transaction memory contexts for storing decoded tuples
Date
Msg-id CAD21AoCXcU=3bcn0zRypokFm8EcMq0tnA67irt33KtkW3ApaAg@mail.gmail.com
In response to Re: Using per-transaction memory contexts for storing decoded tuples  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Sep 17, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 10:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Sep 13, 2024 at 3:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Sep 12, 2024 at 4:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > We have several reports that logical decoding uses memory much more
> > > > than logical_decoding_work_mem[1][2][3]. For instance in one of the
> > > > reports[1], even though users set logical_decoding_work_mem to
> > > > '256MB', a walsender process was killed by OOM because of using more
> > > > than 4GB memory.
> > > >
> > > > As per the discussion in these threads so far, what happened was that
> > > > there was huge memory fragmentation in rb->tup_context.
> > > > rb->tup_context uses GenerationContext with 8MB memory blocks. We
> > > > cannot free memory blocks until all memory chunks in the block are
> > > > freed. If there is a long-running transaction making changes, its
> > > > changes could be spread across many memory blocks and we end up not
> > > > being able to free memory blocks unless the long-running transaction
> > > > is evicted or completed. Since we don't account fragmentation, block
> > > > header size, nor chunk header size into per-transaction memory usage
> > > > (i.e. txn->size), rb->size could be less than
> > > > logical_decoding_work_mem but the actual allocated memory in the
> > > > context is much higher than logical_decoding_work_mem.
> > > >
> > >
> > > It is not clear to me how the fragmentation happens. Is it because of
> > > some interleaving transactions which are even ended but the memory
> > > corresponding to them is not released?
> >
> > In a generation context, we can free a memory block only when all
> > memory chunks there are freed. Therefore, individual tuple buffers are
> > already pfree()'ed but the underlying memory blocks are not freed.
> >
>
> I understood this part but didn't understand the cases leading to this
> problem. For example, if there is a large (and only) transaction in
> the system that allocates many buffers for change records during
> decoding, in the end, it should free memory for all the buffers
> allocated in the transaction. So, wouldn't that free all the memory
> chunks corresponding to the memory blocks allocated? My guess was that
> we couldn't free all the chunks because there were small interleaving
> transactions that allocated memory but didn't free it before the large
> transaction ended.

We haven't actually checked with the person who reported the problem,
so this is just a guess, but I think there were concurrent
transactions, including long-running INSERT transactions. For example,
suppose a transaction that inserts 10 million rows and many OLTP-like
(short) transactions are running at the same time. The scenario I
thought of was that each 8MB Generation Context block contains 1MB of
changes from the large insert transaction, while the other 7MB holds
changes from short OLTP transactions. With just 256 such blocks, even
after all the short transactions have completed, the Generation Context
keeps 2GB of memory allocated until we decode the commit record of
the large transaction, but rb->size will say 256MB.
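To put the arithmetic in one place, here is a minimal standalone
sketch (not PostgreSQL code; the 8MB block size and the 1MB-per-block
share of the large transaction are just the assumptions from the
scenario above). It only tallies what the context has allocated versus
what per-transaction accounting would report:

    #include <stdio.h>

    int
    main(void)
    {
        /* Assumed numbers from the scenario above, all in MB. */
        const int   block_mb = 8;           /* generation context block size */
        const int   live_mb_per_block = 1;  /* chunks still owned by the large txn */
        const int   nblocks = 256;          /* blocks pinned by that 1MB remainder */

        /* A block is only returned once every chunk in it is freed. */
        printf("allocated by context:  %d MB\n", nblocks * block_mb);          /* 2048 MB */
        printf("accounted in rb->size: %d MB\n", nblocks * live_mb_per_block); /*  256 MB */
        return 0;
    }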

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


