Re: Using per-transaction memory contexts for storing decoded tuples - Mailing list pgsql-hackers

From: Amit Kapila <amit.kapila16@gmail.com>
Subject: Re: Using per-transaction memory contexts for storing decoded tuples
Date:
Msg-id: CAA4eK1L2uRD18EgScguTy=AMgscLUfp3KomUaErLzrNmekivYg@mail.gmail.com
In response to: Re: Using per-transaction memory contexts for storing decoded tuples (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Using per-transaction memory contexts for storing decoded tuples
List: pgsql-hackers
On Fri, Sep 27, 2024 at 10:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Sep 27, 2024 at 12:39 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Mon, 23 Sept 2024 at 09:59, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sun, Sep 22, 2024 at 11:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > >
> > > > On Fri, 20 Sept 2024 at 17:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Fri, Sep 20, 2024 at 5:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > > > > In general, it's a bit annoying to have to code around this
> > > > > > GenerationContext fragmentation issue.
> > > > >
> > > > > Right, and I am also slightly afraid that this may cause some
> > > > > regression in other cases where defrag wouldn't help.
> > > >
> > > > Yeah, that's certainly a possibility. I was hoping that
> > > > MemoryContextMemAllocated() being much larger than logical_work_mem
> > > > could only happen when there is fragmentation, but certainly, you
> > > > could be wasting effort trying to defrag transactions where the
> > > > changes all arrive in WAL consecutively and there is no
> > > > defragmentation. It might be some other large transaction that's
> > > > causing the context's allocations to be fragmented. I don't have any
> > > > good ideas on how to avoid wasting effort on non-problematic
> > > > transactions. Maybe there's something that could be done if we knew
> > > > the LSN of the first and last change and the gap between the LSNs was
> > > > much larger than the WAL space used for this transaction. That would
> > > > likely require tracking way more stuff than we do now, however.
> > > >
> > >
> > > With more information tracking, we could skip some non-problematic
> > > transactions, but it would still be difficult to be confident that we
> > > haven't harmed many cases, because only a few interleaved small
> > > transactions are needed to make the memory non-contiguous. We can try
> > > to come up with ideas for implementing defragmentation in our code
> > > once we have first proven that smaller block sizes cause problems.
> > >
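(To make the LSN-gap idea above concrete, here is a rough, untested
sketch. ReorderBufferTXN already tracks first_lsn, final_lsn, and the
accounted memory size; the comparison itself and the 4x factor are
made-up placeholders, not a proposal:)

    #include "replication/reorderbuffer.h"

    /*
     * Sketch: treat a transaction as a defrag candidate only if its
     * changes span much more WAL than the memory they occupy, which
     * suggests they arrived interleaved with other transactions.
     * txn->size is the memory accounted to the transaction's changes;
     * the 4x factor is an arbitrary placeholder.
     */
    static bool
    rb_txn_maybe_fragmented(ReorderBufferTXN *txn)
    {
        uint64      lsn_span;

        if (XLogRecPtrIsInvalid(txn->first_lsn) ||
            XLogRecPtrIsInvalid(txn->final_lsn))
            return false;

        lsn_span = txn->final_lsn - txn->first_lsn;

        return lsn_span > (uint64) txn->size * 4;
    }
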
> > > > With the smaller blocks idea, I'm a bit concerned that smaller
> > > > blocks could cause regressions on systems that are better at
> > > > releasing memory back to the OS after free(), since malloc() would
> > > > often be slower on those systems. There have been some complaints
> > > > recently about glibc being a bit too happy to keep hold of memory
> > > > after free(), and I wondered whether that is why the small block
> > > > test does not cause much of a performance regression. I wonder how
> > > > the small block test would look on Mac, FreeBSD or Windows. It
> > > > would be risky to assume that all is well with reducing the block
> > > > size after testing on a single platform.
> > > >
> > >
> > > Good point. We need extensive testing on different platforms, as you
> > > suggest, to verify whether smaller block sizes cause any regressions.
> >
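(For reference on what is being varied below: the reorder buffer's tuple
context is a Generation context that ReorderBufferAllocate() creates with
a fixed 8MB block size, SLAB_LARGE_BLOCK_SIZE, for all three size
parameters. A block-size test presumably just routes a knob into that
call, roughly as follows; rb_mem_block_size as a GUC in kB is my
assumption about the test patch:)

    /* hypothetical: replace the fixed block size with a GUC value */
    Size        block_size = (Size) rb_mem_block_size * 1024;

    buffer->tup_context = GenerationContextCreate(new_ctx,
                                                  "Tuples",
                                                  block_size,
                                                  block_size,
                                                  block_size);
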
> > I ran similar tests on Windows, varying rb_mem_block_size from 8kB to
> > 8MB. The table below shows the average time and standard deviation
> > over 5 runs for each block size.
> >
> > =========================================================
> > block size  |  average time (ms)  |  std deviation (ms)
> > ---------------------------------------------------------
> > 8kB         |  12580.879          |  144.692
> > 16kB        |  12442.726          |   94.028
> > 32kB        |  12370.729          |   97.796
> > 64kB        |  11877.489          |  222.242
> > 128kB       |  11828.857          |  129.733
> > 256kB       |  11801.086          |   20.600
> > 512kB       |  12361.417          |   65.274
> > 1MB         |  12343.373          |   80.844
> > 2MB         |  12357.675          |   79.400
> > 4MB         |  12395.836          |   76.783
> > 8MB         |  11712.886          |   50.743
> > =========================================================
> >
> > From the results, I think there is a small regression for small block sizes.
> >
> > I ran the tests in git bash. I have also attached the test script.
>
> Thank you for testing on Windows! I've run the same benchmark on Mac
> (Sonoma 14.7, M1 Pro):
>
> 8kB: 4852.198 ms
> 16kB: 4822.733 ms
> 32kB: 4776.776 ms
> 64kB: 4851.433 ms
> 128kB: 4804.821 ms
> 256kB: 4781.778 ms
> 512kB: 4776.486 ms
> 1MB: 4783.456 ms
> 2MB: 4770.671 ms
> 4MB: 4785.800 ms
> 8MB: 4747.447 ms
>
> I can see there is a small regression for small block sizes.
>

So, decoding a large transaction with many smaller allocations can have
~2.2% overhead with a smaller block size (say 8kB vs 8MB) on the Mac
numbers above; the Windows numbers show a larger gap, around 7%. In real
workloads, we will have fewer such large transactions, or a mix of small
and large transactions, which will make the overhead much less visible.
Does this mean we should invent some strategy to defragment the memory
at some point during decoding, or use some other technique? I don't find
this overhead above the threshold that would justify inventing something
fancy. What do others think?
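
(To spell out what "some strategy to defragment the memory" could mean,
a minimal sketch: copy the transaction's tuples into the tuple context
again so that the old, sparsely used blocks become entirely free. Field
names follow a recent reorderbuffer.h, where newtuple is a plain
HeapTuple; only the INSERT case is shown, the function is untested, and
peak memory temporarily doubles while copying:)

    #include "access/htup_details.h"
    #include "replication/reorderbuffer.h"

    static void
    rb_defrag_txn(ReorderBuffer *rb, ReorderBufferTXN *txn)
    {
        /* allocate the fresh copies in the current tuple context */
        MemoryContext oldcxt = MemoryContextSwitchTo(rb->tup_context);
        dlist_iter  iter;

        dlist_foreach(iter, &txn->changes)
        {
            ReorderBufferChange *change =
                dlist_container(ReorderBufferChange, node, iter.cur);

            if (change->action == REORDER_BUFFER_CHANGE_INSERT &&
                change->data.tp.newtuple != NULL)
            {
                HeapTuple   copy = heap_copytuple(change->data.tp.newtuple);

                /* release the old copy; its block may now become empty */
                heap_freetuple(change->data.tp.newtuple);
                change->data.tp.newtuple = copy;
            }
            /* ... UPDATE/DELETE old and new tuples handled similarly ... */
        }

        MemoryContextSwitchTo(oldcxt);
    }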

--
With Regards,
Amit Kapila.


