From: Masahiko Sawada
Subject: Re: Using per-transaction memory contexts for storing decoded tuples
Msg-id: CAD21AoDaO1txkgic+uE6u2_SDt=BxL9a_5=7-CtADZxKh6g1pw@mail.gmail.com
In response to: Re: Using per-transaction memory contexts for storing decoded tuples (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Fri, Sep 27, 2024 at 12:39 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 23 Sept 2024 at 09:59, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Sep 22, 2024 at 11:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > >
> > > On Fri, 20 Sept 2024 at 17:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Sep 20, 2024 at 5:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > > > In general, it's a bit annoying to have to code around this
> > > > > GenerationContext fragmentation issue.
> > > >
> > > > Right, and I am also slightly afraid that this may cause some
> > > > regression in other cases where defrag wouldn't help.
> > >
> > > Yeah, that's certainly a possibility. I was hoping that
> > > MemoryContextMemAllocated() being much larger than logical_work_mem
> > > could only happen when there is fragmentation, but certainly, you
> > > could be wasting effort trying to defrag transactions where the
> > > changes all arrive in WAL consecutively and there is no
> > > defragmentation. It might be some other large transaction that's
> > > causing the context's allocations to be fragmented. I don't have any
> > > good ideas on how to avoid wasting effort on non-problematic
> > > transactions. Maybe there's something that could be done if we knew
> > > the LSN of the first and last change and the gap between the LSNs was
> > > much larger than the WAL space used for this transaction. That would
> > > likely require tracking way more stuff than we do now, however.
> > >
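As a rough illustration of that idea (mine, not from any posted patch),
a heuristic along those lines could compare the WAL span of a
transaction with the size of its decoded changes. The function name,
the parameters and the threshold below are all assumptions:

#include "postgres.h"
#include "access/xlogdefs.h"

/*
 * Sketch: decide whether a transaction is worth defragmenting.  If its
 * changes arrived consecutively in WAL, the WAL span and the decoded
 * change size should be comparable; a much larger WAL span hints that
 * other transactions were interleaved and the generation blocks are
 * likely fragmented.
 */
static bool
WorthDefragmenting(XLogRecPtr first_change_lsn,
                   XLogRecPtr last_change_lsn,
                   Size change_bytes)
{
    uint64      wal_span = last_change_lsn - first_change_lsn;

    /* The factor of 4 is arbitrary and would need benchmarking. */
    return wal_span > (uint64) change_bytes * 4;
}
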
> >
> > With more information tracking, we could skip some non-problematic
> > transactions, but it would still be difficult to be sure we aren't
> > harming many cases, because only a few interleaved small transactions
> > are needed to make the memory non-contiguous. We can try to think of
> > ideas for implementing defragmentation in our code if we can first
> > prove that smaller block sizes cause problems.
> >
> > > With the smaller blocks idea, I'm a bit concerned that using smaller
> > > blocks could cause regressions on systems that are better at releasing
> > > memory back to the OS after free() as no doubt malloc() would often be
> > > slower on those systems. There have been some complaints recently
> > > about glibc being a bit too happy to keep hold of memory after free()
> > > and I wondered if that was the reason why the small block test does
> > > not cause much of a performance regression. I wonder how the small
> > > block test would look on Mac, FreeBSD or Windows. I think it would be
> > > risky to assume that all is well with reducing the block size after
> > > testing on a single platform.
> > >
> >
> > Good point. We need extensive testing on different platforms, as you
> > suggest, to verify whether smaller block sizes cause any regressions.
>
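For context, my understanding of the change being benchmarked is
roughly the following: the ReorderBuffer tuple context currently uses
fixed 8MB generation blocks (SLAB_LARGE_BLOCK_SIZE), and the test patch
makes that block size configurable. The GUC name and its unit handling
below are assumptions based on this thread:

#include "postgres.h"
#include "utils/memutils.h"

/* test GUC, in kilobytes (assumed) */
extern int  rb_mem_block_size;

static MemoryContext
CreateTupleContext(MemoryContext parent)
{
    Size        block_size = (Size) rb_mem_block_size * 1024;

    /* HEAD passes SLAB_LARGE_BLOCK_SIZE (8MB) for all three sizes. */
    return GenerationContextCreate(parent,
                                   "Tuples",
                                   block_size,
                                   block_size,
                                   block_size);
}
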
> I did similar tests on Windows, varying rb_mem_block_size from 8kB to
> 8MB. The table below shows, for each block size, the average time and
> standard deviation over 5 runs.
>
> ==========================================================
> block size  |  Average time (ms)  |  Standard deviation (ms)
> ----------------------------------------------------------
> 8kB         |  12580.879          |  144.6923467
> 16kB        |  12442.7256         |  94.02799006
> 32kB        |  12370.7292         |  97.7958552
> 64kB        |  11877.4888         |  222.2419142
> 128kB       |  11828.8568         |  129.732941
> 256kB       |  11801.086          |  20.60030913
> 512kB       |  12361.4172         |  65.27390105
> 1MB         |  12343.3732         |  80.84427202
> 2MB         |  12357.675          |  79.40017604
> 4MB         |  12395.8364         |  76.78273689
> 8MB         |  11712.8862         |  50.74323039
> ==========================================================
>
> From the results, I think there is a small regression for small block sizes.
>
> I ran the tests in git bash. I have also attached the test script.

Thank you for testing on Windows! I've run the same benchmark on Mac
(Sonoma 14.7, M1 Pro):

8kB: 4852.198 ms
16kB: 4822.733 ms
32kB: 4776.776 ms
64kB: 4851.433 ms
128kB: 4804.821 ms
256kB: 4781.778 ms
512kB: 4776.486 ms
1MB: 4783.456 ms
2MB: 4770.671 ms
4MB: 4785.800 ms
8MB: 4747.447 ms

I can see there is a small regression for small block sizes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


