Thread: pgsql: Generational memory allocator
Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Author: Tomas Vondra
Reviewed-by: Simon Riggs

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a4ccc1cef5a04cc054af83bc4582a045d5232cb3

Modified Files
--------------
src/backend/replication/logical/reorderbuffer.c |  80 +--
src/backend/utils/mmgr/Makefile                 |   2 +-
src/backend/utils/mmgr/README                   |  23 +
src/backend/utils/mmgr/generation.c             | 768 ++++++++++++++++++++++++
src/include/nodes/memnodes.h                    |   4 +-
src/include/nodes/nodes.h                       |   1 +
src/include/replication/reorderbuffer.h         |  15 +-
src/include/utils/memutils.h                    |   5 +
8 files changed, 819 insertions(+), 79 deletions(-)
Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
> Generational memory allocator
>
> Add new style of memory allocator, known as Generational
> appropriate for use in cases where memory is allocated
> and then freed in roughly oldest first order (FIFO).
>
> Use new allocator for logical decoding’s reorderbuffer
> to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01

Greetings,

Andres Freund
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
>> Generational memory allocator
>>
>> Add new style of memory allocator, known as Generational
>> appropriate for use in cases where memory is allocated
>> and then freed in roughly oldest first order (FIFO).
>>
>> Use new allocator for logical decoding’s reorderbuffer
>> to significantly reduce memory usage and improve performance.
>
> Looks like it's not quite valgrind clean:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Please enable some info reporting on the animal.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/23/2017 10:57 AM, Simon Riggs wrote:
> On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
>>> Generational memory allocator
>>>
>>> Add new style of memory allocator, known as Generational
>>> appropriate for use in cases where memory is allocated
>>> and then freed in roughly oldest first order (FIFO).
>>>
>>> Use new allocator for logical decoding’s reorderbuffer
>>> to significantly reduce memory usage and improve performance.
>>
>> Looks like it's not quite valgrind clean:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
>
> It doesn't report anything useful that would allow me to fix this.
>
> Please enable some info reporting on the animal.

I agree it would be useful to get the valgrind log from the animal, but
I guess you'd need to get valgrind working to fix the issue anyway.

Anyway, the attached patch fixes the issues for me - strictly speaking
the Assert is not needed, but it doesn't hurt.

regards
Tomas
On 24 November 2017 at 02:16, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 11/23/2017 10:57 AM, Simon Riggs wrote:
>> On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
>>> Hi,
>>>
>>> On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
>>>> Generational memory allocator
>>>>
>>>> Add new style of memory allocator, known as Generational
>>>> appropriate for use in cases where memory is allocated
>>>> and then freed in roughly oldest first order (FIFO).
>>>>
>>>> Use new allocator for logical decoding’s reorderbuffer
>>>> to significantly reduce memory usage and improve performance.
>>>
>>> Looks like it's not quite valgrind clean:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
>>
>> It doesn't report anything useful that would allow me to fix this.
>>
>> Please enable some info reporting on the animal.
>
> I agree it would be useful to get the valgrind log from the animal, but I
> guess you'd need to get valgrind working to fix the issue anyway.

If something fails compilation it tells us why, not just "compilation
failed". The whole point of the buildfarm is to report errors to allow
them to be fixed, not just a fence that blocks things.

> Anyway, the attached patch fixes the issues for me - strictly speaking the
> Assert is not needed, but it doesn't hurt.

Thanks for the patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
> On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
> >> Generational memory allocator
> >>
> >> Add new style of memory allocator, known as Generational
> >> appropriate for use in cases where memory is allocated
> >> and then freed in roughly oldest first order (FIFO).
> >>
> >> Use new allocator for logical decoding’s reorderbuffer
> >> to significantly reduce memory usage and improve performance.
> >
> > Looks like it's not quite valgrind clean:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
>
> It doesn't report anything useful that would allow me to fix this.

Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.

> Please enable some info reporting on the animal.

Hey, I just pointed out for you that there's a problem with something
that you committed. ISTM you're inverting the responsibilities more than
a bit here.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
>> On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
>>> Looks like it's not quite valgrind clean:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01

>> It doesn't report anything useful that would allow me to fix this.

> Uh. It reports a failure, and you can go from there. The error output
> actually is in postmaster.log but for some reason the buildfarm code
> didn't display that in this case.

I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault. In any case, valgrind failures are generally easy enough
to reproduce locally.

Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2017-11-23%2013%3A56%3A17

we have

ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1

Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).

			regards, tom lane
On 11/23/2017 09:06 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
>>> On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
>>>> Looks like it's not quite valgrind clean:
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
>
>>> It doesn't report anything useful that would allow me to fix this.
>
>> Uh. It reports a failure, and you can go from there. The error output
>> actually is in postmaster.log but for some reason the buildfarm code
>> didn't display that in this case.

It may not be immediately obvious that the failure is due to valgrind,
but otherwise I agree it's up to us to investigate.

> I think it's a legitimate complaint that postmaster.log wasn't captured
> in this failure, but that's a buildfarm script oversight and hardly
> Andres' fault.

Are the valgrind errors really written to postmaster log? I'm assuming
it failed because valgrind ran into an issue and killed the process.

> In any case, valgrind failures are generally easy enough to reproduce
> locally.

Right.

> Meanwhile, over on
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2017-11-23%2013%3A56%3A17
>
> we have
>
> generation.c: In function ‘GenerationContextCreate’:
> generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
> make[4]: *** [generation.o] Error 1
>
> Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
> must be a maxaligned quantity, which is wrong on platforms where
> MAXALIGN is bigger than sizeof(pointer).

Hmmm, I see. Presumably adding this to GenerationChunk (similarly to
what we do in AllocChunkData) would address the issue:

#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
    Size        padding;
#endif

but I have no way to verify that (no access to such machine). I wonder
why SlabChunk doesn't need to do that (perhaps a comment explaining that
would be appropriate?).

regards
Tomas
On 24 November 2017 at 07:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meanwhile, over on
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2017-11-23%2013%3A56%3A17
>
> we have
>
> generation.c: In function ‘GenerationContextCreate’:
> generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
> make[4]: *** [generation.o] Error 1
>
> Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
> must be a maxaligned quantity, which is wrong on platforms where
> MAXALIGN is bigger than sizeof(pointer).

Thanks, yes.

I can't see how to resolve that without breaking the design assumption
at line 122, "there must not be any padding to reach a MAXALIGN boundary
here!". So I'll wait for Tomas to comment.

(I'm just about to catch a long flight, so will be offline for 24 hours,
so we may need to revert this before I get back if it is difficult to
resolve.)

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
> > I think it's a legitimate complaint that postmaster.log wasn't captured
> > in this failure, but that's a buildfarm script oversight and hardly
> > Andres' fault.
>
> Are the valgrind errors really written to postmaster log? I'm assuming it
> failed because valgrind ran into an issue and killed the process.

Yes. Search e.g. in
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-18%2004%3A10%3A01
for VALGRINDERROR-BEGIN. (You could argue that they're only written
there in certain configurations, because I'd assume it'd not work in
e.g. syslog configured systems, because valgrind just prints to stderr).

> Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
> do in AllocChunkData) would address the issue:
>
> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
>     Size        padding;
> #endif
>
> but I have no way to verify that (no access to such machine). I wonder why
> SlabChunk doesn't need to do that (perhaps a comment explaining that would
> be appropriate?).

Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?

Greetings,

Andres Freund
On 24 November 2017 at 06:39, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
>> On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
>> > Hi,
>> >
>> > On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
>> >> Generational memory allocator
>> >>
>> >> Add new style of memory allocator, known as Generational
>> >> appropriate for use in cases where memory is allocated
>> >> and then freed in roughly oldest first order (FIFO).
>> >>
>> >> Use new allocator for logical decoding’s reorderbuffer
>> >> to significantly reduce memory usage and improve performance.
>> >
>> > Looks like it's not quite valgrind clean:
>> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
>>
>> It doesn't report anything useful that would allow me to fix this.
>
> Uh. It reports a failure, and you can go from there. The error output
> actually is in postmaster.log but for some reason the buildfarm code
> didn't display that in this case.
>
>> Please enable some info reporting on the animal.
>
> Hey, I just pointed out for you that there's a problem with something
> that you committed. ISTM you're inverting the responsibilities more than
> a bit here.

I don't believe I was inverting responsibilities, but sorry if you
thought I was.

Yes, there is a bug in something I committed which I know about because
of the buildfarm. I am happy to work on fixing it. No discussion on
responsibility there.

I can see that buildfarm animal is yours, so I asked you for more info
about the bug on that animal. ISTM reasonable to ask for some more
detail on a bug report, and where that is an automated service to
request that the owner of that service make changes to assist. I don't
see why that implies some change of my responsibilities.

Thanks for your help

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
>> Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
>> do in AllocChunkData) would address the issue:
>>
>> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
>>     Size        padding;
>> #endif
>>
>> but I have no way to verify that (no access to such machine). I wonder why
>> SlabChunk doesn't need to do that (perhaps a comment explaining that would
>> be appropriate?).

> Can't you just compile pg on a 32bit system and manually define MAXALIGN
> to 8 bytes?

I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)

			regards, tom lane
On 11/23/2017 11:04 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
>>> Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
>>> do in AllocChunkData) would address the issue:
>>>
>>> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
>>>     Size        padding;
>>> #endif
>>>
>>> but I have no way to verify that (no access to such machine). I wonder why
>>> SlabChunk doesn't need to do that (perhaps a comment explaining that would
>>> be appropriate?).
>
>> Can't you just compile pg on a 32bit system and manually define MAXALIGN
>> to 8 bytes?
>
> I pushed a patch that computes how much padding to add and adds it.
> (It might not really work if size_t and void * are different sizes,
> because then there could be additional padding in the struct; but
> that seems very unlikely.)

Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).

regards
Tomas
Tomas Vondra <tv@fuzzy.cz> writes:
> On 11/23/2017 11:04 PM, Tom Lane wrote:
>> I pushed a patch that computes how much padding to add and adds it.
>> (It might not really work if size_t and void * are different sizes,
>> because then there could be additional padding in the struct; but
>> that seems very unlikely.)

> Thanks. Do we need to do something similar to the other memory contexts?
> I see Slab does not do this at all (assuming it's not necessary), and
> AllocSet does this in a different way (which seems a bit strange).

Hm ... the coding in AllocSet is ancient and I have to say that I don't
like it as well as what I put into generation.c. We could not have done
it in that way at the time, because (IIRC) we did not have constant
macros for SIZEOF_SIZE_T, and maybe not SIZEOF_VOID_P either --- and you
need constant macros in order to do the #if calculation, because
sizeof() does not work in preprocessor expressions. But we have 'em now,
so I'm tempted to change aset.c to match generation.c.

As for SlabChunk, I think that's fine as-is (except I wonder why the
"block" field is declared "void *" rather than "SlabBlock *"). Since the
contents are fixed at two pointer fields, it's impossible that there's
any unexpected padding in there --- the struct size is basically
guaranteed to be 2 * sizeof(pointer). Which makes it either 8 or 16
bytes, either one of which is certain to be a multiple of MAXALIGN. So I
think the StaticAssert that that's true is plenty sufficient in that
case.

			regards, tom lane
On 24 November 2017 at 09:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
>>> Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
>>> do in AllocChunkData) would address the issue:
>>>
>>> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
>>>     Size        padding;
>>> #endif
>>>
>>> but I have no way to verify that (no access to such machine). I wonder why
>>> SlabChunk doesn't need to do that (perhaps a comment explaining that would
>>> be appropriate?).
>
>> Can't you just compile pg on a 32bit system and manually define MAXALIGN
>> to 8 bytes?
>
> I pushed a patch that computes how much padding to add and adds it.
> (It might not really work if size_t and void * are different sizes,
> because then there could be additional padding in the struct; but
> that seems very unlikely.)

Oh, OK, thanks. It sunk in what was needed while flying, but that's
better than my efforts.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/23/2017 03:06 PM, Tom Lane wrote:
>
> I think it's a legitimate complaint that postmaster.log wasn't captured
> in this failure, but that's a buildfarm script oversight and hardly
> Andres' fault.

A fix for this will be in the next buildfarm release.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tv@fuzzy.cz> writes:
> I agree it would be useful to get the valgrind log from the animal, but
> I guess you'd need to get valgrind working to fix the issue anyway.

> Anyway, the attached patch fixes the issues for me - strictly speaking
> the Assert is not needed, but it doesn't hurt.

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run:

==00:00:00:08.768 15475== Invalid read of size 8
==00:00:00:08.768 15475==    at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780)
==00:00:00:08.768 15475==    by 0x70291E: LogicalDecodingProcessRecord (decode.c:371)
==00:00:00:08.768 15475==    by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308)
==00:00:00:08.768 15475==    by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.768 15475==    by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.768 15475==    by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.768 15475==    by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.768 15475==    by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.768 15475==    by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.768 15475==    by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.768 15475==    by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.768 15475==    by 0x6EFDC9: PostmasterMain (postmaster.c:4412)
==00:00:00:08.768 15475==  Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined
==00:00:00:08.768 15475==    at 0x89A17F: palloc (mcxt.c:871)
==00:00:00:08.768 15475==    by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283)
==00:00:00:08.768 15475==    by 0x520CD3: XLogReadRecord (xlogreader.c:475)
==00:00:00:08.768 15475==    by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293)
==00:00:00:08.769 15475==    by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.769 15475==    by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.769 15475==    by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.769 15475==    by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.769 15475==    by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.769 15475==    by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.769 15475==    by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.769 15475==    by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.769 15475==

Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in? I'm tempted to push Tomas'
patch as-is, just to see if skink sees the same failure I do.

BTW, I'm pretty certain that there are additional bugs in generation.c's
valgrind support, in code paths that the regression tests likely do not
reach. I do not think it's setting up valgrind correctly for "large"
chunks, and it looks to me like GenerationCheck would fail because it
tries to access chunk->requested_size which is generally kept NOACCESS.
I'm tempted to rip out all of the logic concerned with flipping
chunk->requested_size between normal and NOACCESS states, as that seems
to me like an exercise much more likely to introduce bugs than to catch
any.

			regards, tom lane
I wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> Thanks. Do we need to do something similar to the other memory contexts?
>> I see Slab does not do this at all (assuming it's not necessary), and
>> AllocSet does this in a different way (which seems a bit strange).

> Hm ... the coding in AllocSet is ancient and I have to say that I don't
> like it as well as what I put into generation.c.

I take that back: the current coding of padding in AllocChunkData only
dates back to 7e3aa03b. But I still don't like it, and will migrate
generation.c's version into aset.c. And maybe improve the comments.

BTW, it appears that some of the confusion in generation.c can be
accounted for by not having entirely followed the changes in 7e3aa03b.

			regards, tom lane
I wrote:
> For me, this patch fixes the valgrind failures inside generation.c
> itself, but I still see one more in the test_decoding run: ...
> Not sure what to make of this: the stack traces make it look unrelated
> to the GenerationContext changes, but if it's not related, how come
> skink was passing before that patch went in?

I've pushed fixes for everything that I could find wrong in generation.c
(and there was a lot :-(). But I'm still seeing the "invalid read in
SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
Somebody who has more familiarity with the logical decoding stuff than
I do needs to look into that.

I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
triggering the failure, with the attached patch. Weirdly, *it does not
fail* with this. I have no explanation for that.

			regards, tom lane

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index ad65b98..a48db7e 100644
*** a/src/backend/replication/logical/snapbuild.c
--- b/src/backend/replication/logical/snapbuild.c
*************** SnapBuildProcessNewCid(SnapBuild *builde
*** 770,775 ****
--- 770,781 ----
  				   XLogRecPtr lsn, xl_heap_new_cid *xlrec)
  {
  	CommandId	cid;
+ 	TransactionId top_xid;
+ 	RelFileNode node;
+ 	ItemPointerData tid;
+ 	CommandId	cmin;
+ 	CommandId	cmax;
+ 	CommandId	combocid;
  
  	/*
  	 * we only log new_cid's if a catalog tuple was modified, so mark the
*************** SnapBuildProcessNewCid(SnapBuild *builde
*** 777,786 ****
  	 */
  	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
  
! 	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
! 								 xlrec->target_node, xlrec->target_tid,
! 								 xlrec->cmin, xlrec->cmax,
! 								 xlrec->combocid);
  
  	/* figure out new command id */
  	if (xlrec->cmin != InvalidCommandId &&
--- 783,799 ----
  	 */
  	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
  
! 	top_xid = xlrec->top_xid;
! 	node = xlrec->target_node;
! 	tid = xlrec->target_tid;
! 	cmin = xlrec->cmin;
! 	cmax = xlrec->cmax;
! 	combocid = xlrec->combocid;
! 
! 	ReorderBufferAddNewTupleCids(builder->reorder, top_xid, lsn,
! 								 node, tid,
! 								 cmin, cmax,
! 								 combocid);
  
  	/* figure out new command id */
  	if (xlrec->cmin != InvalidCommandId &&
Hi,

On 11/25/2017 02:25 AM, Tom Lane wrote:
> I wrote:
>> For me, this patch fixes the valgrind failures inside generation.c
>> itself, but I still see one more in the test_decoding run: ...
>> Not sure what to make of this: the stack traces make it look unrelated
>> to the GenerationContext changes, but if it's not related, how come
>> skink was passing before that patch went in?
>
> I've pushed fixes for everything that I could find wrong in generation.c
> (and there was a lot :-(). But I'm still seeing the "invalid read in
> SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
> Somebody who has more familiarity with the logical decoding stuff than
> I do needs to look into that.
>
> I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
> triggering the failure, with the attached patch. Weirdly, *it does not
> fail* with this. I have no explanation for that.

I have no explanation for that either.

FWIW I don't think this is related to the new memory contexts. I can
reproduce it on 3bae43c (i.e. before the Generation memory context was
introduced), and with Slab removed from ReorderBuffer.

I wonder if this might be a valgrind issue. I'm not sure which version
skink is using, but I'm running with valgrind-3.12.0-9.el7_4.x86_64.

BTW I also see these failures in hstore:

==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
==15168==    at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
==15168==    by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
==15168==    by 0x15419EE4: hstore_in (hstore_io.c:416)
==15168==    by 0x9ED11A: InputFunctionCall (fmgr.c:1635)
==15168==    by 0x9ED3C2: OidInputFunctionCall (fmgr.c:1738)
==15168==    by 0x6014A2: stringTypeDatum (parse_type.c:641)
==15168==    by 0x5E1ADC: coerce_type (parse_coerce.c:304)
==15168==    by 0x5E17A9: coerce_to_target_type (parse_coerce.c:103)
==15168==    by 0x5EDD6D: transformTypeCast (parse_expr.c:2724)
==15168==    by 0x5E8860: transformExprRecurse (parse_expr.c:203)
==15168==    by 0x5E8601: transformExpr (parse_expr.c:156)
==15168==    by 0x5FCF95: transformTargetEntry (parse_target.c:103)
==15168==    by 0x5FD15D: transformTargetList (parse_target.c:191)
==15168==    by 0x5A5EEC: transformSelectStmt (analyze.c:1214)
==15168==    by 0x5A4453: transformStmt (analyze.c:297)
==15168==    by 0x5A4381: transformOptionalSelectInto (analyze.c:242)
==15168==    by 0x5A423F: transformTopLevelStmt (analyze.c:192)
==15168==    by 0x5A4097: parse_analyze (analyze.c:112)
==15168==    by 0x87E0AF: pg_analyze_and_rewrite (postgres.c:664)
==15168==    by 0x87E6EE: exec_simple_query (postgres.c:1045)

Seems hstoreUniquePairs may call memcpy with the same pointers in some
cases (which looks a bit dubious). But the code is ancient, so it's
strange it didn't fail before.

regards
Tomas
Tomas Vondra <tv@fuzzy.cz> writes:
> BTW I also see these failures in hstore:

> ==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
> ==15168==    at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
> ==15168==    by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
> ==15168==    by 0x15419EE4: hstore_in (hstore_io.c:416)

Huh ...

> Seems hstoreUniquePairs may call memcpy with the same pointers in some
> cases (which looks a bit dubious). But the code is ancient, so it's
> strange it didn't fail before.

Quite. It's easy to see how to avoid the nominally-undefined behavior:

-				memcpy(res, ptr, sizeof(Pairs));
+				if (res != ptr)
+					memcpy(res, ptr, sizeof(Pairs));

but this should surely have been noticed by valgrind tests before.
The case doesn't necessarily occur --- if the first two items in sorted
order are dups, it won't --- but if you're seeing it occur in regression
testing then skink should have seen it as well.

			regards, tom lane
I wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> BTW I also see these failures in hstore:

>> ==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
>> ==15168==    at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
>> ==15168==    by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
>> ==15168==    by 0x15419EE4: hstore_in (hstore_io.c:416)

> Huh ...

I tried to duplicate this on my RHEL6 workstation, and failed to, even
though adding an assertion easily proves that the hstore regression test
does exercise the case. So apparently the answer as to why skink isn't
reporting this is just "not all versions of valgrind check it".

I checked the git history and noted that we've previously fixed other
nominally-undefined usage of memcpy() on the grounds that OpenBSD's libc
will complain about it. So it seems like something that would be good to
fix, and I did so.

Meanwhile, skink has come back with a success as of 0f2458f, rendering
the other mystery even deeper --- although at least this confirms the
idea that the crashes we are seeing predate the generation.c patch.

			regards, tom lane
Hi,

On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
> I wrote:
> > Tomas Vondra <tv@fuzzy.cz> writes:
> >> BTW I also see these failures in hstore:
> >
> >> ==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
> >> ==15168==    at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
> >> ==15168==    by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
> >> ==15168==    by 0x15419EE4: hstore_in (hstore_io.c:416)
> >
> > Huh ...
>
> I tried to duplicate this on my RHEL6 workstation, and failed to,
> even though adding an assertion easily proves that the hstore
> regression test does exercise the case. So apparently the answer
> as to why skink isn't reporting this is just "not all versions of
> valgrind check it".

I suspect that the issue rather is that the compiler will sometimes
replace the memcpy() with an in-line member-by-member version. That'll
not be visible as a memcpy to valgrind.

> Meanwhile, skink has come back with a success as of 0f2458f, rendering
> the other mystery even deeper --- although at least this confirms the
> idea that the crashes we are seeing predate the generation.c patch.

Hm, wonder whether that could be optimization dependent too.

Greetings,

Andres Freund
On 2017-11-25 12:00:15 -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
> > I wrote:
> > > Tomas Vondra <tv@fuzzy.cz> writes:
> > >> BTW I also see these failures in hstore:
> >
> > >> ==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
> > >> ==15168==    at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
> > >> ==15168==    by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
> > >> ==15168==    by 0x15419EE4: hstore_in (hstore_io.c:416)
> >
> > > Huh ...
> >
> > I tried to duplicate this on my RHEL6 workstation, and failed to,
> > even though adding an assertion easily proves that the hstore
> > regression test does exercise the case.  So apparently the answer
> > as to why skink isn't reporting this is just "not all versions of
> > valgrind check it".
>
> I suspect that the issue rather is that the compiler will sometimes
> replace the memcpy() with an in-line member-by-member version.  That'll
> not be visible as a memcpy to valgrind.

That's indeed the case.  Here's the disassembly from skink, albeit for
v10, because those objects were currently present:

disassemble /s hstoreUniquePairs
...
342				res++;
   0x00000000000005c2 <+174>:	add    $0x28,%rbx

343				memcpy(res, ptr, sizeof(Pairs));
   0x00000000000005c6 <+178>:	mov    (%r12),%rax
   0x00000000000005ca <+182>:	mov    0x8(%r12),%rdx
   0x00000000000005cf <+187>:	mov    %rax,(%rbx)
   0x00000000000005d2 <+190>:	mov    %rdx,0x8(%rbx)
   0x00000000000005d6 <+194>:	mov    0x10(%r12),%rax
   0x00000000000005db <+199>:	mov    0x18(%r12),%rdx
   0x00000000000005e0 <+204>:	mov    %rax,0x10(%rbx)
   0x00000000000005e4 <+208>:	mov    %rdx,0x18(%rbx)
   0x00000000000005e8 <+212>:	mov    0x20(%r12),%rax
   0x00000000000005ed <+217>:	mov    %rax,0x20(%rbx)

344			}
345
...

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
>> Meanwhile, skink has come back with a success as of 0f2458f, rendering
>> the other mystery even deeper --- although at least this confirms the
>> idea that the crashes we are seeing predate the generation.c patch.

> Hm, wonder whether that could be optimization dependent too.

Evidently :-(.  I examined my compiler's assembly output for the
relevant line, snapbuild.c:780:

		ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
									 xlrec->target_node, xlrec->target_tid,
									 xlrec->cmin, xlrec->cmax,
									 xlrec->combocid);

which is

	movl	12(%rbx), %eax		combocid
	movq	16(%rbx), %rcx		target_node.spcNode + dbNode
	movq	%rbp, %rdx
	movl	24(%rbx), %r8d		target_node.relNode
	movq	56(%r12), %rdi
	movq	28(%rbx), %r9		target_tid ... 8 bytes worth
	movl	(%rbx), %esi		top_xid
	movl	%eax, 16(%rsp)
	movl	8(%rbx), %eax		cmax
	movl	%eax, 8(%rsp)
	movl	4(%rbx), %eax		cmin
	movl	%eax, (%rsp)
	call	ReorderBufferAddNewTupleCids

I've annotated the fetches from *rbx to match the data structure:

typedef struct xl_heap_new_cid
{
	/*
	 * store toplevel xid so we don't have to merge cids from different
	 * transactions
	 */
	TransactionId top_xid;
	CommandId	cmin;
	CommandId	cmax;

	/*
	 * don't really need the combocid since we have the actual values right in
	 * this struct, but the padding makes it free and it's useful for
	 * debugging.
	 */
	CommandId	combocid;

	/*
	 * Store the relfilenode/ctid pair to facilitate lookups.
	 */
	RelFileNode target_node;
	ItemPointerData target_tid;
} xl_heap_new_cid;

and what's apparent is that it's grabbing 8 bytes not just 6 from
target_tid.  So the failure case must occur when the xl_heap_new_cid
WAL record is right at the end of the palloc'd record buffer and
valgrind notices that we're fetching padding bytes.  I suspect that gcc
feels within its rights to do this because the size/alignment of
xl_heap_new_cid is a multiple of 4 bytes, so that there ought to be
2 padding bytes at the end.
We could possibly prevent that by marking the whole xl_heap_new_cid
struct as packed/align(2), the same way ItemPointerData is marked, but
that would render accesses to all of its fields inefficient on
alignment-sensitive machines.  Moreover, if we go that route, we have a
boatload of other xlog record formats with similar hazards.

Instead I propose that we should make sure that the palloc request size
for XLogReaderState->main_data is always maxalign'd.  The existing
behavior in DecodeXLogRecord of palloc'ing it only just barely big
enough for the current record seems pretty brain-dead performance-wise
even without this consideration.  Generally, if we need to enlarge
that buffer, we should enlarge it significantly, IMO.

BTW, that comment in the middle of xl_heap_new_cid about "padding makes
this field free" seems entirely wrong.  By my count the struct is
currently 34 bytes, so if by padding it's talking about maxalign
alignment of the whole struct, that field is costing us 8 bytes on
maxalign-8 machines.  There's certainly no internal alignment that
would make it free.

			regards, tom lane
I wrote:
> Instead I propose that we should make sure that the palloc request size
> for XLogReaderState->main_data is always maxalign'd.  The existing
> behavior in DecodeXLogRecord of palloc'ing it only just barely big
> enough for the current record seems pretty brain-dead performance-wise
> even without this consideration.  Generally, if we need to enlarge
> that buffer, we should enlarge it significantly, IMO.

I've confirmed that the attached is sufficient to stop the valgrind crash
on my machine.  But as I said, I think we should be more aggressive at
resizing the buffer, to reduce resize cycles.  I'm inclined to start out
with a buffer size of 128 or 256 or so bytes and double it when needed.
Anybody have a feeling for a typical size for the "main data" part
of a WAL record?

			regards, tom lane

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index aeaafed..4840b0e 100644
*** a/src/backend/access/transam/xlogreader.c
--- b/src/backend/access/transam/xlogreader.c
*************** DecodeXLogRecord(XLogReaderState *state,
*** 1279,1285 ****
  	{
  		if (state->main_data)
  			pfree(state->main_data);
! 		state->main_data_bufsz = state->main_data_len;
  		state->main_data = palloc(state->main_data_bufsz);
  	}
  	memcpy(state->main_data, ptr, state->main_data_len);
--- 1279,1285 ----
  	{
  		if (state->main_data)
  			pfree(state->main_data);
! 		state->main_data_bufsz = MAXALIGN(state->main_data_len);
  		state->main_data = palloc(state->main_data_bufsz);
  	}
  	memcpy(state->main_data, ptr, state->main_data_len);
On 26 November 2017 at 08:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Instead I propose that we should make sure that the palloc request size
>> for XLogReaderState->main_data is always maxalign'd.  The existing
>> behavior in DecodeXLogRecord of palloc'ing it only just barely big
>> enough for the current record seems pretty brain-dead performance-wise
>> even without this consideration.  Generally, if we need to enlarge
>> that buffer, we should enlarge it significantly, IMO.
>
> I've confirmed that the attached is sufficient to stop the valgrind crash
> on my machine.  But as I said, I think we should be more aggressive at
> resizing the buffer, to reduce resize cycles.  I'm inclined to start out
> with a buffer size of 128 or 256 or so bytes and double it when needed.
> Anybody have a feeling for a typical size for the "main data" part
> of a WAL record?

We reuse the buffer and only pfree/palloc when we need to enlarge the
buffer, so not sure we need to do the doubling thing and it probably
doesn't matter what the typical size is.

So I think we're just good to go with your patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25 November 2017 at 02:35, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
>
> On 11/23/2017 03:06 PM, Tom Lane wrote:
>>
>> I think it's a legitimate complaint that postmaster.log wasn't captured
>> in this failure, but that's a buildfarm script oversight and hardly
>> Andres' fault.
>
> A fix for this will be in the next buildfarm release.

Thanks

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25 November 2017 at 12:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> For me, this patch fixes the valgrind failures inside generation.c
>> itself, but I still see one more in the test_decoding run: ...
>> Not sure what to make of this: the stack traces make it look unrelated
>> to the GenerationContext changes, but if it's not related, how come
>> skink was passing before that patch went in?
>
> I've pushed fixes for everything that I could find wrong in generation.c
> (and there was a lot :-().

Thanks very much for doing that. I'll review changes and comment.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 26 November 2017 at 08:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've confirmed that the attached is sufficient to stop the valgrind crash
>> on my machine.  But as I said, I think we should be more aggressive at
>> resizing the buffer, to reduce resize cycles.  I'm inclined to start out
>> with a buffer size of 128 or 256 or so bytes and double it when needed.
>> Anybody have a feeling for a typical size for the "main data" part
>> of a WAL record?

> We reuse the buffer and only pfree/palloc when we need to enlarge the
> buffer, so not sure we need to do the doubling thing and it probably
> doesn't matter what the typical size is.

Well, I'm concerned about the possibility of a lot of palloc thrashing
if the first bunch of records it reads happen to have steadily increasing
sizes.  However, rather than doubling, it might be sufficient to set a
robust minimum on the first allocation, ie use something along the lines
of Max(1024, MAXALIGN(state->main_data_len)).

			regards, tom lane
On 27 November 2017 at 04:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 26 November 2017 at 08:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I've confirmed that the attached is sufficient to stop the valgrind crash
>>> on my machine.  But as I said, I think we should be more aggressive at
>>> resizing the buffer, to reduce resize cycles.  I'm inclined to start out
>>> with a buffer size of 128 or 256 or so bytes and double it when needed.
>>> Anybody have a feeling for a typical size for the "main data" part
>>> of a WAL record?
>
>> We reuse the buffer and only pfree/palloc when we need to enlarge the
>> buffer, so not sure we need to do the doubling thing and it probably
>> doesn't matter what the typical size is.
>
> Well, I'm concerned about the possibility of a lot of palloc thrashing
> if the first bunch of records it reads happen to have steadily increasing
> sizes.  However, rather than doubling, it might be sufficient to set a
> robust minimum on the first allocation, ie use something along the lines
> of Max(1024, MAXALIGN(state->main_data_len)).

Agreed.

I was just researching what that number should be... and I was
thinking that we should use the maximum normal tuple size, which I
think is

   TOAST_TUPLE_THRESHOLD +
   SizeOfXLogRecord +
   SizeOfXLogRecordDataHeaderLong

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 27 November 2017 at 04:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, I'm concerned about the possibility of a lot of palloc thrashing
>> if the first bunch of records it reads happen to have steadily increasing
>> sizes.  However, rather than doubling, it might be sufficient to set a
>> robust minimum on the first allocation, ie use something along the lines
>> of Max(1024, MAXALIGN(state->main_data_len)).

> Agreed.

> I was just researching what that number should be... and I was
> thinking that we should use the maximum normal tuple size, which I
> think is

> TOAST_TUPLE_THRESHOLD +
> SizeOfXLogRecord +
> SizeOfXLogRecordDataHeaderLong

Well, let's not overthink this, because anything under 8K is going to
be rounded up to the next power of 2 anyway by aset.c.  Based on this
point I'd say that BLCKSZ/2 or BLCKSZ/4 would be reasonable candidates
for the minimum.

			regards, tom lane
On 27 November 2017 at 05:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 27 November 2017 at 04:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, I'm concerned about the possibility of a lot of palloc thrashing
>>> if the first bunch of records it reads happen to have steadily increasing
>>> sizes.  However, rather than doubling, it might be sufficient to set a
>>> robust minimum on the first allocation, ie use something along the lines
>>> of Max(1024, MAXALIGN(state->main_data_len)).
>
>> Agreed.
>
>> I was just researching what that number should be... and I was
>> thinking that we should use the maximum normal tuple size, which I
>> think is
>
>> TOAST_TUPLE_THRESHOLD +
>> SizeOfXLogRecord +
>> SizeOfXLogRecordDataHeaderLong
>
> Well, let's not overthink this, because anything under 8K is going to
> be rounded up to the next power of 2 anyway by aset.c.  Based on this
> point I'd say that BLCKSZ/2 or BLCKSZ/4 would be reasonable candidates
> for the minimum.

BLCKSZ/2 seems best then.

I guess that means palloc doesn't work well with BLCKSZ > 8192

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 27 November 2017 at 05:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, let's not overthink this, because anything under 8K is going to
>> be rounded up to the next power of 2 anyway by aset.c.  Based on this
>> point I'd say that BLCKSZ/2 or BLCKSZ/4 would be reasonable candidates
>> for the minimum.

> BLCKSZ/2 seems best then.

Sold, will make it so.

> I guess that means palloc doesn't work well with BLCKSZ > 8192

Don't think it's particularly related.

			regards, tom lane
On 27 November 2017 at 06:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 27 November 2017 at 05:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, let's not overthink this, because anything under 8K is going to
>>> be rounded up to the next power of 2 anyway by aset.c.  Based on this
>>> point I'd say that BLCKSZ/2 or BLCKSZ/4 would be reasonable candidates
>>> for the minimum.
>
>> BLCKSZ/2 seems best then.
>
> Sold, will make it so.

So you will like this next patch also, since there is related code
above that stanza.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 27 November 2017 at 06:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Sold, will make it so.

> So you will like this next patch also, since there is related code
> above that stanza.

Looks reasonable to me, but I wonder whether BLCKSZ is a good initial
setting.  Since the data length is only uint16, we're apparently assuming
that the contents must be small.

			regards, tom lane
On 27 November 2017 at 09:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 27 November 2017 at 06:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Sold, will make it so.
>
>> So you will like this next patch also, since there is related code
>> above that stanza.
>
> Looks reasonable to me, but I wonder whether BLCKSZ is a good initial
> setting.  Since the data length is only uint16, we're apparently assuming
> that the contents must be small.

Our max block size is 2^15 so I think we're good.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services