Home > mailing lists

Re: pgsql: Generational memory allocator - Mailing list pgsql-committers

From	Tom Lane
Subject	Re: pgsql: Generational memory allocator
Date	November 25, 2017 00:31:53
Msg-id	16449.1511548313@sss.pgh.pa.us Whole thread Raw
In response to	Re: pgsql: Generational memory allocator (Tomas Vondra <tv@fuzzy.cz>)
Responses	Re: pgsql: Generational memory allocator
List	pgsql-committers

Tree view

Tomas Vondra <tv@fuzzy.cz> writes:
> I agree it would be useful to get the valgrind log from the animal, but
> I guess you'd need to get valgrind working to fix the issue anyway.
> Anyway, the attached patch fixes the issues for me - strictly speaking
> the Assert is not needed, but it doesn't hurt.

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run:

==00:00:00:08.768 15475== Invalid read of size 8
==00:00:00:08.768 15475==    at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780)
==00:00:00:08.768 15475==    by 0x70291E: LogicalDecodingProcessRecord (decode.c:371)
==00:00:00:08.768 15475==    by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308)
==00:00:00:08.768 15475==    by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.768 15475==    by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.768 15475==    by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.768 15475==    by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.768 15475==    by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.768 15475==    by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.768 15475==    by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.768 15475==    by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.768 15475==    by 0x6EFDC9: PostmasterMain (postmaster.c:4412)
==00:00:00:08.768 15475==  Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined
==00:00:00:08.768 15475==    at 0x89A17F: palloc (mcxt.c:871)
==00:00:00:08.768 15475==    by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283)
==00:00:00:08.768 15475==    by 0x520CD3: XLogReadRecord (xlogreader.c:475)
==00:00:00:08.768 15475==    by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293)
==00:00:00:08.769 15475==    by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.769 15475==    by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.769 15475==    by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.769 15475==    by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.769 15475==    by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.769 15475==    by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.769 15475==    by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.769 15475==    by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.769 15475==

Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?

I'm tempted to push Tomas' patch as-is, just to see if skink sees the
same failure I do.

BTW, I'm pretty certain that there are additional bugs in generation.c's
valgrind support, in code paths that the regression tests likely do not
reach.  I do not think it's setting up valgrind correctly for "large"
chunks, and it looks to me like GenerationCheck would fail because it
tries to access chunk->requested_size which is generally kept NOACCESS.
I'm tempted to rip out all of the logic concerned with flipping
chunk->requested_size between normal and NOACCESS states, as that seems to
me like an exercise much more likely to introduce bugs than to catch any.
        regards, tom lane

pgsql-committers by date:

From: Andrew Dunstan
Date: 24 November 2017, 21:35:43
Subject: Re: pgsql: Generational memory allocator

From: Tom Lane
Date: 25 November 2017, 00:43:40
Subject: pgsql: Fix bug in generation.c's valgrind support.

Re: pgsql: Generational memory allocator - Mailing list pgsql-committers

Previous

Next