Re: pgsql: Generational memory allocator - Mailing list pgsql-committers

From Tom Lane
Subject Re: pgsql: Generational memory allocator
Date
Msg-id 16449.1511548313@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgsql: Generational memory allocator  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: pgsql: Generational memory allocator  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers
Tomas Vondra <tv@fuzzy.cz> writes:
> I agree it would be useful to get the valgrind log from the animal, but
> I guess you'd need to get valgrind working to fix the issue anyway.
> Anyway, the attached patch fixes the issues for me - strictly speaking
> the Assert is not needed, but it doesn't hurt.

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run:

==00:00:00:08.768 15475== Invalid read of size 8
==00:00:00:08.768 15475==    at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780)
==00:00:00:08.768 15475==    by 0x70291E: LogicalDecodingProcessRecord (decode.c:371)
==00:00:00:08.768 15475==    by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308)
==00:00:00:08.768 15475==    by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.768 15475==    by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.768 15475==    by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.768 15475==    by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.768 15475==    by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.768 15475==    by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.768 15475==    by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.768 15475==    by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.768 15475==    by 0x6EFDC9: PostmasterMain (postmaster.c:4412)
==00:00:00:08.768 15475==  Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined
==00:00:00:08.768 15475==    at 0x89A17F: palloc (mcxt.c:871)
==00:00:00:08.768 15475==    by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283)
==00:00:00:08.768 15475==    by 0x520CD3: XLogReadRecord (xlogreader.c:475)
==00:00:00:08.768 15475==    by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293)
==00:00:00:08.769 15475==    by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.769 15475==    by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.769 15475==    by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.769 15475==    by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.769 15475==    by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.769 15475==    by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.769 15475==    by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.769 15475==    by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.769 15475==

Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?

I'm tempted to push Tomas' patch as-is, just to see if skink sees the
same failure I do.

BTW, I'm pretty certain that there are additional bugs in generation.c's
valgrind support, in code paths that the regression tests likely do not
reach.  I do not think it's setting up valgrind correctly for "large"
chunks, and it looks to me like GenerationCheck would fail because it
tries to access chunk->requested_size which is generally kept NOACCESS.
I'm tempted to rip out all of the logic concerned with flipping
chunk->requested_size between normal and NOACCESS states, as that seems to
me like an exercise much more likely to introduce bugs than to catch any.
        regards, tom lane


pgsql-committers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: pgsql: Generational memory allocator
Next
From: Tom Lane
Date:
Subject: pgsql: Fix bug in generation.c's valgrind support.