Re: pgsql: Generational memory allocator - Mailing list pgsql-committers
From | Tom Lane |
---|---|
Subject | Re: pgsql: Generational memory allocator |
Date | |
Msg-id | 16449.1511548313@sss.pgh.pa.us Whole thread Raw |
In response to | Re: pgsql: Generational memory allocator (Tomas Vondra <tv@fuzzy.cz>) |
Responses |
Re: pgsql: Generational memory allocator
|
List | pgsql-committers |
Tomas Vondra <tv@fuzzy.cz> writes: > I agree it would be useful to get the valgrind log from the animal, but > I guess you'd need to get valgrind working to fix the issue anyway. > Anyway, the attached patch fixes the issues for me - strictly speaking > the Assert is not needed, but it doesn't hurt. For me, this patch fixes the valgrind failures inside generation.c itself, but I still see one more in the test_decoding run: ==00:00:00:08.768 15475== Invalid read of size 8 ==00:00:00:08.768 15475== at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780) ==00:00:00:08.768 15475== by 0x70291E: LogicalDecodingProcessRecord (decode.c:371) ==00:00:00:08.768 15475== by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308) ==00:00:00:08.768 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231) ==00:00:00:08.768 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95) ==00:00:00:08.768 15475== by 0x626976: ExecScan (execScan.c:97) ==00:00:00:08.768 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241) ==00:00:00:08.768 15475== by 0x76B07A: PortalRunSelect (pquery.c:932) ==00:00:00:08.768 15475== by 0x76C3E0: PortalRun (pquery.c:773) ==00:00:00:08.768 15475== by 0x7687EC: exec_simple_query (postgres.c:1120) ==00:00:00:08.768 15475== by 0x76A5C7: PostgresMain (postgres.c:4139) ==00:00:00:08.768 15475== by 0x6EFDC9: PostmasterMain (postmaster.c:4412) ==00:00:00:08.768 15475== Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined ==00:00:00:08.768 15475== at 0x89A17F: palloc (mcxt.c:871) ==00:00:00:08.768 15475== by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283) ==00:00:00:08.768 15475== by 0x520CD3: XLogReadRecord (xlogreader.c:475) ==00:00:00:08.768 15475== by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293) ==00:00:00:08.769 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231) ==00:00:00:08.769 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95) ==00:00:00:08.769 15475== by 0x626976: ExecScan (execScan.c:97) ==00:00:00:08.769 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241) ==00:00:00:08.769 15475== by 0x76B07A: PortalRunSelect (pquery.c:932) ==00:00:00:08.769 15475== by 0x76C3E0: PortalRun (pquery.c:773) ==00:00:00:08.769 15475== by 0x7687EC: exec_simple_query (postgres.c:1120) ==00:00:00:08.769 15475== by 0x76A5C7: PostgresMain (postgres.c:4139) ==00:00:00:08.769 15475== Not sure what to make of this: the stack traces make it look unrelated to the GenerationContext changes, but if it's not related, how come skink was passing before that patch went in? I'm tempted to push Tomas' patch as-is, just to see if skink sees the same failure I do. BTW, I'm pretty certain that there are additional bugs in generation.c's valgrind support, in code paths that the regression tests likely do not reach. I do not think it's setting up valgrind correctly for "large" chunks, and it looks to me like GenerationCheck would fail because it tries to access chunk->requested_size which is generally kept NOACCESS. I'm tempted to rip out all of the logic concerned with flipping chunk->requested_size between normal and NOACCESS states, as that seems to me like an exercise much more likely to introduce bugs than to catch any. regards, tom lane
pgsql-committers by date: