Thread: Re: Valgrind failures caused by multivariate stats patch
On 03/25/2017 10:10 PM, Andres Freund wrote:
...
> ==2486== Uninitialised byte(s) found during client check request
> ==2486==    at 0x1C4857: printtup (printtup.c:347)
> ==2486==    by 0x401FD5: ExecutePlan (execMain.c:1681)
> ==2486==    by 0x3FFDED: standard_ExecutorRun (execMain.c:355)
> ==2486==    by 0x3FFC07: ExecutorRun (execMain.c:298)
> ==2486==    by 0x5C40BD: PortalRunSelect (pquery.c:928)
> ==2486==    by 0x5C3D50: PortalRun (pquery.c:769)
> ==2486==    by 0x5BDC1A: exec_simple_query (postgres.c:1101)
> ==2486==    by 0x5C203F: PostgresMain (postgres.c:4071)
> ==2486==    by 0x525138: BackendRun (postmaster.c:4317)
> ==2486==    by 0x524848: BackendStartup (postmaster.c:3989)
> ==2486==    by 0x520C4A: ServerLoop (postmaster.c:1729)
> ==2486==    by 0x5201D9: PostmasterMain (postmaster.c:1337)
> ==2486==    by 0x45DFA6: main (main.c:228)
> ==2486==  Address 0x3e8c50f5 is in a rw- anonymous segment
> ==2486==  Uninitialised value was created by a heap allocation
> ==2486==    at 0x76212F: palloc (mcxt.c:872)
> ==2486==    by 0x578814: statext_ndistinct_build (mvdistinct.c:80)
> ==2486==    by 0x577CCE: BuildRelationExtStatistics (extended_stats.c:94)
> ==2486==    by 0x3546B6: do_analyze_rel (analyze.c:577)
> ==2486==    by 0x353B76: analyze_rel (analyze.c:271)
> ==2486==    by 0x3E7B89: vacuum (vacuum.c:321)
> ==2486==    by 0x3E7755: ExecVacuum (vacuum.c:122)
> ==2486==    by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
> ==2486==    by 0x5C5724: ProcessUtility (utility.c:353)
> ==2486==    by 0x5C46CA: PortalRunUtility (pquery.c:1174)
> ==2486==    by 0x5C48D0: PortalRunMulti (pquery.c:1317)
> ==2486==    by 0x5C3E1A: PortalRun (pquery.c:795)
> ==2486==    by 0x5BDC1A: exec_simple_query (postgres.c:1101)
> ==2486==    by 0x5C203F: PostgresMain (postgres.c:4071)
> ==2486==    by 0x525138: BackendRun (postmaster.c:4317)
> ==2486==    by 0x524848: BackendStartup (postmaster.c:3989)
> ==2486==    by 0x520C4A: ServerLoop (postmaster.c:1729)
> ==2486==    by 0x5201D9: PostmasterMain (postmaster.c:1337)
> ==2486==    by 0x45DFA6: main (main.c:228)

Hmmm, so I have a theory about what's going on, but no matter what I do
I can't trigger these valgrind failures. What switches are you using to
start valgrind? I'm using this:

valgrind --leak-check=no --undef-value-errors=yes \
    --expensive-definedness-checks=yes --track-origins=yes \
    --read-var-info=yes --gen-suppressions=all \
    --suppressions=src/tools/valgrind.supp --time-stamp=yes \
    --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
    postgres --log_line_prefix="%m %p " --log_statement=all \
    --shared_buffers=64MB -D /home/user/tmp/data-valgrind 2>&1

FWIW I think the issue is that statext_ndistinct_build does palloc()
and then uses offsetof() to copy data into the chunk, which might leave
a few bytes skipped (and thus uninitialized) due to alignment/padding.
But as I can't reproduce the valgrind failure, it's hard to say.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
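[A minimal sketch of the pattern Tomas suspects. SketchNDistinct and
sketch_serialize are illustrative stand-ins, not the actual mvdistinct.c
code; the point is only that offsetof() can cover compiler-inserted
padding bytes that no code ever assigned.]

    #include "postgres.h"

    /* Illustrative stand-in for MVNDistinct -- not the real definition. */
    typedef struct SketchNDistinct
    {
        uint32      magic;      /* identifies the statistics kind */
        uint32      type;       /* type of ndistinct coefficients */
        uint32      nitems;     /* number of items that follow */
        double      items[FLEXIBLE_ARRAY_MEMBER];
    } SketchNDistinct;

    static bytea *
    sketch_serialize(SketchNDistinct *nd)
    {
        /*
         * Because 'items' is double-aligned, offsetof() is 16 on common
         * 64-bit ABIs even though only 12 bytes (three uint32 fields)
         * were ever assigned.
         */
        Size        hdrlen = offsetof(SketchNDistinct, items);
        Size        len = VARHDRSZ + hdrlen + nd->nitems * sizeof(double);
        bytea      *output = (bytea *) palloc(len);
        char       *ptr = VARDATA(output);

        SET_VARSIZE(output, len);

        /*
         * This copies the 4 padding bytes after 'nitems' as well; they
         * are uninitialized, so Memcheck flags them once the bytea is
         * eventually sent to the client (the printtup check in the
         * trace above).
         */
        memcpy(ptr, nd, hdrlen);
        memcpy(ptr + hdrlen, nd->items, nd->nitems * sizeof(double));

        return output;
    }

[Note that Memcheck does not complain at the memcpy() itself -- copying
undefined bytes is legal -- but only when they observably escape, which
is why --track-origins is needed to point back at the palloc() site.]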
On 2017-03-26 20:38:52 +0200, Tomas Vondra wrote:
> On 03/25/2017 10:10 PM, Andres Freund wrote:
> ...
> > ==2486== Uninitialised byte(s) found during client check request
> > ==2486==    at 0x1C4857: printtup (printtup.c:347)
> > ==2486==    by 0x401FD5: ExecutePlan (execMain.c:1681)
> > ==2486==    by 0x3FFDED: standard_ExecutorRun (execMain.c:355)
> > ==2486==    by 0x3FFC07: ExecutorRun (execMain.c:298)
> > ==2486==    by 0x5C40BD: PortalRunSelect (pquery.c:928)
> > ==2486==    by 0x5C3D50: PortalRun (pquery.c:769)
> > ==2486==    by 0x5BDC1A: exec_simple_query (postgres.c:1101)
> > ==2486==    by 0x5C203F: PostgresMain (postgres.c:4071)
> > ==2486==    by 0x525138: BackendRun (postmaster.c:4317)
> > ==2486==    by 0x524848: BackendStartup (postmaster.c:3989)
> > ==2486==    by 0x520C4A: ServerLoop (postmaster.c:1729)
> > ==2486==    by 0x5201D9: PostmasterMain (postmaster.c:1337)
> > ==2486==    by 0x45DFA6: main (main.c:228)
> > ==2486==  Address 0x3e8c50f5 is in a rw- anonymous segment
> > ==2486==  Uninitialised value was created by a heap allocation
> > ==2486==    at 0x76212F: palloc (mcxt.c:872)
> > ==2486==    by 0x578814: statext_ndistinct_build (mvdistinct.c:80)
> > ==2486==    by 0x577CCE: BuildRelationExtStatistics (extended_stats.c:94)
> > ==2486==    by 0x3546B6: do_analyze_rel (analyze.c:577)
> > ==2486==    by 0x353B76: analyze_rel (analyze.c:271)
> > ==2486==    by 0x3E7B89: vacuum (vacuum.c:321)
> > ==2486==    by 0x3E7755: ExecVacuum (vacuum.c:122)
> > ==2486==    by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
> > ==2486==    by 0x5C5724: ProcessUtility (utility.c:353)
> > ==2486==    by 0x5C46CA: PortalRunUtility (pquery.c:1174)
> > ==2486==    by 0x5C48D0: PortalRunMulti (pquery.c:1317)
> > ==2486==    by 0x5C3E1A: PortalRun (pquery.c:795)
> > ==2486==    by 0x5BDC1A: exec_simple_query (postgres.c:1101)
> > ==2486==    by 0x5C203F: PostgresMain (postgres.c:4071)
> > ==2486==    by 0x525138: BackendRun (postmaster.c:4317)
> > ==2486==    by 0x524848: BackendStartup (postmaster.c:3989)
> > ==2486==    by 0x520C4A: ServerLoop (postmaster.c:1729)
> > ==2486==    by 0x5201D9: PostmasterMain (postmaster.c:1337)
> > ==2486==    by 0x45DFA6: main (main.c:228)
>
> Hmmm, so I have a theory about what's going on, but no matter what I do
> I can't trigger these valgrind failures. What switches are you using to
> start valgrind? I'm using this:
>
> valgrind --leak-check=no --undef-value-errors=yes \
>     --expensive-definedness-checks=yes --track-origins=yes \
>     --read-var-info=yes --gen-suppressions=all \
>     --suppressions=src/tools/valgrind.supp --time-stamp=yes \
>     --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
>     postgres --log_line_prefix="%m %p " --log_statement=all \
>     --shared_buffers=64MB -D /home/user/tmp/data-valgrind 2>&1

--quiet \
--error-exitcode=55 \
--suppressions=${PG_SRC_DIR}/src/tools/valgrind.supp \
--trace-children=yes --track-origins=yes --read-var-info=yes \
--num-callers=20 \
--leak-check=no \
--gen-suppressions=all \

the error-exitcode makes it a whole lot easier to see an error, because
it triggers a crash-restart cycle at backend exit ;)

Note that skink also finds the issue:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2017-03-26%2013%3A00%3A01&stg=check

Did you compile postgres with valgrind support (-DUSE_VALGRIND /
pg_config_manual.h)?

- Andres
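[For context: without -DUSE_VALGRIND, Memcheck sees palloc'd chunks only
as ordinary heap memory and misses most of these errors. The plumbing is
roughly as follows -- a paraphrase of the src/include/utils/memdebug.h
approach, not the verbatim header, so the exact macro set may differ by
version:]

    /*
     * With USE_VALGRIND defined (uncomment it in pg_config_manual.h,
     * or build with CPPFLAGS=-DUSE_VALGRIND), the memory-context code
     * issues Valgrind client requests; without it they compile away
     * to nothing.
     */
    #ifdef USE_VALGRIND
    #include <valgrind/memcheck.h>
    #else
    #define VALGRIND_MAKE_MEM_DEFINED(addr, size)       do {} while (0)
    #define VALGRIND_MAKE_MEM_NOACCESS(addr, size)      do {} while (0)
    #define VALGRIND_MAKE_MEM_UNDEFINED(addr, size)     do {} while (0)
    #define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size)   do {} while (0)
    #endif

[With the requests compiled in, freshly palloc'd memory is marked
undefined, and the "client check request" in the trace is presumably a
VALGRIND_CHECK_MEM_IS_DEFINED applied to outgoing datums in printtup.c.]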
On 03/26/2017 08:47 PM, Andres Freund wrote:
> On 2017-03-26 20:38:52 +0200, Tomas Vondra wrote:
>> ...
>> Hmmm, so I have a theory about what's going on, but no matter what I do
>> I can't trigger these valgrind failures. What switches are you using to
>> start valgrind? I'm using this:
>>
>> valgrind --leak-check=no --undef-value-errors=yes \
>>     --expensive-definedness-checks=yes --track-origins=yes \
>>     --read-var-info=yes --gen-suppressions=all \
>>     --suppressions=src/tools/valgrind.supp --time-stamp=yes \
>>     --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
>>     postgres --log_line_prefix="%m %p " --log_statement=all \
>>     --shared_buffers=64MB -D /home/user/tmp/data-valgrind 2>&1
>
> --quiet \
> --error-exitcode=55 \
> --suppressions=${PG_SRC_DIR}/src/tools/valgrind.supp \
> --trace-children=yes --track-origins=yes --read-var-info=yes \
> --num-callers=20 \
> --leak-check=no \
> --gen-suppressions=all \
>
> the error-exitcode makes it a whole lot easier to see an error, because
> it triggers a crash-restart cycle at backend exit ;)
>

OK, this did the trick. And just as I suspected, it seems to be due to
doing memcpy+offsetof when serializing the statistic into a bytea. The
attached patch fixes that for me. Can you test it on your build?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
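[The actual fix is in the attached patch; as a sketch of the general
pattern, continuing the hypothetical SketchNDistinct from above (again,
not the real mvdistinct.c code), the header can be serialized field by
field so that compiler padding never reaches the output bytea:]

    static bytea *
    sketch_serialize_fixed(SketchNDistinct *nd)
    {
        Size        hdrlen = 3 * sizeof(uint32);    /* no padding counted */
        Size        len = VARHDRSZ + hdrlen + nd->nitems * sizeof(double);
        bytea      *output = (bytea *) palloc(len);
        char       *ptr = VARDATA(output);

        SET_VARSIZE(output, len);

        /* Copy each field explicitly; every output byte gets written. */
        memcpy(ptr, &nd->magic, sizeof(uint32));
        ptr += sizeof(uint32);
        memcpy(ptr, &nd->type, sizeof(uint32));
        ptr += sizeof(uint32);
        memcpy(ptr, &nd->nitems, sizeof(uint32));
        ptr += sizeof(uint32);

        memcpy(ptr, nd->items, nd->nitems * sizeof(double));

        return output;
    }

[Merely zeroing the destination (palloc0) would not help here, since
memcpy() propagates the source's undefinedness bit by bit under
Memcheck; the padding has to be excluded as above, or the source struct
memset() before its fields are assigned.]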
Tomas Vondra wrote:
> OK, this did the trick. And just as I suspected, it seems to be due to
> doing memcpy+offsetof when serializing the statistic into a bytea. The
> attached patch fixes that for me. Can you test it on your build?

Buildfarm member skink confirms that this is fixed. Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services