Thread: [HACKERS] valgrind errors around dsa.c
Hi,

newly added tests exercise parallel bitmap scans. And they trigger
valgrind errors:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-07%2007%3A10%3A01

==4567== VALGRINDERROR-BEGIN
==4567== Conditional jump or move depends on uninitialised value(s)
==4567==    at 0x5FD62A: check_for_freed_segments (dsa.c:2219)
==4567==    by 0x5FD97E: dsa_get_address (dsa.c:934)
==4567==    by 0x5FDA2A: init_span (dsa.c:1339)
==4567==    by 0x5FE6D1: ensure_active_superblock (dsa.c:1696)
==4567==    by 0x5FEBBD: alloc_object (dsa.c:1452)
==4567==    by 0x5FEBBD: dsa_allocate_extended (dsa.c:693)
==4567==    by 0x3C7A83: pagetable_allocate (tidbitmap.c:1536)
==4567==    by 0x3C7A83: pagetable_create (simplehash.h:342)
==4567==    by 0x3C7A83: tbm_create_pagetable (tidbitmap.c:323)
==4567==    by 0x3C8DAD: tbm_get_pageentry (tidbitmap.c:1246)
==4567==    by 0x3C98A1: tbm_add_tuples (tidbitmap.c:432)
==4567==    by 0x22510C: btgetbitmap (nbtree.c:460)
==4567==    by 0x21A8D1: index_getbitmap (indexam.c:726)
==4567==    by 0x38AD48: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:91)
==4567==    by 0x37D353: MultiExecProcNode (execProcnode.c:621)
==4567==  Uninitialised value was created by a heap allocation
==4567==    at 0x602FD5: palloc (mcxt.c:872)
==4567==    by 0x5FF73B: create_internal (dsa.c:1242)
==4567==    by 0x5FF8F5: dsa_create_in_place (dsa.c:473)
==4567==    by 0x37CA32: ExecInitParallelPlan (execParallel.c:532)
==4567==    by 0x38C324: ExecGather (nodeGather.c:152)
==4567==    by 0x37D247: ExecProcNode (execProcnode.c:551)
==4567==    by 0x39870F: ExecNestLoop (nodeNestloop.c:156)
==4567==    by 0x37D1B7: ExecProcNode (execProcnode.c:512)
==4567==    by 0x3849D4: fetch_input_tuple (nodeAgg.c:686)
==4567==    by 0x387764: agg_retrieve_direct (nodeAgg.c:2306)
==4567==    by 0x387A11: ExecAgg (nodeAgg.c:2117)
==4567==    by 0x37D217: ExecProcNode (execProcnode.c:539)
==4567==

It could be that these are spurious due to shared memory - valgrind
doesn't track definedness across processes - but the fact that memory
allocated by palloc is the source of the undefined memory makes me doubt
that.

Greetings,

Andres Freund
On Sat, Apr 8, 2017 at 4:49 AM, Andres Freund <andres@anarazel.de> wrote:
> newly added tests exercise parallel bitmap scans. And they trigger
> valgrind errors:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-07%2007%3A10%3A01
> [...]
> It could be that these are spurious due to shared memory - valgrind
> doesn't track definedness across processes - but the fact that memory
> allocated by palloc is the source of the undefined memory makes me doubt
> that.

Thanks. Will post a fix for this later today.

--
Thomas Munro
http://www.enterprisedb.com
On Sat, Apr 8, 2017 at 8:57 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sat, Apr 8, 2017 at 4:49 AM, Andres Freund <andres@anarazel.de> wrote:
>> newly added tests exercise parallel bitmap scans. And they trigger
>> valgrind errors: [...]
>
> Thanks. Will post a fix for this later today.

Fix attached.

Explanation: Whenever segments are destroyed because they no longer
contain any live blocks, the shared variable
control->freed_segment_counter advances. Each attached backend has its
own local variable area->freed_segment_counter, and if it sees that the
former differs from the latter, it checks all attached segments to see
if any need to be detached. I failed to initialise the backend-local
version. The consequence is that if you were very unlucky (the
uninitialised junk happened to equal the shared counter's value after a
segment had been freed), your backend could fail to detach from a
no-longer-needed segment until another segment was eventually freed,
causing the shared counter to move again. More likely, it would notice
that the two values differ because one holds uninitialised junk,
perform a spurious scan for dead segments, and then bring them into
sync.

--
Thomas Munro
http://www.enterprisedb.com
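A minimal C sketch of the counter scheme described above, under loudly stated assumptions: it is a single-process model with no shared memory and no locking, and every type and function name in it (shared_control, local_area, destroy_segment, check_freed_segments, attach_area, and the demo-only scans field) is hypothetical, not the real dsa.c code or signatures. It shows the cheap counter comparison on the lookup path and where the missing initialisation of the backend-local counter belongs.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SEGMENTS 8

/* Hypothetical stand-in for the shared control object. */
typedef struct
{
    uint64_t    freed_segment_counter;      /* advances whenever a segment is destroyed */
    int         segment_live[MAX_SEGMENTS]; /* 1 while the segment still exists */
} shared_control;

/* Hypothetical stand-in for one backend's local view of the area. */
typedef struct
{
    shared_control *control;
    uint64_t    freed_segment_counter;      /* backend-local copy of the counter */
    int         attached[MAX_SEGMENTS];     /* 1 if this backend has the segment mapped */
    int         scans;                      /* demo only: number of full scans performed */
} local_area;

/* Destroy a segment that no longer holds live blocks and advertise that fact. */
static void
destroy_segment(shared_control *control, int seg)
{
    assert(seg >= 0 && seg < MAX_SEGMENTS);
    control->segment_live[seg] = 0;
    control->freed_segment_counter++;
}

/*
 * Cheap check run on each address lookup: only when the shared counter has
 * moved since we last looked do we pay for a scan of all attached segments.
 */
static void
check_freed_segments(local_area *area)
{
    uint64_t    shared = area->control->freed_segment_counter;

    if (area->freed_segment_counter == shared)
        return;                 /* fast path: nothing freed since last check */

    area->scans++;
    for (int i = 0; i < MAX_SEGMENTS; i++)
    {
        if (area->attached[i] && !area->control->segment_live[i])
            area->attached[i] = 0;  /* "detach" from the dead segment */
    }
    area->freed_segment_counter = shared;
}

/* Create/attach path: every field, including the local counter, is set. */
static void
attach_area(local_area *area, shared_control *control)
{
    area->control = control;

    /*
     * The fix: start in sync with the shared counter. Leaving this field as
     * uninitialised heap memory is the kind of bug valgrind reported.
     */
    area->freed_segment_counter = control->freed_segment_counter;
    area->scans = 0;
    for (int i = 0; i < MAX_SEGMENTS; i++)
        area->attached[i] = control->segment_live[i];
}
```

For example, attaching after one segment has already been freed and then calling check_freed_segments() takes the fast path with no scan; destroying another segment makes the next check scan once, detach that segment, and resynchronise the local counter. Had attach_area() skipped the counter assignment, whether that first check scanned would depend on whatever junk the field happened to hold.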
On 2017-04-08 14:46:04 +1200, Thomas Munro wrote:
> Fix attached.

Thanks. Pushed!

Andres