Thread: [HACKERS] valgrind error in subscription code

[HACKERS] valgrind error in subscription code

From
Andres Freund
Date:
Hi,

I enabled skink / the valgrind animal to run the tap tests too (hugely
increasing the test time :(), and that paid of:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13

==606== VALGRINDERROR-BEGIN
==606== Conditional jump or move depends on uninitialised value(s)
==606==    at 0x46A207: logicalrep_rel_open (relation.c:361)
==606==    by 0x472D12: copy_table (tablesync.c:669)
==606==    by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606==    by 0x475287: ApplyWorkerMain (worker.c:1530)
==606==    by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606==    by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606==    by 0x44E59F: maybe_start_bgworker (postmaster.c:5761)
==606==    by 0x44F11D: sigusr1_handler (postmaster.c:5015)
==606==    by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==606==    by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==606==    by 0x44F868: ServerLoop (postmaster.c:1693)
==606==    by 0x450C53: PostmasterMain (postmaster.c:1337)
==606==  Uninitialised value was created by a heap allocation
==606==    at 0x605382: MemoryContextAlloc (mcxt.c:729)
==606==    by 0x5E4E6A: DynaHashAlloc (dynahash.c:266)
==606==    by 0x5E4EEE: element_alloc (dynahash.c:1637)
==606==    by 0x5E5018: get_hash_entry (dynahash.c:1248)
==606==    by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033)
==606==    by 0x5E5A0D: hash_search (dynahash.c:890)
==606==    by 0x469D38: logicalrep_relmap_update (relation.c:179)
==606==    by 0x472D05: copy_table (tablesync.c:666)
==606==    by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606==    by 0x475287: ApplyWorkerMain (worker.c:1530)
==606==    by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606==    by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606== 
==606== VALGRINDERROR-END

there's also another bleat:
==2161== VALGRINDERROR-BEGIN
==2161== Syscall param write(buf) points to uninitialised byte(s)
==2161==    at 0x4E42190: __write_nocancel (syscall-template.S:84)
==2161==    by 0x4676C7: CheckPointReplicationOrigin (origin.c:574)
==2161==    by 0x25DB99: CheckPointGuts (xlog.c:9007)
==2161==    by 0x2647CD: CreateCheckPoint (xlog.c:8792)
==2161==    by 0x26734F: StartupXLOG (xlog.c:7576)
==2161==    by 0x4510FA: StartupProcessMain (startup.c:217)
==2161==    by 0x273832: AuxiliaryProcessMain (bootstrap.c:426)
==2161==    by 0x44D8A0: StartChildProcess (postmaster.c:5256)
==2161==    by 0x44DF33: PostmasterStateMachine (postmaster.c:3791)
==2161==    by 0x44EE25: reaper (postmaster.c:3038)
==2161==    by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==2161==    by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==2161==  Location 0xffeffda42 is 2 bytes inside local var "disk_state"
==2161==  declared at origin.c:555, in frame #1 of thread 1
==2161==  Uninitialised value was created by a stack allocation
==2161==    at 0x467421: CheckPointReplicationOrigin (origin.c:505)
==2161== 
==2161== VALGRINDERROR-END
==2161== VALGRINDERROR-BEGIN
==2161== Syscall param write(buf) points to uninitialised byte(s)
==2161==    at 0x4E42190: __write_nocancel (syscall-template.S:84)
==2161==    by 0x467712: CheckPointReplicationOrigin (origin.c:591)
==2161==    by 0x25DB99: CheckPointGuts (xlog.c:9007)
==2161==    by 0x2647CD: CreateCheckPoint (xlog.c:8792)
==2161==    by 0x26734F: StartupXLOG (xlog.c:7576)
==2161==    by 0x4510FA: StartupProcessMain (startup.c:217)
==2161==    by 0x273832: AuxiliaryProcessMain (bootstrap.c:426)
==2161==    by 0x44D8A0: StartChildProcess (postmaster.c:5256)
==2161==    by 0x44DF33: PostmasterStateMachine (postmaster.c:3791)
==2161==    by 0x44EE25: reaper (postmaster.c:3038)
==2161==    by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==2161==    by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==2161==  Location 0xffeffda58 is 0 bytes inside local var "crc"
==2161==  declared at origin.c:511, in frame #1 of thread 1
==2161==  Uninitialised value was created by a stack allocation
==2161==    at 0x467421: CheckPointReplicationOrigin (origin.c:505)
==2161== 
==2161== VALGRINDERROR-END

but that looks more like something that needs to be silenced, based on
the 3s look I gave it.


Greetings,

Andres Freund



Re: [HACKERS] valgrind error in subscription code

From
Petr Jelinek
Date:
On 22/04/17 20:31, Andres Freund wrote:
> Hi,
> 
> I enabled skink / the valgrind animal to run the tap tests too (hugely
> increasing the test time :(), and that paid of:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13
> 
> ==606== VALGRINDERROR-BEGIN
> ==606== Conditional jump or move depends on uninitialised value(s)
> ==606==    at 0x46A207: logicalrep_rel_open (relation.c:361)
> ==606==    by 0x472D12: copy_table (tablesync.c:669)
> ==606==    by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
> ==606==    by 0x475287: ApplyWorkerMain (worker.c:1530)
> ==606==    by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
> ==606==    by 0x44E48A: do_start_bgworker (postmaster.c:5577)
> ==606==    by 0x44E59F: maybe_start_bgworker (postmaster.c:5761)
> ==606==    by 0x44F11D: sigusr1_handler (postmaster.c:5015)
> ==606==    by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
> ==606==    by 0x6FB8212: __select_nocancel (syscall-template.S:84)
> ==606==    by 0x44F868: ServerLoop (postmaster.c:1693)
> ==606==    by 0x450C53: PostmasterMain (postmaster.c:1337)
> ==606==  Uninitialised value was created by a heap allocation
> ==606==    at 0x605382: MemoryContextAlloc (mcxt.c:729)
> ==606==    by 0x5E4E6A: DynaHashAlloc (dynahash.c:266)
> ==606==    by 0x5E4EEE: element_alloc (dynahash.c:1637)
> ==606==    by 0x5E5018: get_hash_entry (dynahash.c:1248)
> ==606==    by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033)
> ==606==    by 0x5E5A0D: hash_search (dynahash.c:890)
> ==606==    by 0x469D38: logicalrep_relmap_update (relation.c:179)
> ==606==    by 0x472D05: copy_table (tablesync.c:666)
> ==606==    by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
> ==606==    by 0x475287: ApplyWorkerMain (worker.c:1530)
> ==606==    by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
> ==606==    by 0x44E48A: do_start_bgworker (postmaster.c:5577)
> ==606== 
> ==606== VALGRINDERROR-END

Thanks, here is patch to fix that - I also removed the individual
settings of everything to NULL/0/InvalidOid etc and just replaced it all
with memset.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] valgrind error in subscription code

From
Andres Freund
Date:
Hi,

On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote:
> Thanks, here is patch to fix that - I also removed the individual
> settings of everything to NULL/0/InvalidOid etc and just replaced it all
> with memset.

Cool.

> diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
> index 875a081..5bc54dd 100644
> --- a/src/backend/replication/logical/relation.c
> +++ b/src/backend/replication/logical/relation.c
> @@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
>          pfree(remoterel->attnames);
>          pfree(remoterel->atttyps);
>      }
> -    remoterel->attnames = NULL;
> -    remoterel->atttyps = NULL;
> -
>      bms_free(remoterel->attkeys);
> -    remoterel->attkeys = NULL;
>  
>      if (entry->attrmap)
>          pfree(entry->attrmap);
> -

Btw, I think it's a good pattern to zero things like attrmap after
freeing.  Based on a minute of looking it's unclear to me if
logicalrep_relmap_update() could be called again, if e.g. one of the
pallocs after the logicalrep_relmap_free_entry() errors out.  I think
you essentially addressed that with the memset, so that's good.

- Andres



Re: [HACKERS] valgrind error in subscription code

From
Petr Jelinek
Date:
On 22/04/17 21:16, Andres Freund wrote:
> Hi,
> 
> On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote:
>> Thanks, here is patch to fix that - I also removed the individual
>> settings of everything to NULL/0/InvalidOid etc and just replaced it all
>> with memset.
> 
> Cool.
> 
>> diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
>> index 875a081..5bc54dd 100644
>> --- a/src/backend/replication/logical/relation.c
>> +++ b/src/backend/replication/logical/relation.c
>> @@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
>>          pfree(remoterel->attnames);
>>          pfree(remoterel->atttyps);
>>      }
>> -    remoterel->attnames = NULL;
>> -    remoterel->atttyps = NULL;
>> -
>>      bms_free(remoterel->attkeys);
>> -    remoterel->attkeys = NULL;
>>  
>>      if (entry->attrmap)
>>          pfree(entry->attrmap);
>> -
> 
> Btw, I think it's a good pattern to zero things like attrmap after
> freeing.  Based on a minute of looking it's unclear to me if
> logicalrep_relmap_update() could be called again, if e.g. one of the
> pallocs after the logicalrep_relmap_free_entry() errors out.  I think
> you essentially addressed that with the memset, so that's good.
> 

Yes, memset is called immediately after logicalrep_relmap_free_entry()
which is why I found it redundant to set things to NULL there. We could
put the memset directly inside logicalrep_relmap_free_entry() and the
call either logicalrep_relmap_free_entry() or memset in the caller if
that would make you feel better about it.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] valgrind error in subscription code

From
Andres Freund
Date:
On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote:
> On 22/04/17 20:31, Andres Freund wrote:
> > Hi,
> > 
> > I enabled skink / the valgrind animal to run the tap tests too (hugely
> > increasing the test time :(), and that paid of:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13
> > 
> > ==606== VALGRINDERROR-BEGIN
> > ==606== Conditional jump or move depends on uninitialised value(s)
> > ==606==    at 0x46A207: logicalrep_rel_open (relation.c:361)
> > ==606==    by 0x472D12: copy_table (tablesync.c:669)
> > ==606==    by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
> > ==606==    by 0x475287: ApplyWorkerMain (worker.c:1530)
> > ==606==    by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
> > ==606==    by 0x44E48A: do_start_bgworker (postmaster.c:5577)
> > ==606==    by 0x44E59F: maybe_start_bgworker (postmaster.c:5761)
> > ==606==    by 0x44F11D: sigusr1_handler (postmaster.c:5015)
> > ==606==    by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
> > ==606==    by 0x6FB8212: __select_nocancel (syscall-template.S:84)
> > ==606==    by 0x44F868: ServerLoop (postmaster.c:1693)
> > ==606==    by 0x450C53: PostmasterMain (postmaster.c:1337)
> > ==606==  Uninitialised value was created by a heap allocation
> > ==606==    at 0x605382: MemoryContextAlloc (mcxt.c:729)
> > ==606==    by 0x5E4E6A: DynaHashAlloc (dynahash.c:266)
> > ==606==    by 0x5E4EEE: element_alloc (dynahash.c:1637)
> > ==606==    by 0x5E5018: get_hash_entry (dynahash.c:1248)
> > ==606==    by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033)
> > ==606==    by 0x5E5A0D: hash_search (dynahash.c:890)
> > ==606==    by 0x469D38: logicalrep_relmap_update (relation.c:179)
> > ==606==    by 0x472D05: copy_table (tablesync.c:666)
> > ==606==    by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
> > ==606==    by 0x475287: ApplyWorkerMain (worker.c:1530)
> > ==606==    by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
> > ==606==    by 0x44E48A: do_start_bgworker (postmaster.c:5577)
> > ==606== 
> > ==606== VALGRINDERROR-END
> 
> Thanks, here is patch to fix that - I also removed the individual
> settings of everything to NULL/0/InvalidOid etc and just replaced it all
> with memset.

I've pushed this, and a fix for the (afaik harmless) replication origins
bleat.  Let's see what skink says - I wasn't patient enough to locally
verify that your fix indeed fixes things.

- Andres