Thread: valgrind errors

valgrind errors

From
Neil Conway
Date:
Valgrind'ing the postmaster yields a fair number of errors. A lot of
them are similar, such as the following:

==29929== Use of uninitialised value of size 4
==29929==    at 0x80AFB80: XLogInsert (xlog.c:570)
==29929==    by 0x808B0A6: heap_insert (heapam.c:1189)
==29929==    by 0x808B19D: simple_heap_insert (heapam.c:1226)
==29929==    by 0x80C28CC: AddNewAttributeTuples (heap.c:499)
==29929==    by 0x80C2E36: heap_create_with_catalog (heap.c:787)
==29929==    by 0x811F5AD: DefineRelation (tablecmds.c:252)
==29929==    by 0x81DC9BF: ProcessUtility (utility.c:376)
==29929==    by 0x81DB893: PortalRunUtility (pquery.c:780)
==29929==    by 0x81DB9CE: PortalRunMulti (pquery.c:844)
==29929==    by 0x81DB35C: PortalRun (pquery.c:501)
==29929==    by 0x81D75E2: exec_simple_query (postgres.c:935)
==29929==    by 0x81D9F95: PostgresMain (postgres.c:2984)
==29929==
==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
==29929==    at 0x3C1BAB28: write (in /usr/lib/debug/libc-2.3.2.so)
==29929==    by 0x80B2124: XLogFlush (xlog.c:1416)
==29929==    by 0x80AE348: RecordTransactionCommit (xact.c:549)
==29929==    by 0x80AE82A: CommitTransaction (xact.c:930)
==29929==    by 0x80AED8B: CommitTransactionCommand (xact.c:1242)
==29929==    by 0x81D8934: finish_xact_command (postgres.c:1820)
==29929==    by 0x81D762C: exec_simple_query (postgres.c:967)
==29929==    by 0x81D9F95: PostgresMain (postgres.c:2984)
==29929==    by 0x81A524E: BackendRun (postmaster.c:2662)
==29929==    by 0x81A489E: BackendStartup (postmaster.c:2295)
==29929==    by 0x81A2D0A: ServerLoop (postmaster.c:1165)
==29929==    by 0x81A2773: PostmasterMain (postmaster.c:926)
==29929==  Address 0x3C37BB57 is not stack'd, malloc'd or free'd

(These occur hundreds of times while valgrind'ing "make installcheck".)
The particular call chain that results in the XLogInsert() error is
variable; for example, here's another error report:

==29937== Use of uninitialised value of size 4
==29937==    at 0x80AFA37: XLogInsert (xlog.c:555)
==29937==    by 0x80978F3: _bt_split (nbtinsert.c:907)
==29937==    by 0x80966A1: _bt_insertonpg (nbtinsert.c:504)
==29937==    by 0x8095BB0: _bt_doinsert (nbtinsert.c:141)
==29937==    by 0x809CC78: btinsert (nbtree.c:266)
==29937==    by 0x826200E: OidFunctionCall6 (fmgr.c:1484)
==29937==    by 0x80944FA: index_insert (indexam.c:226)
==29937==    by 0x80C79E6: CatalogIndexInsert (indexing.c:121)
==29937==    by 0x80C2A0B: AddNewAttributeTuples (heap.c:557)
==29937==    by 0x80C2E36: heap_create_with_catalog (heap.c:787)
==29937==    by 0x811F5AD: DefineRelation (tablecmds.c:252)
==29937==    by 0x81DC9BF: ProcessUtility (utility.c:376)

Any thoughts on what could be causing these errors? (I looked into it,
but couldn't see anything that looked like an obvious culprit.)

-Neil




Re: valgrind errors

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Any thoughts on what could be causing these errors?

I suspect valgrind is complaining because XLogInsert is memcpy'ing a
struct that has allocation padding in it.  Which of course is a bogus
complaint ...
        regards, tom lane


Re: valgrind errors

From
Shachar Shemesh
Date:
Tom Lane wrote:

>Neil Conway <neilc@samurai.com> writes:
>>Any thoughts on what could be causing these errors?
>
>I suspect valgrind is complaining because XLogInsert is memcpy'ing a
>struct that has allocation padding in it.  Which of course is a bogus
>complaint ...
As far as I remember (I couldn't find modern documentation on the matter),
Valgrind is resistant to this problem. When a block of memory is copied,
the initialized/uninitialized status is copied along with it. It only
complains when an actual operation is performed using uninitialized
memory. This was developed for the explicit purpose of avoiding the
problem you describe.
         Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/



Re: valgrind errors

From
Shachar Shemesh
Date:
Shachar Shemesh wrote:

> Tom Lane wrote:
>
>> I suspect valgrind is complaining because XLogInsert is memcpy'ing a
>> struct that has allocation padding in it.  Which of course is a bogus
>> complaint ...
>>
> As far as I remember (I couldn't find modern documentation on the
> matter), Valgrind is resistant to this problem. When a block of memory
> is copied, the initialized/uninitialized status is copied along with
> it. It only complains when an actual operation is performed using
> uninitialized memory. This was developed for the explicit purpose of
> avoiding the problem you describe.
>
>          Shachar
Found it:
http://developer.kde.org/~sewardj/docs-2.0.0/mc_main.html, section 3.3.2

> It is important to understand that your program can copy around junk 
> (uninitialised) data to its heart's content. Memcheck observes this 
> and keeps track of the data, but does not complain. A complaint is 
> issued only when your program attempts to make use of uninitialised data.


What IS possible, however, is that there is a bug in one of the 
underlying libraries.




Re: valgrind errors

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Valgrind'ing the postmaster yields a fair number of errors. A lot of
> them are similar, such as the following:

> ==29929== Use of uninitialised value of size 4
> ==29929==    at 0x80AFB80: XLogInsert (xlog.c:570)

Oh, I see the issue.  Shachar is correct that valgrind doesn't complain
about copying uninitialized bytes.  But it *does* complain about adding
them into a CRC ... so what we are seeing here are gripes about including
padding bytes in a CRC, or about writing them out, in the case of
complaints like this one:

> ==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)

The original pad bytes may be fairly far removed from the point of the
error ... an example is that I was able to make one XLogInsert complaint
go away by changing palloc to palloc0 at tupdesc.c line 413 (in
TupleDescInitEntry), which is several memcpy's removed from the data
that gets passed to XLogInsert.  valgrind's habit of propagating
undef'ness through copies isn't real helpful here.

BTW, valgrind's report about "size 4" is actively misleading, because
the only part of that struct that TupleDescInitEntry isn't careful to
set explicitly is a one-byte pad between attislocal and attinhcount.
        regards, tom lane


Re: valgrind errors

From
Shachar Shemesh
Date:
Min Xu (Hsu) wrote:

>I am confused by how valgrind defines "make use" of data. Isn't
>"copying" data a kind of "making use" of it? I mean, if valgrind checks
>whether the data was used as input to memcpy(), that is fine. But if a
>user writes his own memory_copy(), which loads the data into a register,
>as if the data were going to be used in some useful computation,
>and then copies the register value to some other memory location
>to finish the copy (yeah, this IS slow), then valgrind is likely
>to be confused too. It may think the data is "used".
>
>I guess all I am saying is that valgrind _can_ still make
>mistakes about it.
>
>-Min
If I understand correctly, data counts as "used" when anything other
than copying is done with it. Arithmetic operations, branches, etc.
will trigger the error. If you copy the data by adding a constant to it
and then subtracting it again, valgrind will complain. If all you do
(as in your example) is copy it around, and then copy it some more, it
will not.

Yes, it does track "uninitialized" bits in your registers. Brrr.
         Shachar




Re: valgrind errors

From
"Min Xu (Hsu)"
Date:
I am also interested in this so I want to make some comments.

On Thu, 22 Apr 2004 Shachar Shemesh wrote :
> Found it:
> http://developer.kde.org/~sewardj/docs-2.0.0/mc_main.html, section 3.3.2
> 
> >It is important to understand that your program can copy around junk 
> >(uninitialised) data to its heart's content. Memcheck observes this 
> >and keeps track of the data, but does not complain. A complaint is 
> >issued only when your program attempts to make use of uninitialised data.

I am confused by how valgrind defines "make use" of data. Isn't
"copying" data a kind of "making use" of it? I mean, if valgrind checks
whether the data was used as input to memcpy(), that is fine. But if a
user writes his own memory_copy(), which loads the data into a register,
as if the data were going to be used in some useful computation,
and then copies the register value to some other memory location
to finish the copy (yeah, this IS slow), then valgrind is likely
to be confused too. It may think the data is "used".

I guess all I am saying is that valgrind _can_ still make
mistakes about it.

-Min

-- 
We've heard that a million monkeys at a million keyboards could produce
the complete works of Shakespeare; now, thanks to the Internet, we know
that it is not true.                                                     --Robert Wilensky


Re: valgrind errors

From
Tom Lane
Date:
Shachar Shemesh <psql@shemesh.biz> writes:
> Tom Lane wrote:
>> The original pad bytes may be fairly far removed from the point of the
>> error ... an example is that I was able to make one XLogInsert complaint
>> go away by changing palloc to palloc0 at tupdesc.c line 413 (in
>> TupleDescInitEntry), which is several memcpy's removed from the data
>> that gets passed to XLogInsert.

> If I understand this correctly, that was a real bug there, wasn't it?

No, just a complete waste of time.  The "uninitialized" data is just
struct padding, and it matters not what's in there.

To get rid of this class of reports we'd probably have to palloc0 rather
than palloc almost everything, and that strikes me as useless overhead.
It would make more sense to tell valgrind to suppress these particular
events in XLogInsert and XLogFlush.

AFAICS, if we actually had an uninitialized field (rather than
uninitialized padding) it would get detected at the point where the
field is used.  If you run with large enough shared_buffers to avoid
having to discard pages from shmem, I think this would be detected even
across a (nominal) disk write and read.

BTW, there is something in the valgrind manual about adding hints to
teach valgrind about custom alloc/free mechanisms.  Has anyone taught
it about palloc?
        regards, tom lane


Re: valgrind errors

From
Shachar Shemesh
Date:
Tom Lane wrote:

>>==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
>
>The original pad bytes may be fairly far removed from the point of the
>error ... an example is that I was able to make one XLogInsert complaint
>go away by changing palloc to palloc0 at tupdesc.c line 413 (in
>TupleDescInitEntry), which is several memcpy's removed from the data
>that gets passed to XLogInsert.
>
Would asking valgrind for more stack output help?

>valgrind's habit of propagating
>undef'ness through copies isn't real helpful here.
Well, considering the number of false positives you would get if it
didn't...

If I understand this correctly, that was a real bug there, wasn't it?

>BTW, valgrind's report about "size 4" is actively misleading, because
>the only part of that struct that TupleDescInitEntry isn't careful to
>set explicitly is a one-byte pad between attislocal and attinhcount.
You might want to report that to their bugs list. Browsing the docs
just now leads me to believe valgrind is generally aware that only
part of a word can be uninitialized. You can even set it to report
at the point where uninitialized and initialized data are combined in
a single operation.

In fact, that may help bring the errors closer to where the actual
problem lies. Then again, it may cause valgrind to generate many more
false positives.
