Thread: valgrind errors
Valgrind'ing the postmaster yields a fair number of errors. A lot of them are similar, such as the following:

==29929== Use of uninitialised value of size 4
==29929==    at 0x80AFB80: XLogInsert (xlog.c:570)
==29929==    by 0x808B0A6: heap_insert (heapam.c:1189)
==29929==    by 0x808B19D: simple_heap_insert (heapam.c:1226)
==29929==    by 0x80C28CC: AddNewAttributeTuples (heap.c:499)
==29929==    by 0x80C2E36: heap_create_with_catalog (heap.c:787)
==29929==    by 0x811F5AD: DefineRelation (tablecmds.c:252)
==29929==    by 0x81DC9BF: ProcessUtility (utility.c:376)
==29929==    by 0x81DB893: PortalRunUtility (pquery.c:780)
==29929==    by 0x81DB9CE: PortalRunMulti (pquery.c:844)
==29929==    by 0x81DB35C: PortalRun (pquery.c:501)
==29929==    by 0x81D75E2: exec_simple_query (postgres.c:935)
==29929==    by 0x81D9F95: PostgresMain (postgres.c:2984)
==29929==
==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
==29929==    at 0x3C1BAB28: write (in /usr/lib/debug/libc-2.3.2.so)
==29929==    by 0x80B2124: XLogFlush (xlog.c:1416)
==29929==    by 0x80AE348: RecordTransactionCommit (xact.c:549)
==29929==    by 0x80AE82A: CommitTransaction (xact.c:930)
==29929==    by 0x80AED8B: CommitTransactionCommand (xact.c:1242)
==29929==    by 0x81D8934: finish_xact_command (postgres.c:1820)
==29929==    by 0x81D762C: exec_simple_query (postgres.c:967)
==29929==    by 0x81D9F95: PostgresMain (postgres.c:2984)
==29929==    by 0x81A524E: BackendRun (postmaster.c:2662)
==29929==    by 0x81A489E: BackendStartup (postmaster.c:2295)
==29929==    by 0x81A2D0A: ServerLoop (postmaster.c:1165)
==29929==    by 0x81A2773: PostmasterMain (postmaster.c:926)
==29929==  Address 0x3C37BB57 is not stack'd, malloc'd or free'd

(These occur hundreds of times while valgrind'ing "make installcheck".)
The particular call chain that results in the XLogInsert() error is variable; for example, here's another error report:

==29937== Use of uninitialised value of size 4
==29937==    at 0x80AFA37: XLogInsert (xlog.c:555)
==29937==    by 0x80978F3: _bt_split (nbtinsert.c:907)
==29937==    by 0x80966A1: _bt_insertonpg (nbtinsert.c:504)
==29937==    by 0x8095BB0: _bt_doinsert (nbtinsert.c:141)
==29937==    by 0x809CC78: btinsert (nbtree.c:266)
==29937==    by 0x826200E: OidFunctionCall6 (fmgr.c:1484)
==29937==    by 0x80944FA: index_insert (indexam.c:226)
==29937==    by 0x80C79E6: CatalogIndexInsert (indexing.c:121)
==29937==    by 0x80C2A0B: AddNewAttributeTuples (heap.c:557)
==29937==    by 0x80C2E36: heap_create_with_catalog (heap.c:787)
==29937==    by 0x811F5AD: DefineRelation (tablecmds.c:252)
==29937==    by 0x81DC9BF: ProcessUtility (utility.c:376)

Any thoughts on what could be causing these errors? (I looked into it, but couldn't see anything that looked like an obvious culprit.)

-Neil
Neil Conway <neilc@samurai.com> writes:
> Any thoughts on what could be causing these errors?

I suspect valgrind is complaining because XLogInsert is memcpy'ing a struct that has allocation padding in it. Which of course is a bogus complaint ...

regards, tom lane
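To make "allocation padding" concrete, here is a minimal C sketch, assuming a typical ABI where `int32_t` is 4-byte aligned. `PaddedRecord` and both helper functions are hypothetical; this is not the real struct that TupleDescInitEntry fills in. The compiler places an unnamed byte between `flag_c` and `count`, no field assignment ever writes it, and a `memcpy(dst, &rec, sizeof rec)` copies it along with the named fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: three one-byte flags followed by a 4-byte counter.
   On common ABIs `count` is aligned to 4 bytes, so the compiler inserts one
   unnamed pad byte right after flag_c. */
typedef struct {
    char    flag_a;
    char    flag_b;
    char    flag_c;
    int32_t count;
} PaddedRecord;

/* Total bytes in the struct that belong to no named member. */
static size_t padding_bytes(void) {
    return sizeof(PaddedRecord) - (3 * sizeof(char) + sizeof(int32_t));
}

/* Size of the hole between the last flag and `count`. */
static size_t gap_before_count(void) {
    return offsetof(PaddedRecord, count)
         - (offsetof(PaddedRecord, flag_c) + sizeof(char));
}
```

On x86 and most other mainstream targets both functions report one byte; the exact figure is ABI-dependent, which is why the sketch measures it with `offsetof` and `sizeof` instead of hard-coding it.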
Tom Lane wrote:

>Neil Conway <neilc@samurai.com> writes:
>
>>Any thoughts on what could be causing these errors?
>
>I suspect valgrind is complaining because XLogInsert is memcpy'ing a
>struct that has allocation padding in it. Which of course is a bogus
>complaint ...

As far as I remember (couldn't find modern documentation on the matter) Valgrind is resistant to this problem. When a block of memory is copied, the initialized/uninitialized status is copied along. It only complains when an actual operation is performed using uninitialized memory. This was developed for the explicit reason of avoiding the problem you describe.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
Shachar Shemesh wrote:

> Tom Lane wrote:
>
>> I suspect valgrind is complaining because XLogInsert is memcpy'ing a
>> struct that has allocation padding in it. Which of course is a bogus
>> complaint ...
>
> As far as I remember (couldn't find modern documentation on the
> matter) Valgrind is resistant to this problem. When a block of memory
> is copied, the initialized/uninitialized status is copied along. It
> only complains when an actual operation is performed using
> uninitialized memory. This was developed for the explicit reason of
> avoiding the problem you describe.
>
> Shachar

Found it: http://developer.kde.org/~sewardj/docs-2.0.0/mc_main.html, section 3.3.2

> It is important to understand that your program can copy around junk
> (uninitialised) data to its heart's content. Memcheck observes this
> and keeps track of the data, but does not complain. A complaint is
> issued only when your program attempts to make use of uninitialised data.

What IS possible, however, is that there is a bug in one of the underlying libraries.

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
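The rule quoted from the Memcheck manual (copying junk is silent, using it is not) can be illustrated with a toy model. This is a hypothetical sketch, not how Valgrind is implemented: it keeps one "defined" flag per byte, whereas Memcheck shadows individual bits, and it treats only a branch as a "use", whereas Memcheck also checks values used to compute memory addresses and arguments passed to system calls such as write(). All names (`ToyShadowMem`, `toy_store`, `toy_copy`, `toy_branch_nonzero`) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 64

/* Toy memory: every data byte carries a shadow flag recording whether it
   was ever written. */
typedef struct {
    unsigned char data[TOY_MEM_SIZE];
    bool          defined[TOY_MEM_SIZE];
    int           complaints;   /* number of "uses" of undefined bytes */
} ToyShadowMem;

static void toy_init(ToyShadowMem *m) {
    memset(m->data, 0, sizeof m->data);        /* actual contents don't matter */
    memset(m->defined, 0, sizeof m->defined);  /* nothing counts as written yet */
    m->complaints = 0;
}

/* Writing a byte makes it defined. */
static void toy_store(ToyShadowMem *m, size_t addr, unsigned char value) {
    assert(addr < TOY_MEM_SIZE);
    m->data[addr] = value;
    m->defined[addr] = true;
}

/* Copying moves the bytes AND their definedness; it never complains. */
static void toy_copy(ToyShadowMem *m, size_t dst, size_t src, size_t n) {
    assert(dst + n <= TOY_MEM_SIZE && src + n <= TOY_MEM_SIZE);
    memmove(&m->data[dst], &m->data[src], n);
    memmove(&m->defined[dst], &m->defined[src], n * sizeof m->defined[0]);
}

/* Branching on a byte is a "use": this is where a complaint is raised. */
static bool toy_branch_nonzero(ToyShadowMem *m, size_t addr) {
    assert(addr < TOY_MEM_SIZE);
    if (!m->defined[addr])
        m->complaints++;
    return m->data[addr] != 0;
}
```

Storing byte 0, copying bytes 0..1 elsewhere, and branching on the copy of byte 0 produces no complaint; branching on the copy of the never-written byte 1 produces exactly one, even though the copy itself was silent.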
Neil Conway <neilc@samurai.com> writes:
> Valgrind'ing the postmaster yields a fair number of errors. A lot of
> them are similar, such as the following:
> ==29929== Use of uninitialised value of size 4
> ==29929==    at 0x80AFB80: XLogInsert (xlog.c:570)

Oh, I see the issue. Shachar is correct that valgrind doesn't complain about copying uninitialized bytes. But it *does* complain about adding them into a CRC ... so what we are seeing here is gripes about including padding bytes into a CRC, or writing them out in the case of the complaints like this one:

> ==29929== Syscall param write(buf) contains uninitialised or
> unaddressable byte(s)

The original pad bytes may be fairly far removed from the point of the error ... an example is that I was able to make one XLogInsert complaint go away by changing palloc to palloc0 at tupdesc.c line 413 (in TupleDescInitEntry), which is several memcpy's removed from the data that gets passed to XLogInsert. valgrind's habit of propagating undef'ness through copies isn't real helpful here.

BTW, valgrind's report about "size 4" is actively misleading, because the only part of that struct that TupleDescInitEntry isn't careful to set explicitly is a one-byte pad between attislocal and attinhcount.

regards, tom lane
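Why a pad byte matters to a CRC even though no field reads it can be shown in a few lines. This is a minimal sketch under stated assumptions: a plain bitwise CRC-32 (reflected polynomial 0xEDB88320) stands in for whatever checksum XLogInsert actually computes, and `PaddedRecord` and `record_image` are hypothetical. `record_image` builds a record's byte image on top of memory pre-filled with `fill`: a fill of 0 plays the role of palloc0, any other value plays the role of leftover junk under plain palloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a pad byte usually sits between flag_c and count. */
typedef struct {
    char    flag_a;
    char    flag_b;
    char    flag_c;
    int32_t count;
} PaddedRecord;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), a stand-in checksum. */
static uint32_t crc32_bytes(const void *buf, size_t len) {
    const unsigned char *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Fill the whole image with `fill`, then write every named field at its
   real offset. Bytes that belong to no field keep the fill value. */
static void record_image(unsigned char out[sizeof(PaddedRecord)],
                         unsigned char fill,
                         char a, char b, char c, int32_t count) {
    memset(out, fill, sizeof(PaddedRecord));
    out[offsetof(PaddedRecord, flag_a)] = (unsigned char) a;
    out[offsetof(PaddedRecord, flag_b)] = (unsigned char) b;
    out[offsetof(PaddedRecord, flag_c)] = (unsigned char) c;
    memcpy(out + offsetof(PaddedRecord, count), &count, sizeof count);
}
```

Two images with identical field values but different junk fill produce different checksums, because the CRC consumes the pad byte like any other; two zero-filled images always agree. The sketch only shows why the pad's contents reach the checksum. It does not reproduce Valgrind's report, which depends on how the real CRC code uses each byte.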
Min Xu (Hsu) wrote:

>I am confused by how valgrind defines "make use" of data. Isn't
>"copy" data a type of "make use"? I mean, if valgrind checks if the
>data was used as inputs of memcpy(), it is fine. But if user uses
>his own memory_copy(), which loads the data into register,
>as if the data is going to be used in some useful computation,
>and then copy the register value to some other memory location
>to finish the copy (yeah, this IS slow), then valgrind is likely
>to be confused too. It may think the data is "used".
>
>I guess all I am saying is that valgrind _can_ still make
>mistakes about it.
>
>-Min

If I understand correctly, data counts as "used" when anything other than copying is done with it. Arithmetic operations, branches, etc. will trigger the error. If you copy the data by adding a constant to it and then subtracting it again, valgrind will complain. If all you do (as in your example) is copy it around, and then copy it some more, it will not. Yes, it even keeps track of "uninitialized" bits in your registers. Brrr.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
I am also interested in this, so I want to make some comments.

On Thu, 22 Apr 2004, Shachar Shemesh wrote:
> Found it:
> http://developer.kde.org/~sewardj/docs-2.0.0/mc_main.html, section 3.3.2
>
> >It is important to understand that your program can copy around junk
> >(uninitialised) data to its heart's content. Memcheck observes this
> >and keeps track of the data, but does not complain. A complaint is
> >issued only when your program attempts to make use of uninitialised data.

I am confused by how valgrind defines "make use" of data. Isn't "copy" data a type of "make use"? I mean, if valgrind checks if the data was used as inputs of memcpy(), it is fine. But if user uses his own memory_copy(), which loads the data into register, as if the data is going to be used in some useful computation, and then copy the register value to some other memory location to finish the copy (yeah, this IS slow), then valgrind is likely to be confused too. It may think the data is "used".

I guess all I am saying is that valgrind _can_ still make mistakes about it.

-Min

-- 
We've heard that a million monkeys at a million keyboards could produce the complete works of Shakespeare; now, thanks to the Internet, we know that it is not true.
--Robert Wilensky
Shachar Shemesh <psql@shemesh.biz> writes:
> Tom Lane wrote:
>> The original pad bytes may be fairly far removed from the point of the
>> error ... an example is that I was able to make one XLogInsert complaint
>> go away by changing palloc to palloc0 at tupdesc.c line 413 (in
>> TupleDescInitEntry), which is several memcpy's removed from the data
>> that gets passed to XLogInsert.

> If I understand this correctly, that was a real bug there, wasn't it?

No, just a complete waste of time. The "uninitialized" data is just struct padding, and it matters not what's in there. To get rid of this class of reports we'd probably have to palloc0 rather than palloc almost everything, and that strikes me as useless overhead. It would make more sense to tell valgrind to suppress these particular events in XLogInsert and XLogFlush.

AFAICS, if we actually had an uninitialized field (rather than uninitialized padding) it would get detected at the point where the field is used. If you run with large enough shared_buffers to avoid having to discard pages from shmem, I think this would be detected even across a (nominal) disk write and read.

BTW, there is something in the valgrind manual about adding hints to teach valgrind about custom alloc/free mechanisms. Has anyone taught it about palloc?

regards, tom lane
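Tom's closing question refers to Valgrind's client requests for custom allocators. The sketch below is hypothetical throughout: `Arena`, `arena_alloc`, `arena_alloc0`, `arena_reset`, `arena_free` and the `USE_VALGRIND` switch are illustrative names, not PostgreSQL's memory-context code. It assumes the mempool requests `VALGRIND_CREATE_MEMPOOL`, `VALGRIND_MEMPOOL_ALLOC`, `VALGRIND_DESTROY_MEMPOOL` and `VALGRIND_MAKE_MEM_NOACCESS` as declared in current Valgrind's `valgrind.h`/`memcheck.h`; the 2.x headers of that era may use different names. Without `-DUSE_VALGRIND` the requests compile to no-ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* Without Valgrind the client requests compile away to nothing. */
#define VALGRIND_CREATE_MEMPOOL(pool, rz, zeroed) ((void) 0)
#define VALGRIND_DESTROY_MEMPOOL(pool)            ((void) 0)
#define VALGRIND_MEMPOOL_ALLOC(pool, addr, size)  ((void) 0)
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len)     ((void) 0)
#endif

/* Hypothetical bump allocator standing in for a palloc-style context. */
typedef struct {
    char  *base;   /* one malloc'd block carved into chunks */
    size_t size;
    size_t used;
} Arena;

static int arena_init(Arena *a, size_t size) {
    a->base = malloc(size);
    if (a->base == NULL)
        return -1;
    a->size = size;
    a->used = 0;
    /* Until a chunk is handed out, any access to the block is an error. */
    VALGRIND_MAKE_MEM_NOACCESS(a->base, size);
    VALGRIND_CREATE_MEMPOOL(a, 0, 0);
    return 0;
}

static void *arena_alloc(Arena *a, size_t n) {
    size_t rounded = (n + 7) & ~(size_t) 7;   /* keep chunks 8-byte aligned */
    void  *p;

    assert(a->base != NULL);
    if (n > a->size || rounded > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += rounded;
    /* The chunk becomes addressable but undefined, like fresh malloc memory. */
    VALGRIND_MEMPOOL_ALLOC(a, p, n);
    return p;
}

/* palloc0-style variant: zeroing every byte, padding included. */
static void *arena_alloc0(Arena *a, size_t n) {
    void *p = arena_alloc(a, n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}

static void arena_reset(Arena *a) {
    /* Forget all chunks at once; stale contents must not count as defined. */
    VALGRIND_DESTROY_MEMPOOL(a);
    VALGRIND_MAKE_MEM_NOACCESS(a->base, a->size);
    VALGRIND_CREATE_MEMPOOL(a, 0, 0);
    a->used = 0;
}

static void arena_free(Arena *a) {
    VALGRIND_DESTROY_MEMPOOL(a);
    free(a->base);
    a->base = NULL;
}
```

Without such hints Memcheck sees only the one malloc'd block, so memory recycled after a reset keeps the "defined" status left by its previous user and stale reads go unreported. Registering each chunk as a pool allocation makes every chunk start out undefined. `arena_alloc0` shows the palloc0-style alternative that Tom considers too costly to apply everywhere.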
Tom Lane wrote:

>>==29929== Syscall param write(buf) contains uninitialised or
>>unaddressable byte(s)
>
>The original pad bytes may be fairly far removed from the point of the
>error ... an example is that I was able to make one XLogInsert complaint
>go away by changing palloc to palloc0 at tupdesc.c line 413 (in
>TupleDescInitEntry), which is several memcpy's removed from the data
>that gets passed to XLogInsert.

Would asking valgrind for more stack output help?

>valgrind's habit of propagating
>undef'ness through copies isn't real helpful here.

Well, considering the number of false positives you would get if it didn't...

If I understand this correctly, that was a real bug there, wasn't it?

>BTW, valgrind's report about "size 4" is actively misleading, because
>the only part of that struct that TupleDescInitEntry isn't careful to
>set explicitly is a one-byte pad between attislocal and attinhcount.

You might want to report that to their bugs list. From browsing the docs just now, I believe valgrind is generally aware that only part of a word may be uninitialized. You can even set it to report the error at the point where uninitialized and initialized data are merged into a single operation. In fact, that may help get the errors reported closer to where the actual problem lies. Then again, it may cause it to generate far more false positives.

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/