Thread: memory-related bugs
A suitably-instrumented run of "make installcheck-world" under valgrind turned up a handful of memory-related bugs: * memcpy()/strncpy() between overlapping regions uniqueentry() and dispell_lexize() both deduplicate an array by iteratively copying elements downward to occlude the duplicates. Before finding a first duplicate, these functions call memcpy() with identical arguments. Similarly, resolve_polymorphic_tupdesc() calls TupleDescInitEntry() with an attributeName pointing directly into the TupleDesc's name bytes, causing the latter to call strncpy() with identical arguments. The attached mem1v1-memcpy-overlap.patch fixes these sites by checking for equal pointers before the affected call. For TupleDescInitEntry(), I considered instead having resolve_polymorphic_tupdesc() pstrdup() the value. * read past the end of a Form_pg_type in examine_attribute() examine_attribute() copies a Form_pg_type naively. Since the nullable columns at the end of the structure are not present in memory, the memcpy() reads eight bytes past the end of the source allocation. mem2v1-analyze-overread.patch updates this code to match how we address the same issue for Form_pg_attribute. * off-by-one error in shift_jis_20042euc_jis_2004() This function grabs two bytes at a time, even when only one byte remains; this makes it read past the end of the input. mem3v1-sjis-offbyone.patch changes it to not do this and to report an error when the input ends in a byte that would start a two-byte sequence. Thanks, nm
Attachment
On Sat, Mar 12, 2011 at 1:32 PM, Noah Misch <noah@leadboat.com> wrote: > A suitably-instrumented run of "make installcheck-world" under valgrind turned > up a handful of memory-related bugs: Nice work. How did you instrument things so valgrind knew about palloc et al? I remember trying this in the past and running into problems. I think the biggest one was that we write out structs to disk including padding so trigger lots of reads of uninitialized data warnings. -- greg
Noah Misch <noah@leadboat.com> writes: > A suitably-instrumented run of "make installcheck-world" under valgrind turned > up a handful of memory-related bugs: Hmm, interesting work, but I don't think I believe in the necessity for this kluge: > + else if (attributeName != &(att->attname)) > + namestrcpy(&(att->attname), attributeName); The rules against overlapping memcpy/strcpy's source and destination are meant to cover the case of partial overlap; I find it hard to imagine an implementation that will mess up when the source and destination are identical. If we did think it was important to avoid this situation I would rather find another way, like modifying the caller. Likewise the other changes to avoid no-op memcpy's do not appear to me to be bugs, though possibly they might save enough cycles to be worth doing anyway. > ! stats->attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type)); > ! memcpy(stats->attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type)); > ... > ! stats->attrtype = (Form_pg_type) palloc(TYPE_FIXED_PART_SIZE); > ! memcpy(stats->attrtype, GETSTRUCT(typtuple), TYPE_FIXED_PART_SIZE); I wonder whether we should instead fix this by copying the correct tuple length. regards, tom lane
On Sat, Mar 12, 2011 at 04:08:23PM +0000, Greg Stark wrote: > On Sat, Mar 12, 2011 at 1:32 PM, Noah Misch <noah@leadboat.com> wrote: > > A suitably-instrumented run of "make installcheck-world" under valgrind turned > > up a handful of memory-related bugs: > > > Nice work. How did you instrument things so valgrind knew about palloc > et al? I remember trying this in the past and running into problems. I peppered aset.c and mcxt.c with various calls to the valgrind hook macros. I believe the set of hooks has grown recently (I use valgrind 3.6.0), so it may be that the right facilities didn't exist at that time. > I > think the biggest one was that we write out structs to disk including > padding so trigger lots of reads of uninitialized data warnings. I used suppressions for call sites that write WAL or pgstat structures. Tuples are, with limited exceptions, fully-initialized, so I did validate those.
On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > A suitably-instrumented run of "make installcheck-world" under valgrind turned > > up a handful of memory-related bugs: > > Hmm, interesting work, but I don't think I believe in the necessity for > this kluge: > > > + else if (attributeName != &(att->attname)) > > + namestrcpy(&(att->attname), attributeName); > > The rules against overlapping memcpy/strcpy's source and destination are > meant to cover the case of partial overlap; I find it hard to imagine an > implementation that will mess up when the source and destination are > identical. If we did think it was important to avoid this situation I > would rather find another way, like modifying the caller. Likewise > the other changes to avoid no-op memcpy's do not appear to me to be > bugs, though possibly they might save enough cycles to be worth doing > anyway. I also find it hard to imagine an implementation that needs these changes to produce correct behavior. Avoiding undefined behavior has intrinsic value, but perhaps we do get greater value from the existing code. > > ! stats->attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type)); > > ! memcpy(stats->attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type)); > > ... > > ! stats->attrtype = (Form_pg_type) palloc(TYPE_FIXED_PART_SIZE); > > ! memcpy(stats->attrtype, GETSTRUCT(typtuple), TYPE_FIXED_PART_SIZE); > > I wonder whether we should instead fix this by copying the correct tuple > length. Seems like a step in the wrong direction. We only use typlen and typbyval beyond the immediate context.
Noah Misch wrote: > A suitably-instrumented run of "make installcheck-world" under valgrind turned > up a handful of memory-related bugs: > > * memcpy()/strncpy() between overlapping regions > uniqueentry() and dispell_lexize() both deduplicate an array by iteratively > copying elements downward to occlude the duplicates. Before finding a first > duplicate, these functions call memcpy() with identical arguments. Similarly, > resolve_polymorphic_tupdesc() calls TupleDescInitEntry() with an attributeName > pointing directly into the TupleDesc's name bytes, causing the latter to call > strncpy() with identical arguments. The attached mem1v1-memcpy-overlap.patch > fixes these sites by checking for equal pointers before the affected call. For > TupleDescInitEntry(), I considered instead having resolve_polymorphic_tupdesc() > pstrdup() the value. > > * read past the end of a Form_pg_type in examine_attribute() > examine_attribute() copies a Form_pg_type naively. Since the nullable columns > at the end of the structure are not present in memory, the memcpy() reads eight > bytes past the end of the source allocation. mem2v1-analyze-overread.patch > updates this code to match how we address the same issue for Form_pg_attribute. > > * off-by-one error in shift_jis_20042euc_jis_2004() > This function grabs two bytes at a time, even when only one byte remains; this > makes it read past the end of the input. mem3v1-sjis-offbyone.patch changes it > to not do this and to report an error when the input ends in a byte that would > start a two-byte sequence. Did we conclude any of these were useful? http://archives.postgresql.org/pgsql-hackers/2011-03/msg00856.php I know there were concerns about some of them in the thread. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > Did we conclude any of these were useful? > http://archives.postgresql.org/pgsql-hackers/2011-03/msg00856.php > I know there were concerns about some of them in the thread. Hmm, I guess this slipped through the cracks. I thought that avoiding memcpy(x, x, n) was unnecessary, and I had doubts about the style of some of the other changes, but I think we do need to avoid accessing past the defined end of a data structure. We've seen cases in the past where one day that structure is right up against the end of memory and you get a SIGSEGV; there's no good reason to believe it cannot happen in these places. regards, tom lane
[ Sorry for letting this slip through the cracks ... I think I got distracted by collation bugs :-( ] Noah Misch <noah@leadboat.com> writes: > On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote: >> Noah Misch <noah@leadboat.com> writes: >>> A suitably-instrumented run of "make installcheck-world" under valgrind turned >>> up a handful of memory-related bugs: >> Hmm, interesting work, but I don't think I believe in the necessity for >> this kluge: >> > + else if (attributeName != &(att->attname)) > + namestrcpy(&(att->attname), attributeName); I'm still of the opinion that there's no real need to avoid memcpy with identical source and destination, so I didn't apply this first patch. >> I wonder whether we should instead fix this by copying the correct tuple >> length. > Seems like a step in the wrong direction. We only use typlen and typbyval > beyond the immediate context. Well, the whole point of exposing the pg_type tuple is to avoid making assumptions about what parts of it the type-specific analyze routine will wish to look at. If anything we ought to move in the direction of allowing the non-fixed fields to be accessible too. I didn't do that, since it would imply an ABI change (to add a HeapTuple field to VacAttrStats) and would therefore not be back-patchable. But I did change the code to use SearchSysCacheCopy, which fixes this bug and is readily extensible if we do decide to add such a field later. I applied your third patch (for SJIS2004 conversion) as-is. Thanks for the report and testing! regards, tom lane
On Tue, Sep 06, 2011 at 03:00:42PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote: > >> I wonder whether we should instead fix this by copying the correct tuple > >> length. > > > Seems like a step in the wrong direction. We only use typlen and typbyval > > beyond the immediate context. > > Well, the whole point of exposing the pg_type tuple is to avoid making > assumptions about what parts of it the type-specific analyze routine > will wish to look at. If anything we ought to move in the direction of > allowing the non-fixed fields to be accessible too. I didn't do that, > since it would imply an ABI change (to add a HeapTuple field to > VacAttrStats) and would therefore not be back-patchable. But I did > change the code to use SearchSysCacheCopy, which fixes this bug and > is readily extensible if we do decide to add such a field later. That is a cleaner approach. Thanks. -- Noah Misch http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 6, 2011 at 12:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ Sorry for letting this slip through the cracks ... I think I got > distracted by collation bugs :-( ] > > Noah Misch <noah@leadboat.com> writes: >> On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote: >>> Noah Misch <noah@leadboat.com> writes: >>>> A suitably-instrumented run of "make installcheck-world" under valgrind turned >>>> up a handful of memory-related bugs: > >>> Hmm, interesting work, but I don't think I believe in the necessity for >>> this kluge: >>> >> + else if (attributeName != &(att->attname)) >> + namestrcpy(&(att->attname), attributeName); > > I'm still of the opinion that there's no real need to avoid memcpy with > identical source and destination, so I didn't apply this first patch. I am leery of memcpy with overlapping regions. I know that it can cause an infinite loop on ssse3 architectures, having to do with some backwards-iteration it does: https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/609290 I have spotted this in the wild in PostgreSQL (which is how I happened to produce this bug report link so readily), yet very rarely; I mailed a more detailed report to the security list. -- fdr
Daniel Farina <daniel@heroku.com> writes: > On Tue, Sep 6, 2011 at 12:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm still of the opinion that there's no real need to avoid memcpy with >> identical source and destination, so I didn't apply this first patch. > I am leery of memcpy with overlapping regions. I know that it can > cause an infinite loop on ssse3 architectures, having to do with some > backwards-iteration it does: > https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/609290 The linked page offers no support for your claim of an infinite loop, and in any case it's discussing a case involving *overlapping* regions, not *identical* regions. The reason why this concern is irrelevant for identical regions is that no matter what order the memcpy routine copies the bytes in, it's necessarily storing the exact same data into each byte that was there before. The only way that strange results could be produced is if the routine transiently set some byte to a value other than its final value, which would mean that it must be intending to store to that location more than once, which would be silly and inefficient. So I'm not going to believe that there's a problem here without a lot more rigorous evidence than anyone has offered. regards, tom lane
On Thu, Sep 8, 2011 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Farina <daniel@heroku.com> writes: >> On Tue, Sep 6, 2011 at 12:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm still of the opinion that there's no real need to avoid memcpy with >>> identical source and destination, so I didn't apply this first patch. > >> I am leery of memcpy with overlapping regions. I know that it can >> cause an infinite loop on ssse3 architectures, having to do with some >> backwards-iteration it does: >> https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/609290 > > The linked page offers no support for your claim of an infinite loop, > and in any case it's discussing a case involving *overlapping* regions, > not *identical* regions. What do you mean? As per my original bug report, or in this case (possibly both; I should have dumped the registers, which I'll do if I see it again...)? I'm able to believe that things are fine with all known memcpys today in this case. > The reason why this concern is irrelevant for identical regions is that > no matter what order the memcpy routine copies the bytes in, it's > necessarily storing the exact same data into each byte that was there > before. The only way that strange results could be produced is if the > routine transiently set some byte to a value other than its final value, > which would mean that it must be intending to store to that location > more than once, which would be silly and inefficient. This is a good point, I do understand there is a distinction between the degenerate-case memcpy-to-identical region and the overlapping-case. In my naivety, I'd ask what the costs are of pedantic adherence to this common guideline and, in event someone somewhere does something that is not expected (or, has a slow-but-not-technically-buggy memcpy) are broken, how likely the failure will be easy to pick out. But I don't seriously expect an answer, because I don't think this code path has been a problem for so many years and measuring those things are pretty hard. -- fdr