Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait. - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait.
Date
Msg-id CAEepm=3c6anZDXi+XXDiRpY6qYNSDDVQOK_q12qocPA0x1YMLg@mail.gmail.com
Whole thread Raw
In response to Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait.  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Apr 28, 2016 at 8:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Apr 28, 2016 at 7:02 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Robert Haas wrote:
>>
>>> That looks like malloc() returned NULL.  I noticed when writing that
>>> patch that isolationtester has never had any checks for malloc
>>> returning NULL, which is bad, and probably worth fixing, but I didn't
>>> choose to stop and fix it at that time.
>>
>> I didn't actually check closely but I wondered whether the pointer
>> arithmetic is actually correct.  Note that the memcpy length is zero.
>> I doubt malloc returning null is the problem; how could it happen
>> exactly at the same spot every time the suite has run?
>>
>>> I don't know off-hand why you see that problem starting at this commit
>>> and not before, or why it doesn't show up on other machines.
>>
>> Perhaps it's only a problem for OpenBSD's libc and not for glibc which
>> is the most common.  The only other openbsd machine in buildfarm doesn't
>> run the isolation tests.
>
> Also happens on OpenBSD 5.8.  Isn't this a classic case where memmove
> is called for?  Replacing the memcpy at line 617 with memmove makes
> the tests run successfully, but at first glance the other two
> instances of memcpy in run_permutation should also be changed to
> memmove, no?

That abort at memcpy.c:65 is indeed a check for overlapping regions:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/memcpy.c?rev=1.2&content-type=text/x-cvsweb-markup

That leaves the question of why the backtrace shows arguments that
wouldn't trigger it (for example memcpy exits early for length == 0).
But it appears that those stack arguments are mangled by the time gdb
dumps them, like this:

#include <string.h>
int main(int argc, char *argv[])
{
  char buffer[] = "hello world";
  memcpy(&buffer[0], &buffer[1], 2);
  return 0;
}

#0  0x00000bce27eab90a in kill () at <stdin>:2
#1  0x00000bce27ee5b19 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
#2  0x00000bce27ebcde8 in memcpy (dst0=0xf7ec5, src0=0x6, length=0) at
/usr/src/lib/libc/string/memcpy.c:65
#3  0x00000bcb5fc00c7a in main (argc=1, argv=0x7f7fffff6818) at foo.c:6

Suggested patch attached.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: WIP: Covering + unique indexes.
Next
From: Tom Lane
Date:
Subject: Re: Support for N synchronous standby servers - take 2