Thread: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Michael Paquier
Date:
Hi,

This morning while running make check-world on my OSX Mavericks laptop, I found the following failure:
test pgtypeslib/dt_test2      ... stderr FAILED (test process was terminated by signal 6: Abort trap)
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff8052c866 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8052c866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff83cb035c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff81899bba libsystem_c.dylib`__abort + 145
    frame #3: 0x00007fff8189a46d libsystem_c.dylib`__stack_chk_fail + 196
    frame #4: 0x000000010f7cb3bb libpgtypes.3.dylib`PGTYPESdate_from_asc(str=0x000000010f6a2d6c, endptr=0x00007fff5055e488) + 635 at datetime.c:104
    frame #5: 0x000000010f6a260f dt_test2`main + 255 at dt_test2.pgc:91
    frame #6: 0x00007fff87acc5fd libdyld.dylib`start + 1
    frame #7: 0x00007fff87acc5fd libdyld.dylib`start + 1
Bisecting shows that this failure was introduced by 4318dae, and it is reproducible on all the active branches, down to REL9_0_STABLE.

Note that this problem has been introduced after discussing a separate issue here:
http://www.postgresql.org/message-id/1399399313.27807.28.camel@sussancws0025
Regards,
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> This morning while running make check-world on my OSX Mavericks laptop, I
> found the following failure:

[ scratches head... ]  Doesn't reproduce on my OSX Mavericks laptop,
either with or without --disable-integer-datetimes.  What compiler
are you using exactly?  Any special build options?
        regards, tom lane



Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Michael Paquier
Date:


On Mon, Oct 6, 2014 at 12:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > This morning while running make check-world on my OSX Mavericks laptop, I
> > found the following failure:
>
> [ scratches head... ]  Doesn't reproduce on my OSX Mavericks laptop,
> either with or without --disable-integer-datetimes.
> What compiler are you using exactly?
clang from developer tools 6.0 of September 2014, even if configure points to "gcc" in /usr/bin/:
$ which gcc
/usr/bin/gcc
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
$ clang --version
Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
 
> Any special build options?
Nothing really fancy:
$ ./configure --enable-depend --enable-debug --disable-rpath --enable-cassert --prefix=/to/path/bin/pgsql --with-libxml CFLAGS=
I am attaching config.log in case. Btw that's 10.9.5, and I have been able to reproduce it on a second machine running 10.9.5 as well.
Regards,
--
Michael
Attachment
Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Oct 6, 2014 at 12:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [ scratches head... ]  Doesn't reproduce on my OSX Mavericks laptop,
>> either with or without --disable-integer-datetimes.
>> What compiler are you using exactly?

> clang from developer tools 6.0 of September 2014, even if configure points
> to "gcc" in /usr/bin/:
> $ which gcc
> /usr/bin/gcc
> $ gcc --version
> Configured with: --prefix=/Library/Developer/CommandLineTools/usr
> --with-gxx-include-dir=/usr/include/c++/4.2.1
> Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin13.4.0
> Thread model: posix

Exact same here, so that's not it.  (I think ... my Xcode says it's 6.0.1,
but the compiler --version report is just the same as you show.)

>> Any special build options?

> Nothing really fancy:
> $ ./configure --enable-depend --enable-debug --disable-rpath
> --enable-cassert --prefix=/to/path/bin/pgsql --with-libxml

That looks about like mine too, though I'm not using --disable-rpath
... what's the reason for that?

> I am attaching config.log in case. Btw that's 10.9.5, and I have been able
> to reproduce it on a second machine running 10.9.5 as well.

10.9.5 here as well.  We're running out of explanations ...
        regards, tom lane



Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Michael Paquier
Date:


On Mon, Oct 6, 2014 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > Nothing really fancy:
> > $ ./configure --enable-depend --enable-debug --disable-rpath
> > --enable-cassert --prefix=/to/path/bin/pgsql --with-libxml
>
> That looks about like mine too, though I'm not using --disable-rpath
> ... what's the reason for that?
No real reason. That was only some old remnant in a build script that was here for ages :)
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Oct 6, 2014 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That looks about like mine too, though I'm not using --disable-rpath
>> ... what's the reason for that?

> No real reason. That was only some old remnant in a build script that was
> here for ages :)

Hm.  Grasping at straws here ... what's your locale environment?
        regards, tom lane



Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Michael Paquier
Date:
On Mon, Oct 6, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > On Mon, Oct 6, 2014 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> That looks about like mine too, though I'm not using --disable-rpath
> >> ... what's the reason for that?
>
> > No real reason. That was only some old remnant in a build script that was
> > here for ages :)
>
> Hm.  Grasping at straws here ... what's your locale environment?

The system locales have nothing really special...
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
But now that you mention it I have as well that:
$ defaults read -g AppleLocale
en_JP
--
Michael

Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Michael Paquier
Date:
On Tue, Oct 7, 2014 at 8:14 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> The system locales have nothing really special...
> $ locale
> LANG=
> LC_COLLATE="C"
> LC_CTYPE="UTF-8"
> LC_MESSAGES="C"
> LC_MONETARY="C"
> LC_NUMERIC="C"
> LC_TIME="C"
> LC_ALL=
> But now that you mention it I have as well that:
> $ defaults read -g AppleLocale
> en_JP

Hm... I have tried changing the system locales (to en_US for example) and time format but I can still trigger the issue all the time. I'll try to have a closer look.. It looks like this test does not like some settings at the OS level.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> Hm... I have tried changing the system locales (to en_US for example) and
> time format but I can still trigger the issue all the time. I'll try to
> have a closer look.. It looks like this test does not like some settings at
> the OS level.

I eventually realized that the critical difference was you'd added
"CFLAGS=" to the configure call.  On this platform that has the net
effect of removing -O2 from the compiler flags, and apparently that
shifts around the stack layout enough to expose the clobber.

The fix is simple enough: ecpg's version of ParseDateTime is failing
to check for overrun of the field[] array until *after* it's already
clobbered the stack:

*** a/src/interfaces/ecpg/pgtypeslib/dt_common.c
--- b/src/interfaces/ecpg/pgtypeslib/dt_common.c
*************** ParseDateTime(char *timestr, char *lowst
*** 1695,1703 ****
      while (*(*endstr) != '\0')
      {
          /* Record start of current field */
-         field[nf] = lp;
          if (nf >= MAXDATEFIELDS)
              return -1;

          /* leading digit? then date or time */
          if (isdigit((unsigned char) *(*endstr)))
--- 1695,1703 ----
      while (*(*endstr) != '\0')
      {
          /* Record start of current field */
          if (nf >= MAXDATEFIELDS)
              return -1;
+         field[nf] = lp;

          /* leading digit? then date or time */
          if (isdigit((unsigned char) *(*endstr)))

Kind of astonishing that nobody else has reported this, given that
there's been a regression test specifically meant to catch such a
problem since 4318dae.  The stack layout in PGTYPESdate_from_asc
must happen to avoid the issue on practically all platforms.
        regards, tom lane
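
[ Editorial sketch, not part of the original message. ]  The ordering
problem described above can be illustrated with a small stand-alone
tokenizer.  This is only a sketch under stated assumptions: split_fields()
and MAX_FIELDS are hypothetical names standing in for ParseDateTime and
MAXDATEFIELDS, and the whitespace splitting is far simpler than what ecpg
actually does.  The one point it tries to show is that the bound check
has to come before the store into the array.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical small limit, standing in for MAXDATEFIELDS. */
#define MAX_FIELDS 4

/*
 * Record the start of each whitespace-separated token of str in fields[].
 * Returns the number of tokens found, or -1 if str holds more than
 * MAX_FIELDS tokens.  Hypothetical helper; not the ecpg ParseDateTime.
 */
static int
split_fields(const char *str, const char *fields[MAX_FIELDS])
{
    int         nf = 0;
    const char *p = str;

    assert(str != NULL && fields != NULL);

    while (*p != '\0')
    {
        /* Skip separators between tokens. */
        if (isspace((unsigned char) *p))
        {
            p++;
            continue;
        }

        /* Check the bound first; only then is fields[nf] a valid slot. */
        if (nf >= MAX_FIELDS)
            return -1;
        fields[nf++] = p;

        /* Advance to the end of the current token. */
        while (*p != '\0' && !isspace((unsigned char) *p))
            p++;
    }
    return nf;
}
```

With the two statements in the opposite order (store, then check), a
fifth token would be written to fields[4], one element past the end of
the caller's array, before the function got around to returning -1.
Whether that stray write lands on something that matters depends on the
caller's stack layout, which would fit the observation that the failure
only showed up once -O2 was dropped.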



Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Michael Paquier
Date:
On Tue, Oct 7, 2014 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > Hm... I have tried changing the system locales (to en_US for example) and
> > time format but I can still trigger the issue all the time. I'll try to
> > have a closer look.. It looks like this test does not like some settings at
> > the OS level.
>
> I eventually realized that the critical difference was you'd added
> "CFLAGS=" to the configure call.  On this platform that has the net
> effect of removing -O2 from the compiler flags, and apparently that
> shifts around the stack layout enough to expose the clobber.

At least my scripts are weird enough to trigger such behaviors. The funny part is that it's really a coincidence: CFLAGS was being set from an empty variable, and that variable was removed from this script some time ago.

> The fix is simple enough: ecpg's version of ParseDateTime is failing
> to check for overrun of the field[] array until *after* it's already
> clobbered the stack:
>
> Kind of astonishing that nobody else has reported this, given that
> there's been a regression test specifically meant to catch such a
> problem since 4318dae.  The stack layout in PGTYPESdate_from_asc
> must happen to avoid the issue on practically all platforms.

Yes, thanks. That's it. At least I am not going crazy.
Regards,
--
Michael

Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

From
Noah Misch
Date:
On Mon, Oct 06, 2014 at 08:57:54PM -0400, Tom Lane wrote:
> I eventually realized that the critical difference was you'd added
> "CFLAGS=" to the configure call.  On this platform that has the net
> effect of removing -O2 from the compiler flags, and apparently that
> shifts around the stack layout enough to expose the clobber.
> 
> The fix is simple enough: ecpg's version of ParseDateTime is failing
> to check for overrun of the field[] array until *after* it's already
> clobbered the stack:

Thanks for tracking that down.  Oops.