Thread: [HACKERS] Buildfarm failures on woodlouse (in ecpg-check)

[HACKERS] Buildfarm failures on woodlouse (in ecpg-check)

From
Christian Ullrich
Date:
Hello,

my buildfarm animal woodlouse (Visual Studio 2013 on Windows 7) stopped 
working correctly some months ago. After Tom kindly prodded me into 
fixing it, I noticed that I had configured it to skip the ecpg-check 
step because one of the tests in the "thread" section (not always the 
same) nearly always failed.

I had set it up in circa 2015, so I reenabled the step to see whether 
anything had changed since, and it started failing again.

Through some trial and error, and with the help of Microsoft's 
Application Verifier tool, I found what I think is the cause: It looks 
like the VS 2013 CRT's setlocale() function is not entirely thread-safe; 
the heap debugging options make it crash when manipulating per-thread 
locale state, and according to the comments surrounding that spot in the 
CRT source, the developers were aware the code is fragile.

I crudely hacked a critical section around the setlocale() call in 
pgwin32_setlocale(). After this change, the test does not crash, while 
without it, it crashes every time.

If there is interest in fixing this issue that is apparently limited to 
VS 2013, I will try and produce a proper patch. I notice, however, that 
there is a pthread compatibility layer available that I have no 
experience with at all, and I assume using it is preferred over straight 
Win32?

My hack is attached; please feel free to tell me I'm on the wrong track.
To build correctly, it requires defining _WIN32_WINNT to be 0x600 or 
above (and using an SDK that knows about InitOnceExecuteOnce()).

-- 
Christian

Re: [HACKERS] Buildfarm failures on woodlouse (in ecpg-check)

From
Andrew Dunstan
Date:

On 06/11/2017 11:33 AM, Christian Ullrich wrote:
> Hello,
>
> my buildfarm animal woodlouse (Visual Studio 2013 on Windows 7)
> stopped working correctly some months ago. After Tom kindly prodded me
> into fixing it, I noticed that I had configured it to skip the
> ecpg-check step because one of the tests in the "thread" section (not
> always the same) nearly always failed.
>
> I had set it up in circa 2015, so I reenabled the step to see whether
> anything had changed since, and it started failing again.
>
> Through some trial and error, and with the help of Microsoft's
> Application Verifier tool, I found what I think is the cause: It looks
> like the VS 2013 CRT's setlocale() function is not entirely
> thread-safe; the heap debugging options make it crash when
> manipulating per-thread locale state, and according to the comments
> surrounding that spot in the CRT source, the developers were aware the
> code is fragile.
>
> I crudely hacked a critical section around the setlocale() call in
> pgwin32_setlocale(). After this change, the test does not crash, while
> without it, it crashes every time.
>
> If there is interest in fixing this issue that is apparently limited
> to VS 2013, I will try and produce a proper patch. I notice, however,
> that there is a pthread compatibility layer available that I have no
> experience with at all, and I assume using it is preferred over
> straight Win32?
>
> My hack is attached; please feel free to tell me I'm on the wrong track.
> To build correctly, it requires defining _WIN32_WINNT to be 0x600 or
> above (and using an SDK that knows about InitOnceExecuteOnce()).
>
>

It's certainly worth doing.

I turned off testing ecpg ages ago on bowerbird because I was getting
errors. That's an older version of the toolset.

We already define _WIN32_WINNT to be 0x0600 on all appropriate platforms
(Vista/2008 and above), so I think you could probably just check for
that value.

I have no opinion on the pthread question.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Buildfarm failures on woodlouse (in ecpg-check)

From
Christian Ullrich
Date:
* Andrew Dunstan wrote:

> On 06/11/2017 11:33 AM, Christian Ullrich wrote:

>> To build correctly, it requires defining _WIN32_WINNT to be 0x600 or
>> above (and using an SDK that knows about InitOnceExecuteOnce()).

> We already define _WIN32_WINNT to be 0x0600 on all appropriate platforms
> (Vista/2008 and above), so I think you could probably just check for
> that value.

Not quite; the definition depends on the build toolset, not the build 
platform. When building with VS 2015 and above, _WIN32_WINNT is set to 
0x600 (Vista/2008), while with older compilers, the value is 0x501 
(XP/2003). This is also due to locale issues, but of a different kind, 
and is apparently coincidental.

The build platform does not figure into the target platform, which is 
clearly a good idea for binary distribution reasons.

-- 
Christian




Re: [HACKERS] Buildfarm failures on woodlouse (in ecpg-check)

From
Christian Ullrich
Date:
* I wrote:

> If there is interest in fixing this issue that is apparently limited
> to VS 2013, I will try and produce a proper patch.

Patch.

Perhaps surprisingly, the bug is in the failing test cases themselves, 
not ecpg. The CRT has two modes for its locale implementation: When 
called in a "global" mode thread, setlocale() affects all global mode 
threads; when called from a per-thread mode thread, it affects that 
thread only. [1]

In the threaded test cases, many global mode threads call setlocale() 
simultaneously, getting in each other's way. The fix is to switch the 
worker threads to per-thread mode first.

Without this patch, I see crashes in approximately 25 percent of runs 
(four crashes in 16 cycles of vcregress ecpgcheck, ~10 in 50 runs of 
thread.exe alone). With the patch, I have no crashes in 100 ecpgcheck runs.

On the other hand, this affects only the buildfarm VM I mentioned 
earlier. I have another VM where it does not ever crash even without the 
patch -- such are the joys of multithreading. Both of them should have 
plenty of cores, both physical and virtual.


There are some alternatives to fixing it this way, but I think this is 
the best approach:

- Selecting per-thread mode in ecpglib takes the choice away from the
   developer who might want shared locale.

- Adding locking around setlocale() is difficult because ecpglib already
   uses a wrapper around the CRT function, provided by libpgport.

   These test cases are the only consumers of the wrapper that have any
   concept of multithreading. Supporting it in libpgport for the sole
   benefit of threaded ECPG applications on Windows does not seem to
   be a good idea, and re-wrapping the wrapper in ecpglib is not only
   beyond my abilities to write, but is also going to be unmaintainable.

- Adding locking around every setlocale() call in ecpglib is just ugly.


While working on this, I also noticed that there seem to be two separate 
partial implementations of a pthread synchronization emulation for 
Win32. One is in ecpglib, uses mutexes and provides 
PTHREAD_MUTEX_INITIALIZER and pthread_once(), the other has the header 
in src/port and the implementation in libpq, uses critical sections and 
does not cover either feature.

Should the two be merged at some point?


[1] https://msdn.microsoft.com/en-us/library/ms235302(v=vs.120).aspx

-- 
Christian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment