Thread: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
"Shiv Shivaraju Gowda (shivshi)"
Date:
There seems to be a has a dormant bug in PostgreSQL source code build usin= g VisualStudio(VS) which shows up with the newer OS( not sure if it is the= OS or some other thing in the environment ). MinGW build doesn't access th= is code path and thus doesn't hit this bug. SYMPTOM: The following symptom is encountered in PostgreSQL build using Visual Studi= o: PostgreSQL regression tests fail with server crashing repeatedly with t= his message in the log file: "PANIC: could not lock semaphore: error code = 0". The issue is encountered for VS 2005, VS 2008( 32bit and 64bit executab= les), VS 2010 and VS 2012 built executables. The issue was reproducible wi= th PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW buil= d or EnterpriseDB Packaged executables (which seems to have been built usin= g VisualStudio 2010). CAUSE: The PGSemaphoreLock function in postgresql-9.2.5\src\backend\port\win32_sem= a.c (https://github.com/postgres/postgres/blob/master/src/backend/port/win3= 2_sema.c) uses "Ex" version of "WaitForMultipleObjects" Windows function (h= ttp://msdn.microsoft.com/en-us/library/windows/desktop/ms687028%28v=3Dvs.85= %29.aspx) however doesn't handle the additional awake calls from the "bAler= table" state. Specifically, it doesn't handle the WAIT_IO_COMPLETION return= code when woken up by a User-mode Asynchronous Procedure Call(APC) or Asyn= c IO completion. The part I do NOT understand is why do the User-Mode APC or Async IO comple= tion triggers get fired only in the executables we built and not in the one= s built by EnterpriseDB and bits built by other users since no one has comp= lained about it. Irrespective if it is triggered or not, the code should ha= ve handled all the return codes of the WaitForMultipleObjectsEX API and tha= t is the reason I think it is a bug. I checked the source code for calls which will trigger the Async IO or user= -mode APC(ReadFileEx<http://msdn.microsoft.com/en-us/library/windows/deskto= p/aa365468(v=3Dvs.85).aspx>, WriteFileEx<http://msdn.microsoft.com/en-us/li= brary/windows/desktop/aa365748(v=3Dvs.85).aspx>, QueueUserAPC<http://msdn.m= icrosoft.com/en-us/library/windows/desktop/ms684954(v=3Dvs.85).aspx>) and c= ould not find any. I am not sure what triggers the WAIT_IO_COMPLETION retur= n call. I could not find a way to figure out that information in the debug = environment. (Posible) FIX: Either of the following changes to the PGSemaphoreLock function work fine a= nd pass the regression tests. 1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects= . 2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There i= s a similar code like this in socket.c, so this change should be safe too.
Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
Alvaro Herrera
Date:
Shiv Shivaraju Gowda (shivshi) wrote: > SYMPTOM: > The following symptom is encountered in PostgreSQL build using Visual Studio: PostgreSQL regression tests fail with servercrashing repeatedly with this message in the log file: "PANIC: could not lock semaphore: error code 0". The issueis encountered for VS 2005, VS 2008( 32bit and 64bit executables), VS 2010 and VS 2012 built executables. The issuewas reproducible with PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW build or EnterpriseDB Packagedexecutables (which seems to have been built using VisualStudio 2010). > (Posible) FIX: > Either of the following changes to the PGSemaphoreLock function work fine and pass the regression tests. > > 1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects. > 2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There is a similar code like this in socket.c, so thischange should be safe too. Is anyone looking into this? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 22, 2013 at 01:59:58AM +0000, Shiv Shivaraju Gowda (shivshi) wrote: > There seems to be a has a dormant bug in PostgreSQL source code build using VisualStudio(VS) which shows up with thenewer OS( not sure if it is the OS or some other thing in the environment ). Thanks for this report, and apologies for the delay. Please tell us as much as possible about the OS configurations that have exhibited the problem and the OS configurations that have *not* exhibited the problem with the same binaries. > MinGW build doesn't access this code path and thus doesn't hit this bug. A MinGW build certainly includes win32_sema.c, and I don't see why it wouldn't use that code path. How did you determine that? > SYMPTOM: > The following symptom is encountered in PostgreSQL build using Visual Studio: PostgreSQL regression tests fail with servercrashing repeatedly with this message in the log file: "PANIC: could not lock semaphore: error code 0". The issueis encountered for VS 2005, VS 2008( 32bit and 64bit executables), VS 2010 and VS 2012 built executables. The issuewas reproducible with PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW build or EnterpriseDB Packagedexecutables (which seems to have been built using VisualStudio 2010). I assume you mean 9.2.3 and 9.2.5? > CAUSE: > The PGSemaphoreLock function in postgresql-9.2.5\src\backend\port\win32_sema.c (https://github.com/postgres/postgres/blob/master/src/backend/port/win32_sema.c)uses "Ex" version of "WaitForMultipleObjects"Windows function (http://msdn.microsoft.com/en-us/library/windows/desktop/ms687028%28v=vs.85%29.aspx)however doesn't handle the additionalawake calls from the "bAlertable" state. Specifically, it doesn't handle the WAIT_IO_COMPLETION return code whenwoken up by a User-mode Asynchronous Procedure Call(APC) or Async IO completion. > > The part I do NOT understand is why do the User-Mode APC or Async IO completion triggers get fired only in the executableswe built and not in the ones built by EnterpriseDB and bits built by other users since no one has complained aboutit. Irrespective if it is triggered or not, the code should have handled all the return codes of the WaitForMultipleObjectsEXAPI and that is the reason I think it is a bug. > > I checked the source code for calls which will trigger the Async IO or user-mode APC(ReadFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468(v=vs.85).aspx>, WriteFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx>, QueueUserAPC<http://msdn.microsoft.com/en-us/library/windows/desktop/ms684954(v=vs.85).aspx>)and could not find any. I amnot sure what triggers the WAIT_IO_COMPLETION return call. I could not find a way to figure out that information in thedebug environment. Yes, that's weird. I could believe anti-malware software doing this. If any successful test has taken place on the same machine and user account as the unsuccessful test, that explanation is unlikely. It's also possible that you're building against an unusual version of a 3rd-party library that, unlike versions used in popular builds, does use APCs or async I/O. All that being said, it's clearly wrong to call WaitForMultipleObjectsEx(..., TRUE) and then treat WAIT_IO_COMPLETION like WAIT_FAILED. > (Posible) FIX: > Either of the following changes to the PGSemaphoreLock function work fine and pass the regression tests. > > 1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects. I don't know how we decided which waits to make alertable, so I lean against changing this wait from alertable to normal. > 2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There is a similar code like this in socket.c, so thischange should be safe too. Sounds good. I looked through the other WaitFor*Object* callers for similar problems. pgwin32_select() also mishandles WAIT_IO_COMPLETION, returning zero as though the full timeout had elapsed. It should return -1 and set EINTR. This could affect CheckRADIUSAuth() and PgstatCollectorMain(), both of which distinguish full timeout expiration from EINTR. Would you verify that the attached patch fixes your builds? nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Attachment
Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > Would you verify that the attached patch fixes your builds? The aspect of this patch that leaves ImmediateInterruptOK set over a much larger code range in PGSemaphoreLock --- and in particular doesn't bother to reset it before entering ereport() --- is a horrid idea. Keep in mind that setting ImmediateInterruptOK means you can lose control *anywhere*, and so have no ability to safely manipulate anything except local variables. regards, tom lane
Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
Noah Misch
Date:
On Fri, Jan 24, 2014 at 10:14:24AM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > Would you verify that the attached patch fixes your builds? > > The aspect of this patch that leaves ImmediateInterruptOK set over a > much larger code range in PGSemaphoreLock --- and in particular doesn't > bother to reset it before entering ereport() --- is a horrid idea. > Keep in mind that setting ImmediateInterruptOK means you can lose control > *anywhere*, and so have no ability to safely manipulate anything except > local variables. What's wrong with ereport(FATAL) under ImmediateInterruptOK? ClientAuthentication() does it, and errfinish() clears the flag. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Attachment
Thanks Noah Misch for looking into this. > Please tell us as much as possible about the OS configurations that have > exhibited the problem and the OS configurations that have *not* exhibited > the problem with the same binaries. The OS configuration that has exhibited the problem is Windows 7 Enterprise, 64 bit OS on Intel Core-I7. We are not aware of a OS configuration on which this did work. > A MinGW build certainly includes win32_sema.c, and I don't see why it > wouldn't use that code path. How did you determine that? I thought the MinGW build which is built with gnu compiler invokes Posix libraries and doesn't call into Windows System calls directly. > I assume you mean 9.2.3 and 9.2.5? Correct, apologize for the typo. I couldn't change the title(or the content) after the issue was reated. > Would you verify that the attached patch fixes your builds? Do you want me to go ahead with the tests or do you want me to wait to incorporate Tom Lane's suggestions -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-6-2-5-Visual-Studio-Build-does-not-pass-the-regression-tests-tp5779681p5793932.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
Alvaro Herrera
Date:
Noah Misch wrote: > On Fri, Jan 24, 2014 at 10:14:24AM -0500, Tom Lane wrote: > > Noah Misch <noah@leadboat.com> writes: > > > Would you verify that the attached patch fixes your builds? You haven't pushed this patch yet, right? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[Please use reply-to-all to keep me on the recipients list.] On Thu, Feb 27, 2014 at 01:43:20PM -0800, shivshi wrote: > > Please tell us as much as possible about the OS configurations that have > > exhibited the problem and the OS configurations that have *not* exhibited > > the problem with the same binaries. > > The OS configuration that has exhibited the problem is Windows 7 Enterprise, > 64 bit OS on Intel Core-I7. > We are not aware of a OS configuration on which this did work. Do you use any anti-malware software on all of those systems? > > A MinGW build certainly includes win32_sema.c, and I don't see why it > > wouldn't use that code path. How did you determine that? > > I thought the MinGW build which is built with gnu compiler invokes Posix > libraries and doesn't call into Windows System calls directly. No; that approximately describes the Cygwin build. > > Would you verify that the attached patch fixes your builds? > > Do you want me to go ahead with the tests or do you want me to wait to > incorporate Tom Lane's suggestions Please go ahead and test w32-io-completion-v1.patch. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
Noah Misch
Date:
On Tue, Mar 04, 2014 at 12:37:25PM -0300, Alvaro Herrera wrote: > Noah Misch wrote: > > On Fri, Jan 24, 2014 at 10:14:24AM -0500, Tom Lane wrote: > > > Noah Misch <noah@leadboat.com> writes: > > > > Would you verify that the attached patch fixes your builds? > > You haven't pushed this patch yet, right? Right. (Thanks for the bump; I missed the traffic on this thread.) -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
From
"Shiv Shivaraju Gowda (shivshi)"
Date:
> Do you use any anti-malware software on all of those systems? Yes they come pre-installed, but in one of the system I tested I tried my best to disable all the ones I knew. > No; that approximately describes the Cygwin build. OK, in that case I don=B9t why we don=B9t hit the problem with MinGW. May b= e it is the same reason why other people are not seeing this on VS build, i.e=20 WAIT_IO_COMPLETION is not triggered somehow. We are testing w32-io-completion-v1.patch, will keep you posted. On 3/4/14, 8:22 AM, "Noah Misch" <noah@leadboat.com> wrote: >[Please use reply-to-all to keep me on the recipients list.] > >On Thu, Feb 27, 2014 at 01:43:20PM -0800, shivshi wrote: >> > Please tell us as much as possible about the OS configurations that >>have >> > exhibited the problem and the OS configurations that have *not* >>exhibited >> > the problem with the same binaries. >>=20 >> The OS configuration that has exhibited the problem is Windows 7 >>Enterprise, >> 64 bit OS on Intel Core-I7. >> We are not aware of a OS configuration on which this did work. > >Do you use any anti-malware software on all of those systems? > >> > A MinGW build certainly includes win32_sema.c, and I don't see why it >> > wouldn't use that code path. How did you determine that? >>=20 >> I thought the MinGW build which is built with gnu compiler invokes Posix >> libraries and doesn't call into Windows System calls directly. > >No; that approximately describes the Cygwin build. > >> > Would you verify that the attached patch fixes your builds? >>=20 >> Do you want me to go ahead with the tests or do you want me to wait to >> incorporate Tom Lane's suggestions > >Please go ahead and test w32-io-completion-v1.patch. > >--=20 >Noah Misch >EnterpriseDB http://www.enterprisedb.com
On Wed, Mar 05, 2014 at 10:00:12PM +0000, Shiv Shivaraju Gowda (shivshi) wrote: > We are testing w32-io-completion-v1.patch, will keep you posted. What was the result? Absent a specific report of the patch helping, I think I'll commit it to 9.5 only. -- Noah Misch EnterpriseDB http://www.enterprisedb.com