Thread: BUG #4162: Hotbackup recovery doesn't work in windows
The following bug has been logged online:

Bug reference: 4162
Logged by: Javier
Email address: elpochodelagente@gmail.com
PostgreSQL version: 8.3.1
Operating system: Windows XP SP2
Description: Hotbackup recovery doesn't work in windows
Details:

I've found that the hotbackup recovery process as described in
http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
fails in Windows. The cause can be found in

  static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode);

The problem is the following: when postgres tries to recover, it looks
for WAL files in incremental order until it reaches a file which doesn't
exist. The variable readFile is assigned a file descriptor for the next
WAL file to be recovered, which is obtained by the procedure
XLogReadFile. When postgres reaches a WAL file that doesn't exist,
XLogReadFile returns -1. After getting the readFile value, the following
code is executed:

  if (readFile < 0)
      goto next_record_is_invalid;

The label next_record_is_invalid is followed by:

  next_record_is_invalid:;
      close(readFile);

In the POSIX version of close() this would just fail, because the
descriptor is invalid. In the Microsoft implementation (close() included
from io.h), the application crashes. It doesn't even raise an exception;
while debugging, a message appears saying "Microsoft Visual Studio C
Runtime Library has detected a fatal error in testclose.exe."

The bug may be fixed by testing readFile before closing it:

  next_record_is_invalid:;
      if (readFile >= 0)
          close(readFile);

To reproduce the bug, you can try making a base backup in Windows and
then trying to recover.

I discovered all this by attaching Visual Studio 2005 to the
postgres.exe process at startup; the debugging symbols were invaluable.
Then, just to check, I tested this minimal code in a new project:

  #include "stdafx.h"
  #include <io.h>

  int _tmain(int argc, _TCHAR* argv[])
  {
      close(-1);
      return 0;
  }

It crashes the same way.
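The failure pattern described above can be reproduced outside PostgreSQL. What follows is a minimal sketch, not the actual xlog.c code: `replay_segments()` and `close_if_open()` are hypothetical names, and the segment naming (`dir/00000000`, `dir/00000001`, ...) is a simplified assumption rather than real WAL file naming. It models a loop that opens files in incremental order until one cannot be opened. Every error path jumps to a single cleanup label, which can be reached either with a valid descriptor or with -1, and the suggested guard is applied at that label.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: close fd only if it was actually opened. */
static int close_if_open(int fd)
{
    if (fd < 0)
        return 0;            /* nothing to close, so never call close(-1) */
    return close(fd);
}

/*
 * Hypothetical, simplified model of the recovery loop: open dir/00000000,
 * dir/00000001, ... until one cannot be opened or is empty, and return how
 * many segments were fully consumed. As in ReadRecord(), all failures
 * funnel through one label, reached with fd == -1 (open failed) or with a
 * still-open fd (read failed).
 */
static unsigned replay_segments(const char *dir)
{
    unsigned seg = 0;
    int fd = -1;
    char path[1024];
    char byte;

    for (;;)
    {
        snprintf(path, sizeof(path), "%s/%08X", dir, seg);
        fd = open(path, O_RDONLY);
        if (fd < 0)
            goto next_record_is_invalid;   /* e.g. missing segment: end of WAL */
        if (read(fd, &byte, 1) != 1)
            goto next_record_is_invalid;   /* empty segment: fd is still open */
        /* ... a real implementation would apply the records here ... */
        close(fd);
        fd = -1;
        seg++;
    }

next_record_is_invalid:
    close_if_open(fd);   /* an unguarded close(fd) here is the reported bug */
    return seg;
}
```

With the guard in place, the cleanup path never hands -1 to close(), so its behavior no longer depends on how a particular C runtime treats an invalid descriptor.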
On Tue, 2008-05-13 at 00:45 +0000, Javier wrote:
> The following bug has been logged online:
>
> Bug reference: 4162
> Logged by: Javier
> Email address: elpochodelagente@gmail.com
> PostgreSQL version: 8.3.1
> Operating system: Windows XP SP2
> Description: Hotbackup recovery doesn't work in windows
> Details:
>
> I've found that the hotbackup recovery process as described in
> http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
> fails in windows.

What are you running and how are you running this exactly?

Please can you provide more details rather than your thoughts on
> The cause...

If what you say is correct then it is crash recovery that is failing.

Thanks,

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
> On Tue, 2008-05-13 at 00:45 +0000, Javier wrote:
> > The following bug has been logged online:
> >
> > Bug reference: 4162
> > Logged by: Javier
> > Email address: elpochodelagente@gmail.com
> > PostgreSQL version: 8.3.1
> > Operating system: Windows XP SP2
> > Description: Hotbackup recovery doesn't work in windows
> > Details:
> >
> > I've found that the hotbackup recovery process as described in
> > http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
> > fails in windows.
>
> What are you running and how are you running this exactly?
>
> Please can you provide more details rather than your thoughts on
> > The cause...
>
> If what you say is correct then it is crash recovery that is failing.

I think the analysis of the cause itself is correct - we can end up
calling close(-1), and that will fail *on some versions of the
runtime*. It doesn't crash on all. There may also be something else
that's broken that's causing it to happen in this case, but that code
certainly looks dangerous to me. I'd be inclined to fix it along the
way he suggests, even if it's not the root cause of this issue. Any
objection to that?

Javier - is this happening with the binary win32 build, or do you have
your own build? If you have your own, what compiler and runtime are you
using?

//Magnus
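The point that close(-1) only crashes on some runtimes can be illustrated with a small probe. This is a sketch under stated assumptions, not anything PostgreSQL itself does: `probe_close_minus_one()` and `ignore_invalid_parameter()` are hypothetical names. On POSIX systems, close(-1) returns -1 and sets errno to EBADF. The Microsoft CRT shipped with Visual Studio 2005 and later validates the descriptor and by default calls its invalid-parameter handler, which terminates the process; installing a handler that simply returns, via `_set_invalid_parameter_handler()`, makes `_close(-1)` report EBADF instead.

```c
#include <errno.h>

#ifdef _MSC_VER
#include <crtdbg.h>   /* _CrtSetReportMode */
#include <io.h>       /* _close */
#include <stddef.h>   /* uintptr_t, wchar_t */
#include <stdlib.h>   /* _set_invalid_parameter_handler */

/* Hypothetical handler that just returns, so the CRT function reports
 * EBADF to its caller instead of terminating the process. */
static void __cdecl ignore_invalid_parameter(const wchar_t *expression,
                                             const wchar_t *function,
                                             const wchar_t *file,
                                             unsigned int line,
                                             uintptr_t reserved)
{
    (void) expression;
    (void) function;
    (void) file;
    (void) line;
    (void) reserved;
}
#else
#include <unistd.h>   /* close */
#endif

/* Calls close(-1) and returns the resulting errno (EBADF expected),
 * or 0 if the call unexpectedly succeeded. */
static int probe_close_minus_one(void)
{
    int rc;

#ifdef _MSC_VER
    _set_invalid_parameter_handler(ignore_invalid_parameter);
    _CrtSetReportMode(_CRT_ASSERT, 0);   /* no assert dialog in the debug CRT */
    errno = 0;
    rc = _close(-1);
#else
    errno = 0;
    rc = close(-1);                      /* POSIX: fails with EBADF, no crash */
#endif
    return rc == -1 ? errno : 0;
}
```

Replacing the handler changes the behavior of every CRT call in the process, so guarding the individual close() call, as suggested in the report, is the narrower fix.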
Hi, I'm making a base backup, and then I try to recover from it. I install PostgreSQL from the MSI installer, add some tables and data, and then follow the steps in "24.3.2. Making a Base Backup". I don't keep archiving WAL files indefinitely; I archive just the indispensable ones (basically the same as the pg_hotbackup project, but on Windows).
After that I tried to test the backup by reinstalling PostgreSQL and following the steps in "24.3.3. Recovering using a Continuous Archive Backup". It fails at step 8, showing an error message box. After a while I downloaded the PostgreSQL source files and debug symbols, attached Visual Studio to postgres.exe, and debugged it. It always failed at the line I mentioned before. It may be that any kind of recovery is failing; I didn't test that.
I'm not compiling anything myself, but I made a binary patch to the exe (I was in a hurry!) that avoids calling close() with -1, and the recovery process started working.
--
Javier Pimás
Ciudad de Buenos Aires
On Tue, May 13, 2008 at 5:14 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I think the analysis of the cause itself is correct - we can end up
> calling close(-1), and that will fail *on some versions of the
> runtime*. It doesn't crash on all. There may also be something else
> that's broken that's causing it to happen in this case, but that code
> certainly looks dangerous to me. I'd be inclined to fix it along the
> way he suggests, even if it's not the root cause of this issue. Any
> objection to that?
>
> Javier - is this happening with the binary win32 build, or do you have
> your own build? If you have your own, what compiler and runtime are you
> using?
>
> //Magnus
Javier Pimás wrote:
> Hi, I'm making a base backup, and then I try to recover from it. I
> install PostgreSQL from the MSI installer, add some tables and data,
> and then follow the steps in "24.3.2. Making a Base Backup". I don't
> keep archiving WAL files indefinitely; I archive just the
> indispensable ones (basically the same as the pg_hotbackup project,
> but on Windows).
> After that I tried to test the backup by reinstalling PostgreSQL and
> following the steps in "24.3.3. Recovering using a Continuous Archive
> Backup". It fails at step 8, showing an error message box. After a
> while I downloaded the PostgreSQL source files and debug symbols,
> attached Visual Studio to postgres.exe, and debugged it. It always
> failed at the line I mentioned before. It may be that any kind of
> recovery is failing; I didn't test that.
>
> I'm not compiling anything myself, but I made a binary patch to the
> exe (I was in a hurry!) that avoids calling close() with -1, and the
> recovery process started working.

Ok, so the reason we've not seen this before is probably that the
runtime used by a mingw build doesn't crash when given a negative file
descriptor.

I've applied a patch for this and backpatched it all the way to 7.4 to
be consistent (even though that one doesn't build with MSVC, there could
theoretically be other platforms with the same problem - though we most
likely would've seen them by now).

I'll defer to somebody else to try to figure out if something else is
wrong that caused this codepath to be hit, but either way we certainly
shouldn't go around crashing like this...

Thanks for your report and diagnosis!

//Magnus
On Tue, 2008-05-13 at 17:20 -0300, Javier Pimás wrote:
> I'm not compiling anything myself, but I made a binary patch to the
> exe (I was in a hurry!) that avoids calling close() with -1, and the
> recovery process started working.

Good shooting. If you'd said that to begin with, I would have taken the
report with less scepticism. (Thanks, Magnus)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com