Thread: BUG #4162: Hotbackup recovery doesn't work in windows

BUG #4162: Hotbackup recovery doesn't work in windows

From
"Javier"
Date:
The following bug has been logged online:

Bug reference:      4162
Logged by:          Javier
Email address:      elpochodelagente@gmail.com
PostgreSQL version: 8.3.1
Operating system:   Windows XP SP2
Description:        Hotbackup recovery doesn't work in windows
Details:

I've found that the hotbackup recovery process as described in
http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
fails in windows.

The cause can be found in

static XLogRecord *
ReadRecord(XLogRecPtr *RecPtr, int emode);

The problem is the following: when postgres tries to recover, it looks for
WAL files in incremental order until it reaches a file which doesn't exist.
The variable readFile is assigned a file descriptor for the next WAL file to
be recovered, which is obtained by procedure XLogReadFile. When postgres
reaches a WAL file that doesn't exist, XLogReadFile returns -1.

Now, after getting readFile value, the following code is executed:

if (readFile < 0)
    goto next_record_is_invalid;

The label next_record_is_invalid is followed by

next_record_is_invalid:;
    close(readFile);

In the POSIX version of close() the call would just fail because the
descriptor is invalid. In the Microsoft implementation (close() included
from io.h), the application crashes (it doesn't even raise an exception;
while debugging, a message appears saying "Microsoft Visual Studio C Runtime
Library has detected a fatal error in testclose.exe.").

The bug may be fixed by testing readFile before close:
next_record_is_invalid:;
    if (readFile >= 0)
        close(readFile);
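
The control flow described above can be modelled in a small standalone
sketch. This is not PostgreSQL's ReadRecord(): the function name
count_consecutive_segments, the "<prefix><n>" file naming and the
one-byte readability check are assumptions made up for illustration, and
the headers are the POSIX ones (an MSVC build would use io.h instead of
unistd.h). It only shows why a shared cleanup label has to test the
descriptor before closing it, since some paths arrive with -1 and others
with an open file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a recovery loop: opens "<prefix>0",
 * "<prefix>1", ... in order and stops at the first file that is
 * missing or from which not even one byte can be read.
 * Returns the number of files accepted before stopping.
 */
static int
count_consecutive_segments(const char *prefix)
{
    char path[1024];
    char byte;
    int  fd = -1;
    int  n = 0;

    for (;;)
    {
        snprintf(path, sizeof(path), "%s%d", prefix, n);

        fd = open(path, O_RDONLY);
        if (fd < 0)
            goto next_is_invalid;   /* missing file: fd is -1 here */

        if (read(fd, &byte, 1) != 1)
            goto next_is_invalid;   /* empty file: fd is still open */

        close(fd);
        fd = -1;
        n++;
    }

next_is_invalid:
    /* Both paths land here, so close only a descriptor that is valid. */
    if (fd >= 0)
        close(fd);
    return n;
}
```

Without the `fd >= 0` test, the missing-file path would call close(-1),
which is the call reported to crash under the Microsoft runtime.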

To reproduce the bug, you can try making a base backup on Windows and then
trying to recover.

I discovered all this by attaching to the postgres.exe process at start with
Visual Studio 2005; the debugging symbols were invaluable. Then I tested
this minimal code in a new project, just to check.

#include "stdafx.h"
#include <io.h>


int _tmain(int argc, _TCHAR* argv[])
{
    close(-1);

    return 0;
}

it crashes the same way.
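
For comparison, the POSIX behaviour mentioned earlier in this report can
be checked with a similar probe. This is a minimal sketch that assumes a
POSIX system with unistd.h. The function name
errno_from_closing_invalid_fd is made up for illustration. On such a
system close(-1) is expected to return -1 and set errno to EBADF rather
than terminate the process.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical probe: calls close() on a descriptor that was never
 * opened and returns the errno value it leaves behind, or 0 if the
 * call unexpectedly reports success.
 */
static int
errno_from_closing_invalid_fd(void)
{
    errno = 0;
    if (close(-1) == 0)
        return 0;
    return errno;
}
```

The process keeps running after the failed call on POSIX, which is why
code that closes an invalid descriptor can go unnoticed there.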

Re: BUG #4162: Hotbackup recovery doesn't work in windows

From
Simon Riggs
Date:
On Tue, 2008-05-13 at 00:45 +0000, Javier wrote:
> The following bug has been logged online:
>
> Bug reference:      4162
> Logged by:          Javier
> Email address:      elpochodelagente@gmail.com
> PostgreSQL version: 8.3.1
> Operating system:   Windows XP SP2
> Description:        Hotbackup recovery doesn't work in windows
> Details:
>
> I've found that the hotbackup recovery process as described in
> http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
> fails in windows.

What are you running and how are you running this exactly?

Please can you provide more details rather than your thoughts on
> The cause...

If what you say is correct then it is crash recovery that is failing.

Thanks,

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: BUG #4162: Hotbackup recovery doesn't work in windows

From
Magnus Hagander
Date:
Simon Riggs wrote:
> On Tue, 2008-05-13 at 00:45 +0000, Javier wrote:
> > The following bug has been logged online:
> >
> > Bug reference:      4162
> > Logged by:          Javier
> > Email address:      elpochodelagente@gmail.com
> > PostgreSQL version: 8.3.1
> > Operating system:   Windows XP SP2
> > Description:        Hotbackup recovery doesn't work in windows
> > Details:
> >
> > I've found that the hotbackup recovery process as described in
> > http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
> > fails in windows.
>
> What are you running and how are you running this exactly?
>
> Please can you provide more details rather than your thoughts on
> > The cause...
>
> If what you say is correct then it is crash recovery that is failing.

I think the analysis of the cause itself is correct - we can end up
calling close(-1), and that will fail *on some versions of the
runtime*. It doesn't crash on all. There may also be something else
that's broken that's causing it to happen in this case, but that code
certainly looks dangerous to me. I'd be inclined to fix it along the
way he suggests, even if it's not the root cause of this issue. Any
objection to that?

Javier - is this happening with the binary win32 build, or do you have
your own build? If you have your own, what compiler and runtime are you
using?

//Magnus

Re: BUG #4162: Hotbackup recovery doesn't work in windows

From
"Javier Pimás"
Date:
Hi, I'm making a base backup, and then I try to recover from it. I installed postgres from the msi installer, added some tables and data and then followed the steps in "24.3.2. Making a Base Backup". I don't keep archived files forever but archive just the indispensable ones (basically the same as the pg_hotbackup project, but on Windows).
After that I tried to test the backup by reinstalling postgres and following the steps in "24.3.3. Recovering using a Continuous Archive Backup". It fails in step 8, throwing an error message box. After a while I downloaded the postgres source files and debug symbols, attached Visual Studio to postgres.exe and debugged it. It always failed at the line I mentioned before. It may be that any kind of recovery is failing; I didn't test.

I'm not compiling anything, but I made a binary patch to the exe (I was in a hurry!) that avoids calling close with -1, and the recovery process started working.

On Tue, May 13, 2008 at 5:14 PM, Magnus Hagander <magnus@hagander.net> wrote:
Simon Riggs wrote:
> On Tue, 2008-05-13 at 00:45 +0000, Javier wrote:
> > The following bug has been logged online:
> >
> > Bug reference:      4162
> > Logged by:          Javier
> > Email address:      elpochodelagente@gmail.com
> > PostgreSQL version: 8.3.1
> > Operating system:   Windows XP SP2
> > Description:        Hotbackup recovery doesn't work in windows
> > Details:
> >
> > I've found that the hotbackup recovery process as described in
> > http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
> > fails in windows.
>
> What are you running and how are you running this exactly?
>
> Please can you provide more details rather than your thoughts on
> > The cause...
>
> If what you say is correct then it is crash recovery that is failing.

I think the analysis of the cause itself is correct - we can end up
calling close(-1), and that will fail *on some versions of the
runtime*. It doesn't crash on all. There may also be something else
that's broken that's causing it to happen in this case, but that code
certainly looks dangerous to me. I'd be inclined to fix it along the
way he suggests, even if it's not the root cause of this  issue. Any
objection to that?

Javier - is this happening with the binary win32 build, or do you have
your own build? If you have your own, what compiler and runtime are you
using?

//Magnus



--
Javier Pimás
Ciudad de Buenos Aires

Re: BUG #4162: Hotbackup recovery doesn't work in windows

From
Magnus Hagander
Date:
Javier Pimás wrote:
> Hi, I'm making a base backup, and then I try to recover from it. I
> install postgres from msi installer, add some tables and data and
> then follow the steps in "24.3.2. Making a Base Backup". I don't keep
> archiving files forever but archive just the indispensable (basically
> the same as pg_hotbackup project but in windows).
> After that I tried to test the backup by reinstalling postgres and
> following the steps in  "24.3.3. Recovering using a Continuous Archive
> Backup" . It fails in step 8, throwing an error message box. After a
> while I downloaded postgres source files and debug symbols and
> attached visual studio to postgres.exe and debugged it. It always
> failed in the line I told before. It may be possible that any kind of
> recovery be failing, I didn't test.
>
> I'm not compiling anything, but made a binary patch to the exe (I was
> in a hurry!) that avoids calling close with -1 and recovery process
> started working.

Ok, so the reason we've not seen this before is probably that the
runtime used by a mingw build doesn't crash when given a negative file
descriptor.

I've applied a patch for this and backpatched it all the way to 7.4 to
be consistent (even though that one doesn't build with MSVC, there
could theoretically be other platforms with the same problem - though
we most likely would've seen them by now).

I'll defer to somebody else to try to figure out if something else is
wrong that caused this codepath to be hit, but either way we certainly
shouldn't go around crashing like this...

Thanks for your report and diagnosis!

//Magnus

Re: BUG #4162: Hotbackup recovery doesn't work in windows

From
Simon Riggs
Date:
On Tue, 2008-05-13 at 17:20 -0300, Javier Pimás wrote:

> I'm not compiling anything, but made a binary patch to the exe (I was
> in a hurry!) that avoids calling close with -1 and recovery process
> started working.

Good shooting. If you'd said that to begin with, I would have taken the
report with less scepticism.

(Thanks, Magnus)

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com