Thread: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
manu@coopapotheken.be
Date:
The following bug has been logged on the website:

Bug reference:      13774
Logged by:          Manu Joye
Email address:      manu@coopapotheken.be
PostgreSQL version: 9.4.5
Operating system:   Windows server 2008r2
Description:

running pg_upgrade yields no warnings when there is not enough disk space to
do the upgrade. I  upgraded a server with about 10 databases following the
instructions as provided on the website here. Everything seemed to have gone
well except that some tables in various databases just vanished (pgadmin
complained about missing files) . Upon inspection I noticed the HD had only
a couple of KB of free space left. pg_upgrade seemed to have just restored
what it could and then exited without warning!

We caught it early on and could restore from backups but a check on required
free disk space before upgrade should be implemented or, if that doesn't
work, at least a warning from pg_upgrade that something went wrong.


Kind regards,

Manu

PS I LOVE postgresql, let that be clear. Thanks a whole lot to the whole
team for making this DBMS as awesome as it is.

Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
Bruce Momjian
Date:
On Thu, Nov 12, 2015 at 07:57:35PM +0000, manu@coopapotheken.be wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13774
> Logged by:          Manu Joye
> Email address:      manu@coopapotheken.be
> PostgreSQL version: 9.4.5
> Operating system:   Windows server 2008r2
> Description:
>
> running pg_upgrade yields no warnings when there is not enough disk space to
> do the upgrade. I  upgraded a server with about 10 databases following the
> instructions as provided on the website here. Everything seemed to have gone
> well except that some tables in various databases just vanished (pgadmin
> complained about missing files) . Upon inspection I noticed the HD had only
> a couple of KB of free space left. pg_upgrade seemed to have just restored
> what it could and then exited without warning!
>
> We caught it early on and could restore from backups but a check on required
> free disk space before upgrade should be implemented or, if that doesn't
> work, at least a warning from pg_upgrade that something went wrong.

Thank you for the report.  This is embarrassing, but the code was
testing for the wrong return value on Windows.  We used a macro to
define the same symbol on Windows and Unix (pg_copy_file), but for
Windows, we should have been testing for zero, while the code only
tested for a -1 return failure.  You can see the Windows failure zero
return value defined here:

    https://msdn.microsoft.com/en-us/library/windows/desktop/aa363851%28v=vs.85%29.aspx

    Return value

    If the function succeeds, the return value is nonzero.
    If the function fails, the return value is zero. To get extended error
    information, call GetLastError.

We do something similar for hard links, but we create a wrapper function
to return the proper value (-1), and we call it twice, so it seems wise
to keep that unchanged.

The attached patch fixes this and will be applied to all active branches
in the next minor release.  Sorry for the bug.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Attachment

Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
Bruce Momjian
Date:
On Fri, Nov 13, 2015 at 08:24:02PM -0500, Bruce Momjian wrote:
> Thank you for the report.  This is embarrassing, but the code was
> testing for the wrong return value on Windows.  We used a macro to
> define the same symbol on Windows and Unix (pg_copy_file), but for
> Windows, we should have been testing for zero, while the code only
> tested for a -1 return failure.  You can see the Windows failure zero
> return value defined here:
>
>     https://msdn.microsoft.com/en-us/library/windows/desktop/aa363851%28v=vs.85%29.aspx
>
>     Return value
>
>     If the function succeeds, the return value is nonzero.
>     If the function fails, the return value is zero. To get extended error
>     information, call GetLastError.
>
> We do something similar for hard links, but we create a wrapper function
> to return the proper value (-1), and we call it twice, so it seems wise
> to keep that unchanged.
>
> The attached patch fixes this and will be applied to all active branches
> in the next minor release.  Sorry for the bug.

Patch applied and backpatched through 9.1.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
"Manu Joye"
Date:
Thanks for the quick fix, Bruce. Much appreciated.

Manu

-----Original Message-----
From: Bruce Momjian [mailto:bruce@momjian.us]
Sent: zaterdag 14 november 2015 17:48
To: manu@coopapotheken.be
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without
enough disk space

On Fri, Nov 13, 2015 at 08:24:02PM -0500, Bruce Momjian wrote:
> Thank you for the report.  This is embarrassing, but the code was
> testing for the wrong return value on Windows.  We used a macro to
> define the same symbol on Windows and Unix (pg_copy_file), but for
> Windows, we should have been testing for zero, while the code only
> tested for a -1 return failure.  You can see the Windows failure zero
> return value defined here:
>
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa363851%28v=
> vs.85%29.aspx
>
>     Return value
>
>     If the function succeeds, the return value is nonzero.
>     If the function fails, the return value is zero. To get extended
error
>     information, call GetLastError.
>
> We do something similar for hard links, but we create a wrapper
> function to return the proper value (-1), and we call it twice, so it
> seems wise to keep that unchanged.
>
> The attached patch fixes this and will be applied to all active
> branches in the next minor release.  Sorry for the bug.

Patch applied and backpatched through 9.1.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
'Bruce Momjian'
Date:
On Sat, Nov 14, 2015 at 08:02:09PM +0100, Manu Joye wrote:
> Thanks for the quick fix, Bruce. Much appreciated.

I am a little worried that bug might have masked other pg_upgrade
Windows copy failures.  I am glad you were able to see the problem so
clearly.

---------------------------------------------------------------------------



>
> Manu
>
> -----Original Message-----
> From: Bruce Momjian [mailto:bruce@momjian.us]
> Sent: zaterdag 14 november 2015 17:48
> To: manu@coopapotheken.be
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without
> enough disk space
>
> On Fri, Nov 13, 2015 at 08:24:02PM -0500, Bruce Momjian wrote:
> > Thank you for the report.  This is embarrassing, but the code was
> > testing for the wrong return value on Windows.  We used a macro to
> > define the same symbol on Windows and Unix (pg_copy_file), but for
> > Windows, we should have been testing for zero, while the code only
> > tested for a -1 return failure.  You can see the Windows failure zero
> > return value defined here:
> >
> >
> > https://msdn.microsoft.com/en-us/library/windows/desktop/aa363851%28v=
> > vs.85%29.aspx
> >
> >     Return value
> >
> >     If the function succeeds, the return value is nonzero.
> >     If the function fails, the return value is zero. To get extended
> error
> >     information, call GetLastError.
> >
> > We do something similar for hard links, but we create a wrapper
> > function to return the proper value (-1), and we call it twice, so it
> > seems wise to keep that unchanged.
> >
> > The attached patch fixes this and will be applied to all active
> > branches in the next minor release.  Sorry for the bug.
>
> Patch applied and backpatched through 9.1.
>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription                             +
>

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
Noah Misch
Date:
On Fri, Nov 13, 2015 at 08:24:02PM -0500, Bruce Momjian wrote:
> Thank you for the report.  This is embarrassing, but the code was
> testing for the wrong return value on Windows.  We used a macro to
> define the same symbol on Windows and Unix (pg_copy_file)

> *** a/src/bin/pg_upgrade/file.c
> --- b/src/bin/pg_upgrade/file.c
> *************** copyAndUpdateFile(pageCnvCtx *pageConver
> *** 34,40 ****
>   {
>       if (pageConverter == NULL)
>       {
> !         if (pg_copy_file(src, dst, force) == -1)
>               return getErrorText(errno);
>           else
>               return NULL;
> --- 34,44 ----
>   {
>       if (pageConverter == NULL)
>       {
> ! #ifndef WIN32
> !         if (copy_file(src, dst, force) == -1)
> ! #else
> !         if (CopyFile(src, dst, force) == 0)
> ! #endif
>               return getErrorText(errno);

Thanks.  This passage of code has at least two other problems on Windows.  The
third CopyFile() argument means the opposite of the third copy_file()
argument.  Next, getErrorText() is wrong on Windows:

const char *
getErrorText(int errNum)
{
#ifdef WIN32
    _dosmaperr(GetLastError());
#endif
    return pg_strdup(strerror(errNum));
}

Calling _dosmaperr() to set errno is futile, because we nonetheless proceed to
use the passed-in errNum.

Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
Bruce Momjian
Date:
On Tue, Nov 17, 2015 at 11:43:34PM -0500, Noah Misch wrote:
> On Fri, Nov 13, 2015 at 08:24:02PM -0500, Bruce Momjian wrote:
> > Thank you for the report.  This is embarrassing, but the code was
> > testing for the wrong return value on Windows.  We used a macro to
> > define the same symbol on Windows and Unix (pg_copy_file)
>
> > *** a/src/bin/pg_upgrade/file.c
> > --- b/src/bin/pg_upgrade/file.c
> > *************** copyAndUpdateFile(pageCnvCtx *pageConver
> > *** 34,40 ****
> >   {
> >       if (pageConverter == NULL)
> >       {
> > !         if (pg_copy_file(src, dst, force) == -1)
> >               return getErrorText(errno);
> >           else
> >               return NULL;
> > --- 34,44 ----
> >   {
> >       if (pageConverter == NULL)
> >       {
> > ! #ifndef WIN32
> > !         if (copy_file(src, dst, force) == -1)
> > ! #else
> > !         if (CopyFile(src, dst, force) == 0)
> > ! #endif
> >               return getErrorText(errno);
>
> Thanks.  This passage of code has at least two other problems on Windows.  The
> third CopyFile() argument means the opposite of the third copy_file()
> argument.  Next, getErrorText() is wrong on Windows:
>
> const char *
> getErrorText(int errNum)
> {
> #ifdef WIN32
>     _dosmaperr(GetLastError());
> #endif
>     return pg_strdup(strerror(errNum));
> }
>
> Calling _dosmaperr() to set errno is futile, because we nonetheless proceed to
> use the passed-in errNum.

Very good points.  The attached patch fixes the copy parameter and
addresses the errno issue by just using errno direcdtly after the
_dosmaperr call.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Attachment
Bruce Momjian <bruce@momjian.us> writes:
> Very good points.  The attached patch fixes the copy parameter and
> addresses the errno issue by just using errno direcdtly after the
> _dosmaperr call.

You did not change the comment

   *    Returns the text of the error message for the given error number

even though no error number is a "given" parameter anymore.  Maybe
"... for the most-recently-reported error"?

            regards, tom lane

Re: Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
Bruce Momjian
Date:
On Wed, Nov 18, 2015 at 09:30:45AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Very good points.  The attached patch fixes the copy parameter and
> > addresses the errno issue by just using errno direcdtly after the
> > _dosmaperr call.
>
> You did not change the comment
>
>    *    Returns the text of the error message for the given error number
>
> even though no error number is a "given" parameter anymore.  Maybe
> "... for the most-recently-reported error"?

Good point.  I have changed it to "Returns the text of the most recent
error".

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Re: Re: BUG #13774: upgrade from 9.1 to 9.4 'succeeds' without enough disk space

From
Bruce Momjian
Date:
On Wed, Nov 18, 2015 at 09:42:35AM -0500, Bruce Momjian wrote:
> On Wed, Nov 18, 2015 at 09:30:45AM -0500, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Very good points.  The attached patch fixes the copy parameter and
> > > addresses the errno issue by just using errno direcdtly after the
> > > _dosmaperr call.
> >
> > You did not change the comment
> >
> >    *    Returns the text of the error message for the given error number
> >
> > even though no error number is a "given" parameter anymore.  Maybe
> > "... for the most-recently-reported error"?
>
> Good point.  I have changed it to "Returns the text of the most recent
> error".

Patch applied and backpatched through 9.1.  I used a smaller patch, same
behavior, on pre-9.4 because the removal of the getErrorText() argument
were too invasive.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +