Thread: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Thomas Kellerer
Date:
Hello,

for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specific _vm file couldn't be
copied.

When I try to copy that file manually everything works fine.

After running a "vacuum full" on the table in question the upgrade goes through.

One thing I noticed in the --verbose output of pg_upgrade is that the old cluster - despite being a 9.5 one - has the
"pg_control version number 942"

Here is the part of the pg_upgrade output:

.....

copying "d:/Daten/db/pgdata95/base/16410/85351" to "d:/Daten/db/pgdata96/base/16411/85351"
   d:/Daten/db/pgdata95/base/16410/85351_fsm
copying "d:/Daten/db/pgdata95/base/16410/85351_fsm" to "d:/Daten/db/pgdata96/base/16411/85351_fsm"
   d:/Daten/db/pgdata95/base/16410/85351_vm
copying "d:/Daten/db/pgdata95/base/16410/85351_vm" to "d:/Daten/db/pgdata96/base/16411/85351_vm"
   d:/Daten/db/pgdata95/base/16410/85358
copying "d:/Daten/db/pgdata95/base/16410/85358" to "d:/Daten/db/pgdata96/base/16411/85358"
   d:/Daten/db/pgdata95/base/16410/85358.1
copying "d:/Daten/db/pgdata95/base/16410/85358.1" to "d:/Daten/db/pgdata96/base/16411/85358.1"
   d:/Daten/db/pgdata95/base/16410/85358.2
copying "d:/Daten/db/pgdata95/base/16410/85358.2" to "d:/Daten/db/pgdata96/base/16411/85358.2"
   d:/Daten/db/pgdata95/base/16410/85358.3
copying "d:/Daten/db/pgdata95/base/16410/85358.3" to "d:/Daten/db/pgdata96/base/16411/85358.3"
   d:/Daten/db/pgdata95/base/16410/85358_fsm
copying "d:/Daten/db/pgdata95/base/16410/85358_fsm" to "d:/Daten/db/pgdata96/base/16411/85358_fsm"
   d:/Daten/db/pgdata95/base/16410/85358_vm
copying "d:/Daten/db/pgdata95/base/16410/85358_vm" to "d:/Daten/db/pgdata96/base/16411/85358_vm"

error while copying relation "public.wb_downloads" ("d:/Daten/db/pgdata95/base/16410/85358_vm" to
"d:/Daten/db/pgdata96/base/16411/85358_vm"): Invalid argument
Failure, exiting

The file in question is 65,536 bytes in size.

I saved all log files and the complete output from the failed run, so if you are interested I can supply them (I ran
pg_upgrade with the --retain option).

Regards
Thomas


Re: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Adrian Klaver
Date:
On 09/29/2016 12:50 PM, Thomas Kellerer wrote:
> Hello,
>
> for some reason pg_upgrade failed on Windows 10 for me, with an error
> message that one specific _vm file couldn't be copied.
>
> When I try to copy that file manually everything works fine.
>
> After running a "vacuum full" on the table in question the upgrade goes
> through.

Assuming you did that on old cluster?

Upgrading to 9.6?

Were both clusters installed the same way?

>
> One thing I noticed in the --verbose output of pg_upgrade is that the
> old cluster - despite being a 9.5 one - has the "pg_control version
> number 942"

Which is correct:


https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/catalog/pg_control.h;h=0b8bea74a891831bc3cbe8fd4d4233475a8329c4;hb=ba37ac217791dfdf2b327c4b75e7083b6b03a2f5


What was the complete command line invocation of pg_upgrade?

>
> Here is the part of the pg_upgrade output:
>
> .....
>
> copying "d:/Daten/db/pgdata95/base/16410/85351" to
> "d:/Daten/db/pgdata96/base/16411/85351"
>   d:/Daten/db/pgdata95/base/16410/85351_fsm
> copying "d:/Daten/db/pgdata95/base/16410/85351_fsm" to
> "d:/Daten/db/pgdata96/base/16411/85351_fsm"
>   d:/Daten/db/pgdata95/base/16410/85351_vm
> copying "d:/Daten/db/pgdata95/base/16410/85351_vm" to
> "d:/Daten/db/pgdata96/base/16411/85351_vm"
>   d:/Daten/db/pgdata95/base/16410/85358
> copying "d:/Daten/db/pgdata95/base/16410/85358" to
> "d:/Daten/db/pgdata96/base/16411/85358"
>   d:/Daten/db/pgdata95/base/16410/85358.1
> copying "d:/Daten/db/pgdata95/base/16410/85358.1" to
> "d:/Daten/db/pgdata96/base/16411/85358.1"
>   d:/Daten/db/pgdata95/base/16410/85358.2
> copying "d:/Daten/db/pgdata95/base/16410/85358.2" to
> "d:/Daten/db/pgdata96/base/16411/85358.2"
>   d:/Daten/db/pgdata95/base/16410/85358.3
> copying "d:/Daten/db/pgdata95/base/16410/85358.3" to
> "d:/Daten/db/pgdata96/base/16411/85358.3"
>   d:/Daten/db/pgdata95/base/16410/85358_fsm
> copying "d:/Daten/db/pgdata95/base/16410/85358_fsm" to
> "d:/Daten/db/pgdata96/base/16411/85358_fsm"
>   d:/Daten/db/pgdata95/base/16410/85358_vm
> copying "d:/Daten/db/pgdata95/base/16410/85358_vm" to
> "d:/Daten/db/pgdata96/base/16411/85358_vm"
>
> error while copying relation "public.wb_downloads"
> ("d:/Daten/db/pgdata95/base/16410/85358_vm" to
> "d:/Daten/db/pgdata96/base/16411/85358_vm"): Invalid argument
> Failure, exiting
>
> The file in question is 65,536 bytes in size.
>
> I saved all log files and the complete output from the failed run, so if
> you are interested I can supply them (I ran pg_upgrade with the --retain
> option).
>
> Regards
> Thomas
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Thomas Kellerer
Date:
Adrian Klaver schrieb am 29.09.2016 um 22:55:
>> After running a "vacuum full" on the table in question the upgrade goes
>> through.
>
> Assuming you did that on old cluster?

Yes, correct. I did that on the 9.5 cluster

> Were both clusters installed the same way?

Yes.

I always download the ZIP Archive from http://www.enterprisedb.com/products-services-training/pgbindownload then run
initdb manually.

Both were initialized using:

    initdb -D "..."  --lc-messages=English -U postgres --pwfile=pwfile.txt -E UTF8 -A md5

> What was the complete command line invocation of pg_upgrade?

That was in a batch file:

set LC_MESSAGES=English

set oldbin=c:\Programme\PostgreSQL\9.5\bin
set newbin=c:\Programme\PostgreSQL\9.6\bin
"%newbin%\pg_upgrade" ^
   --username=postgres ^
   --old-bindir="%oldbin%" ^
   --new-bindir="%newbin%" ^
   --old-datadir=d:/Daten/db/pgdata95 ^
   --new-datadir=d:/Daten/db/pgdata96 ^
   --retain ^
   --verbose ^
   --old-port=5432 ^
   --new-port=5433




Re: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specific _vm file couldn't be
> copied.

Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
code for 9.6 and hasn't really gotten that much testing.  Its error
reporting is shamefully bad --- you can't tell which step failed, and
I wouldn't even put a lot of faith in the errno being meaningful,
considering that it does close() calls before capturing the errno.

But what gets my attention in this connection is that it doesn't
seem to be taking the trouble to open the files in binary mode.
Could that lead to the reported failure?  Not sure, but it seems
like at the least it could result in corrupted VM files.

Has anyone tested vismap upgrades on Windows, and made an effort
to validate that the output wasn't garbage?

            regards, tom lane
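
The binary-mode concern above can be illustrated with a small sketch. On Windows, a file opened in text mode has each line-feed byte expanded to CR LF on write, and a Ctrl-Z byte (0x1A) is treated as end of file on read, so a byte-for-byte copy of a binary file needs binary mode on both descriptors. The following is not pg_upgrade's code; `copy_file_binary` and `OPEN_BINARY` are hypothetical names for this example, and whether text mode actually caused the reported failure is not established at this point in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif

/* Hypothetical macro for this sketch: ask for binary mode where the
 * platform distinguishes it (Windows), and do nothing elsewhere. */
#ifdef O_BINARY
#define OPEN_BINARY O_BINARY
#else
#define OPEN_BINARY 0
#endif

/* Copy src to dst byte for byte. Returns 0 on success, or -1 with errno
 * left as set by the call that failed. */
int
copy_file_binary(const char *src, const char *dst)
{
    char buf[8192];
    long nread;
    int  in_fd, out_fd, saved_errno;

    assert(src != NULL && dst != NULL);

    in_fd = open(src, O_RDONLY | OPEN_BINARY, 0);
    if (in_fd < 0)
        return -1;

    out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC | OPEN_BINARY, 0600);
    if (out_fd < 0)
    {
        saved_errno = errno;    /* close() below may overwrite errno */
        close(in_fd);
        errno = saved_errno;
        return -1;
    }

    while ((nread = read(in_fd, buf, sizeof(buf))) > 0)
    {
        errno = 0;
        if (write(out_fd, buf, (unsigned int) nread) != nread)
        {
            if (errno == 0)
                errno = ENOSPC; /* short write without errno: assume disk full */
            nread = -1;
            break;
        }
    }

    saved_errno = errno;        /* errno of the failed read/write, if any */
    close(in_fd);
    if (close(out_fd) != 0 && nread == 0)
    {
        saved_errno = errno;    /* the copy was fine, but the final close failed */
        nread = -1;
    }
    errno = saved_errno;
    return (nread < 0) ? -1 : 0;
}
```

A caller would check the return value and report errno right away, for example with strerror(errno), before making any further library calls.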


Re: [HACKERS] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Thomas Kellerer <spam_eater@gmx.net> writes:
> > for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specific _vm file couldn't be
> > copied.
>
> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
> code for 9.6 and hasn't really gotten that much testing.  Its error
> reporting is shamefully bad --- you can't tell which step failed, and
> I wouldn't even put a lot of faith in the errno being meaningful,
> considering that it does close() calls before capturing the errno.

So we do close() in a bunch of places while closing shop, which calls
_close() on Windows; this function sets errno.  Then we call
getErrorText(), which calls _dosmaperr() on the result of
GetLastError().  But the last-error stuff is not set by _close; I suppose
GetLastError() returns 0 in that case, which prompts _dosmaperr() to set errno to 0.
http://stackoverflow.com/questions/20056851/getlasterror-errno-formatmessagea-and-strerror-s
So this wouldn't quite have the effect you say; I think it'd say
"Failure while copying ...: Success" instead.

However surely we should have errno save/restore.

Other than that, I think the _dosmaperr() call should go entirely.
Moreover I think getErrorText() as a whole is misconceived and should be
removed altogether (why pstrdup the string?).  There are very few places
in pg_upgrade that require _dosmaperr; I can see only copyFile and
linkFile.  All the others should just be doing strerror() only, at least
according to the manual.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
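
The errno save/restore mentioned above could look roughly like the sketch below. It is only an illustration of the pattern, not code from pg_upgrade; `close_keep_errno` and `read_exact` are hypothetical names, and mapping a short read to EIO is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif

/* Close fd during error cleanup without disturbing the errno that the
 * caller is about to report. */
void
close_keep_errno(int fd)
{
    int saved_errno = errno;

    (void) close(fd);       /* may set errno; that value is discarded */
    errno = saved_errno;
}

/* Read exactly len bytes from the start of path. Returns 0 on success,
 * or -1 with errno describing the open/read/close step that failed. */
int
read_exact(const char *path, void *buf, unsigned int len)
{
    long n;
    int  fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_RDONLY, 0);
    if (fd < 0)
        return -1;

    n = read(fd, buf, len);
    if (n != (long) len)
    {
        if (n >= 0)
            errno = EIO;    /* short read sets no errno; EIO is this sketch's choice */
        close_keep_errno(fd);
        return -1;
    }

    return (close(fd) != 0) ? -1 : 0;
}
```

The point of the helper is that the value reported to the user belongs to the call that actually failed, not to whatever cleanup ran afterwards.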


Re: [HACKERS] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Moreover I think getErrorText() as a whole is misconceived and should be
> removed altogether (why pstrdup the string?).

Indeed.  I think bouncing the error back to the caller is misguided
to start with, seeing that the caller is just going to do pg_fatal
anyway.  We should rewrite these functions to just error out internally,
which will make it much easier to provide decent error reporting
indicating which call failed.

            regards, tom lane
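
One way to read the suggestion above is that each failing call reports itself, naming the step and the file, instead of handing a bare error string back to the caller. A minimal sketch under that reading follows; `format_step_error` and `die_at_step` are hypothetical helpers (the latter a stand-in for a fatal-error routine), and the message wording is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a message that names the failing step ("open", "read", ...),
 * the file involved, and the system error text. */
void
format_step_error(char *out, size_t outlen,
                  const char *step, const char *path, int err)
{
    assert(out != NULL && outlen > 0);
    snprintf(out, outlen, "could not %s file \"%s\": %s",
             step, path, strerror(err));
}

/* Hypothetical stand-in for a fatal-error routine: report and exit.
 * errno is captured first so that nothing later can overwrite it. */
void
die_at_step(const char *step, const char *path)
{
    int  err = errno;
    char msg[1024];

    format_step_error(msg, sizeof(msg), step, path, err);
    fprintf(stderr, "%s\n", msg);
    exit(1);
}
```

A copy routine written this way would call, say, die_at_step("read", src) immediately after the failing read(), so the message says which call failed and no later close() can affect the reported error.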


Re: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Thomas Kellerer
Date:
Tom Lane schrieb am 29.09.2016 um 23:10:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>> for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specific _vm file couldn't be
>> copied.
>
> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
> code for 9.6 and hasn't really gotten that much testing.  Its error
> reporting is shamefully bad --- you can't tell which step failed, and
> I wouldn't even put a lot of faith in the errno being meaningful,
> considering that it does close() calls before capturing the errno.
>
> But what gets my attention in this connection is that it doesn't
> seem to be taking the trouble to open the files in binary mode.
> Could that lead to the reported failure?  Not sure, but it seems
> like at the least it could result in corrupted VM files.

I did this on two different computers, one with Windows 10 the other with Windows 7.
(only test-databases, so no real issue anyway)

In both cases running a "vacuum full" for the table in question fixed the problem and pg_upgrade finished without
problems.


Re: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From
Masahiko Sawada
Date:
On Fri, Sep 30, 2016 at 6:40 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Tom Lane schrieb am 29.09.2016 um 23:10:
>> Thomas Kellerer <spam_eater@gmx.net> writes:
>>> for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specific _vm file couldn't be
>>> copied.
>>
>> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
>> code for 9.6 and hasn't really gotten that much testing.  Its error
>> reporting is shamefully bad --- you can't tell which step failed, and
>> I wouldn't even put a lot of faith in the errno being meaningful,
>> considering that it does close() calls before capturing the errno.
>>
>> But what gets my attention in this connection is that it doesn't
>> seem to be taking the trouble to open the files in binary mode.
>> Could that lead to the reported failure?  Not sure, but it seems
>> like at the least it could result in corrupted VM files.
>
> I did this on two different computers, one with Windows 10 the other with Windows 7.
> (only test-databases, so no real issue anyway)
>
> In both cases running a "vacuum full" for the table in question fixed the problem and pg_upgrade finished without
> problems.

Because vacuum full removes the _vm file, pg_upgrade completed successfully.
If you still have the _vm file
("d:/Daten/db/pgdata95/base/16410/85358_vm") that led to the error, could
you check whether it contains the '\r\n' [0d 0a] byte sequence, or share
that _vm file with us?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
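
The check requested above (looking for the byte pair 0d 0a in the _vm file) can be done with any hex viewer. As a minimal sketch, a small C helper could count the occurrences; `count_crlf` is a hypothetical name for this example, and the file is opened with "rb" so that the scan itself is not affected by newline translation on Windows.

```c
#include <assert.h>
#include <stdio.h>

/* Count occurrences of the byte pair 0d 0a in a file.
 * Returns -1 if the file cannot be opened or a read error occurs. */
long
count_crlf(const char *path)
{
    FILE *fp;
    int   c;
    int   prev = -1;    /* previous byte, or -1 at start of file */
    long  count = 0;

    assert(path != NULL);

    fp = fopen(path, "rb");     /* "b": read the bytes exactly as stored */
    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF)
    {
        if (prev == 0x0D && c == 0x0A)
            count++;
        prev = c;
    }

    if (ferror(fp))
        count = -1;
    fclose(fp);
    return count;
}
```

Called with the path of the retained _vm file, a result greater than zero would mean the file contains at least one such pair; what that implies for the failure is for the thread to determine.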