Thread: Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

29 September 2016, 21:10:41

Thomas Kellerer <spam_eater@gmx.net> writes:
> for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specifc _vm file couldn't be
copied.

Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
code for 9.6 and hasn't really gotten that much testing.  Its error
reporting is shamefully bad --- you can't tell which step failed, and
I wouldn't even put a lot of faith in the errno being meaningful,
considering that it does close() calls before capturing the errno.

But what gets my attention in this connection is that it doesn't
seem to be taking the trouble to open the files in binary mode.
Could that lead to the reported failure?  Not sure, but it seems
like at the least it could result in corrupted VM files.

Has anyone tested vismap upgrades on Windows, and made an effort
to validate that the output wasn't garbage?

            regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Alvaro Herrera

Date:

29 September 2016, 21:40:26

Tom Lane wrote:
> Thomas Kellerer <spam_eater@gmx.net> writes:
> > for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specifc _vm file couldn't be
copied.
>
> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
> code for 9.6 and hasn't really gotten that much testing.  Its error
> reporting is shamefully bad --- you can't tell which step failed, and
> I wouldn't even put a lot of faith in the errno being meaningful,
> considering that it does close() calls before capturing the errno.

So we do close() in a bunch of places while closing shop, which calls
_close() on Windows; this function sets errno.  Then we call
getErrorText(), which calls _dosmaperr() on the result of
GetLastError().  But the last-error stuff is not set by _close; I suppose
GetLastError() returns 0 in that case, which promps _doserrmap to set errno to 0.
http://stackoverflow.com/questions/20056851/getlasterror-errno-formatmessagea-and-strerror-s
So this wouldn't quite have the effect you say; I think it'd say
"Failure while copying ...: Success" instead.

However surely we should have errno save/restore.

Other than that, I think the _dosmaperr() call should go entirely.
Moreover I think getErrorText() as a whole is misconceived and should be
removed altogether (why pstrdup the string?).  There are very few places
in pg_upgrade that require _dosmaperr; I can see only copyFile and
linkFile.  All the others should just be doing strerror() only, at least
according to the manual.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

29 September 2016, 21:48:38

I wrote:
> But what gets my attention in this connection is that it doesn't
> seem to be taking the trouble to open the files in binary mode.
> Could that lead to the reported failure?  Not sure, but it seems
> like at the least it could result in corrupted VM files.

On further thought, it definitely would lead to some sort of
hard-to-diagnose error.  Assuming there's some bytes that look
like \n, Windows read() would expand them to \r\n, which not
only produces garbage vismap data but throws off the page boundaries
in the file.  rewriteVisibilityMap() would not notice that, since
it contains exactly no sanity checking on the page headers,
but what it would notice is getting a short read on what it'll
think is a partial last page.  Then it does this:
       if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ)       {           close(dst_fd);
close(src_fd);          return getErrorText();       }
 

The read() won't have set errno, since by its lights there's nothing
wrong.  So while we know that getErrorText saw errno == EINVAL, there's
no way to guess what call that might've been left over from.

BTW, just to add to the already long list of mortal sins in this
bit of code, I believe it does 100% the wrong thing on big-endian
hardware.
               uint16        new_vmbits = 0;               ...               /* Copy new visibility map bit to new
formatpage */               memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK);
 

That's going to drop the two new bytes down in exactly the wrong
order if big-endian.

Oh, and for even more fun, this bit will dump core on alignment-picky
machines, if the vmbuf local array starts on an odd boundary:
               ((PageHeader) new_vmbuf)->pd_checksum =                   pg_checksum_page(new_vmbuf, new_blkno);

We might've caught these things earlier if the buildfarm testing
included cross-version upgrades, but of course that requires a
lot of test infrastructure that's not there ...
        regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

29 September 2016, 21:52:02

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Moreover I think getErrorText() as a whole is misconceived and should be
> removed altogether (why pstrdup the string?).

Indeed.  I think bouncing the error back to the caller is misguided
to start with, seeing that the caller is just going to do pg_fatal
anyway.  We should rewrite these functions to just error out internally,
which will make it much easier to provide decent error reporting
indicating which call failed.

            regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

29 September 2016, 22:09:07

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> I wouldn't even put a lot of faith in the errno being meaningful,
>> considering that it does close() calls before capturing the errno.

> So we do close() in a bunch of places while closing shop, which calls
> _close() on Windows; this function sets errno.

But only on failure, no?  The close()s usually shouldn't fail, and
therefore shouldn't change errno, it's just that you can't trust that
100%.

I think likely what's happening is that we're seeing a leftover value from
some previous syscall that set GetLastError's result (and, presumably,
wasn't fatal so far as pg_upgrade was concerned).
        regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Masahiko Sawada

Date:

30 September 2016, 04:12:07

On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> I wouldn't even put a lot of faith in the errno being meaningful,
>>> considering that it does close() calls before capturing the errno.
>
>> So we do close() in a bunch of places while closing shop, which calls
>> _close() on Windows; this function sets errno.
>
> But only on failure, no?  The close()s usually shouldn't fail, and
> therefore shouldn't change errno, it's just that you can't trust that
> 100%.
>
> I think likely what's happening is that we're seeing a leftover value from
> some previous syscall that set GetLastError's result (and, presumably,
> wasn't fatal so far as pg_upgrade was concerned).
>

It means that read() syscall could not read BLOCKSZ bytes because
probably _vm file is corrupted?

>     if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)

It could be happen that read() syscall stopped to read at where it
reads bits representing '\n' characters (0x5c6e) because we don't open
file with O_BINARY flag?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Kyotaro HORIGUCHI

Date:

30 September 2016, 04:26:30

Hello,

At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC1Ztdbp1v-qEnRn=RxaC0M6ZaTb4Cj8RyG+Dvs4MAHSw@mail.gmail.com>
> On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> Tom Lane wrote:
> >>> I wouldn't even put a lot of faith in the errno being meaningful,
> >>> considering that it does close() calls before capturing the errno.
> >
> >> So we do close() in a bunch of places while closing shop, which calls
> >> _close() on Windows; this function sets errno.
> >
> > But only on failure, no?  The close()s usually shouldn't fail, and
> > therefore shouldn't change errno, it's just that you can't trust that
> > 100%.
> >
> > I think likely what's happening is that we're seeing a leftover value from
> > some previous syscall that set GetLastError's result (and, presumably,
> > wasn't fatal so far as pg_upgrade was concerned).
> >
> 
> It means that read() syscall could not read BLOCKSZ bytes because
> probably _vm file is corrupted?
> 
> >     if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
> 
> It could be happen that read() syscall stopped to read at where it
> reads bits representing '\n' characters (0x5c6e) because we don't open
> file with O_BINARY flag?

Windows behaves stupidly there. fread() and even() read converts
'\r\n' into '\n' when text mode so every sequence of [0d 0a] in
the reading bytes shortens the data length by 1 byte.

I was careless about that..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Masahiko Sawada

Date:

30 September 2016, 05:41:01

On Fri, Sep 30, 2016 at 1:26 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC1Ztdbp1v-qEnRn=RxaC0M6ZaTb4Cj8RyG+Dvs4MAHSw@mail.gmail.com>
>> On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> >> Tom Lane wrote:
>> >>> I wouldn't even put a lot of faith in the errno being meaningful,
>> >>> considering that it does close() calls before capturing the errno.
>> >
>> >> So we do close() in a bunch of places while closing shop, which calls
>> >> _close() on Windows; this function sets errno.
>> >
>> > But only on failure, no?  The close()s usually shouldn't fail, and
>> > therefore shouldn't change errno, it's just that you can't trust that
>> > 100%.
>> >
>> > I think likely what's happening is that we're seeing a leftover value from
>> > some previous syscall that set GetLastError's result (and, presumably,
>> > wasn't fatal so far as pg_upgrade was concerned).
>> >
>>
>> It means that read() syscall could not read BLOCKSZ bytes because
>> probably _vm file is corrupted?
>>
>> >     if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
>>
>> It could be happen that read() syscall stopped to read at where it
>> reads bits representing '\n' characters (0x5c6e) because we don't open
>> file with O_BINARY flag?
>
> Windows behaves stupidly there. fread() and even() read converts
> '\r\n' into '\n' when text mode so every sequence of [0d 0a] in
> the reading bytes shortens the data length by 1 byte.
>

Ah, [0d 0a] is right.
After I manually added [0d 0a] into _vm file, I reproduced this bug on Windows.
I will post the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Masahiko Sawada

Date:

30 September 2016, 10:55:44

On Fri, Sep 30, 2016 at 6:40 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Tom Lane schrieb am 29.09.2016 um 23:10:
>> Thomas Kellerer <spam_eater@gmx.net> writes:
>>> for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specifc _vm file couldn't be
copied.
>>
>> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new
>> code for 9.6 and hasn't really gotten that much testing.  Its error
>> reporting is shamefully bad --- you can't tell which step failed, and
>> I wouldn't even put a lot of faith in the errno being meaningful,
>> considering that it does close() calls before capturing the errno.
>>
>> But what gets my attention in this connection is that it doesn't
>> seem to be taking the trouble to open the files in binary mode.
>> Could that lead to the reported failure?  Not sure, but it seems
>> like at the least it could result in corrupted VM files.
>
> I did this on two different computers, one with Windows 10 the other with Windows 7.
> (only test-databases, so no real issue anyway)
>
> In both cases running a "vacuum full" for the table in question fixed the problem and pg_upgrade finished without
problems.

Because vacuum full removes the _vm file, pg_upgrade completed job successfully.
If you still have the _vm file
("d:/Daten/db/pgdata95/base/16410/85358_vm") that lead an error, is it
possible that you check if there is '\r\n' [0d 0a] character in that
_vm file or share that _vm file with us?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Andrew Dunstan

Date:

30 September 2016, 14:13:21

On 09/29/2016 05:48 PM, Tom Lane wrote:
> We might've caught these things earlier if the buildfarm testing
> included cross-version upgrades, but of course that requires a
> lot of test infrastructure that's not there ...
>
>             

I have done quite a bit of work on this - see 
<https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm>

This actually runs on crake, and it has found problems in the past which 
we've fixed. However, it requires quite a deal of disk space (4GB or 
so), and the results are not stable, which is why I haven't enabled it.

What I am thinking of doing is making the diff tests fuzzy - if, say, we 
don't get a diff of more than 2000 lines we won't consider it a failure.

Among this module's advantages is that it tests upgrading quite a bit 
more than the standard pg_upgrade check - all the buildfarm's regression 
databases, not just the one created by "make check". For example, 
currently it is failing to upgrade from 9.6 to HEAD with this error:
   Could not load library "$libdir/hstore_plpython2"   ERROR:  could not load library
"/home/bf/bfr/root/upgrade/HEAD/inst/lib/postgresql/hstore_plpython2.so":
/home/bf/bfr/root/upgrade/HEAD/inst/lib/postgresql/hstore_plpython2.so:  undefined symbol: _Py_NoneStruct

I need to look into that.

cheers

andrew

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

30 September 2016, 14:25:54

Andrew Dunstan <andrew@dunslane.net> writes:
> On 09/29/2016 05:48 PM, Tom Lane wrote:
>> We might've caught these things earlier if the buildfarm testing
>> included cross-version upgrades, but of course that requires a
>> lot of test infrastructure that's not there ...

> I have done quite a bit of work on this - see 
> <https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm>

Oh!  How would I enable or use that?

> This actually runs on crake, and it has found problems in the past which 
> we've fixed. However, it requires quite a deal of disk space (4GB or 
> so), and the results are not stable, which is why I haven't enabled it.

Fair enough, but right now I'd like to do a one-shot test using a
big-endian machine to see whether it catches the apparent endianness
problem in rewriteVisibilityMap.  I have a feeling that that is something
that won't easily be caught by automated checks, because it requires a
case where the table pages are in a variety of visibility states, and even
if corruption has happened, only index-only-scan queries would notice.
        regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Andrew Dunstan

Date:

30 September 2016, 14:54:07


On 09/30/2016 10:25 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 09/29/2016 05:48 PM, Tom Lane wrote:
>>> We might've caught these things earlier if the buildfarm testing
>>> included cross-version upgrades, but of course that requires a
>>> lot of test infrastructure that's not there ...
>> I have done quite a bit of work on this - see
>> <https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm>
> Oh!  How would I enable or use that?


1. Pull the latest version of the module from git.
2. enable it in your buildfarm config file
3. do "run_branches.pl --run-all --verbose --test" and watch the output


cheers

andrew

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

30 September 2016, 16:24:43

Andrew Dunstan <andrew@dunslane.net> writes:
> On 09/30/2016 10:25 AM, Tom Lane wrote:
>> Oh!  How would I enable or use that?

> 1. Pull the latest version of the module from git.
> 2. enable it in your buildfarm config file
> 3. do "run_branches.pl --run-all --verbose --test" and watch the output

Seems to be some additional prep work needed somewhere ...

No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
        regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Masahiko Sawada

Date:

30 September 2016, 16:24:55

On Fri, Sep 30, 2016 at 2:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Sep 30, 2016 at 1:26 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hello,
>>
>> At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC1Ztdbp1v-qEnRn=RxaC0M6ZaTb4Cj8RyG+Dvs4MAHSw@mail.gmail.com>
>>> On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> >> Tom Lane wrote:
>>> >>> I wouldn't even put a lot of faith in the errno being meaningful,
>>> >>> considering that it does close() calls before capturing the errno.
>>> >
>>> >> So we do close() in a bunch of places while closing shop, which calls
>>> >> _close() on Windows; this function sets errno.
>>> >
>>> > But only on failure, no?  The close()s usually shouldn't fail, and
>>> > therefore shouldn't change errno, it's just that you can't trust that
>>> > 100%.
>>> >
>>> > I think likely what's happening is that we're seeing a leftover value from
>>> > some previous syscall that set GetLastError's result (and, presumably,
>>> > wasn't fatal so far as pg_upgrade was concerned).
>>> >
>>>
>>> It means that read() syscall could not read BLOCKSZ bytes because
>>> probably _vm file is corrupted?
>>>
>>> >     if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
>>>
>>> It could be happen that read() syscall stopped to read at where it
>>> reads bits representing '\n' characters (0x5c6e) because we don't open
>>> file with O_BINARY flag?
>>
>> Windows behaves stupidly there. fread() and even() read converts
>> '\r\n' into '\n' when text mode so every sequence of [0d 0a] in
>> the reading bytes shortens the data length by 1 byte.
>>
>
> Ah, [0d 0a] is right.
> After I manually added [0d 0a] into _vm file, I reproduced this bug on Windows.
> I will post the patch.
>

Attached patch fixes this issue and doesn't include the improvement
for error messaging.
Please find attached file.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

add_obinary_flag.patch

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

30 September 2016, 16:39:04

Masahiko Sawada <sawada.mshk@gmail.com> writes:

> +#ifndef WIN32
>      if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
> +#else
> +    if ((src_fd = open(fromfile, O_RDONLY | O_BINARY)) < 0)
> +#endif

This is easier with PG_BINARY.  Also, both open() calls need this.

I'm rather inclined to also stick PG_BINARY into the open() calls in
copy_file() as well.  That function isn't actively broken since it's
not used under Windows, but I think it's highly likely that the
current bug arose from copying-and-pasting that code, so leaving it
in a state that's unsafe for Windows is just asking for trouble.

Attached is the patch I'm currently working with to fix all three
of the portability issues here (PG_BINARY, endianness, alignment,
and there's some cosmetic changes too).

> +            pg_log(PG_WARNING,
> +                   "could not read expected bytes: read = %u, expected = %u\n",
> +                   BLCKSZ, bytesRead);

I think this is probably better done as part of a wholesale revision
of the error reporting in this file.  What I have in mind is to
redefine copyFile, linkFile, and rewriteVisibilityMap as all being
responsible for reporting their own errors and then exiting (so
they can just return void).  We'd need to pass in the schema name
and table name so that they can include that context, so that the
reports don't get any less complete than they were before, but that
does not seem like a big downside.

A variant would be to have them print the error messages but then
return a bool success flag, leaving the caller to decide whether
it's fatal or not.  But that would be more complicated and I see
no real reason to think pg_upgrade would ever need it.

            regards, tom lane

diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index b33f0b4..9ae92a9 100644
*** a/src/bin/pg_upgrade/file.c
--- b/src/bin/pg_upgrade/file.c
***************
*** 18,25 ****
  #include <sys/stat.h>
  #include <fcntl.h>

- #define BITS_PER_HEAPBLOCK_OLD 1
-

  #ifndef WIN32
  static int    copy_file(const char *fromfile, const char *tofile);
--- 18,23 ----
*************** copy_file(const char *srcfile, const cha
*** 84,93 ****
          return -1;
      }

!     if ((src_fd = open(srcfile, O_RDONLY, 0)) < 0)
          return -1;

!     if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) < 0)
      {
          save_errno = errno;

--- 82,92 ----
          return -1;
      }

!     if ((src_fd = open(srcfile, O_RDONLY | PG_BINARY, 0)) < 0)
          return -1;

!     if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
!                         S_IRUSR | S_IWUSR)) < 0)
      {
          save_errno = errno;

*************** copy_file(const char *srcfile, const cha
*** 153,183 ****
   * version, we could refuse to copy visibility maps from the old cluster
   * to the new cluster; the next VACUUM would recreate them, but at the
   * price of scanning the entire table.  So, instead, we rewrite the old
!  * visibility maps in the new format.  That way, the all-visible bit
!  * remains set for the pages for which it was set previously.  The
!  * all-frozen bit is never set by this conversion; we leave that to
!  * VACUUM.
   */
  const char *
  rewriteVisibilityMap(const char *fromfile, const char *tofile)
  {
!     int            src_fd = 0;
!     int            dst_fd = 0;
!     char        buffer[BLCKSZ];
!     ssize_t        bytesRead;
      ssize_t        totalBytesRead = 0;
      ssize_t        src_filesize;
      int            rewriteVmBytesPerPage;
      BlockNumber new_blkno = 0;
      struct stat statbuf;

!     /* Compute we need how many old page bytes to rewrite a new page */
      rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2;

      if ((fromfile == NULL) || (tofile == NULL))
          return "Invalid old file or new file";

!     if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
          return getErrorText();

      if (fstat(src_fd, &statbuf) != 0)
--- 152,181 ----
   * version, we could refuse to copy visibility maps from the old cluster
   * to the new cluster; the next VACUUM would recreate them, but at the
   * price of scanning the entire table.  So, instead, we rewrite the old
!  * visibility maps in the new format.  That way, the all-visible bits
!  * remain set for the pages for which they were set previously.  The
!  * all-frozen bits are never set by this conversion; we leave that to VACUUM.
   */
  const char *
  rewriteVisibilityMap(const char *fromfile, const char *tofile)
  {
!     int            src_fd;
!     int            dst_fd;
!     char       *buffer;
!     char       *new_vmbuf;
      ssize_t        totalBytesRead = 0;
      ssize_t        src_filesize;
      int            rewriteVmBytesPerPage;
      BlockNumber new_blkno = 0;
      struct stat statbuf;

!     /* Compute number of old-format bytes per new page */
      rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2;

      if ((fromfile == NULL) || (tofile == NULL))
          return "Invalid old file or new file";

!     if ((src_fd = open(fromfile, O_RDONLY | PG_BINARY, 0)) < 0)
          return getErrorText();

      if (fstat(src_fd, &statbuf) != 0)
*************** rewriteVisibilityMap(const char *fromfil
*** 186,192 ****
          return getErrorText();
      }

!     if ((dst_fd = open(tofile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) < 0)
      {
          close(src_fd);
          return getErrorText();
--- 184,191 ----
          return getErrorText();
      }

!     if ((dst_fd = open(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
!                        S_IRUSR | S_IWUSR)) < 0)
      {
          close(src_fd);
          return getErrorText();
*************** rewriteVisibilityMap(const char *fromfil
*** 196,208 ****
      src_filesize = statbuf.st_size;

      /*
       * Turn each visibility map page into 2 pages one by one. Each new page
!      * has the same page header as the old one.  If the last section of last
!      * page is empty, we skip it, mostly to avoid turning one-page visibility
!      * maps for small relations into two pages needlessly.
       */
      while (totalBytesRead < src_filesize)
      {
          char       *old_cur;
          char       *old_break;
          char       *old_blkend;
--- 195,215 ----
      src_filesize = statbuf.st_size;

      /*
+      * Malloc the work buffers, rather than making them local arrays, to
+      * ensure adequate alignment.
+      */
+     buffer = (char *) pg_malloc(BLCKSZ);
+     new_vmbuf = (char *) pg_malloc(BLCKSZ);
+
+     /*
       * Turn each visibility map page into 2 pages one by one. Each new page
!      * has the same page header as the old one.  If the last section of the
!      * last page is empty, we skip it, mostly to avoid turning one-page
!      * visibility maps for small relations into two pages needlessly.
       */
      while (totalBytesRead < src_filesize)
      {
+         ssize_t        bytesRead;
          char       *old_cur;
          char       *old_break;
          char       *old_blkend;
*************** rewriteVisibilityMap(const char *fromfil
*** 225,285 ****
          /*
           * These old_* variables point to old visibility map page. old_cur
           * points to current position on old page. old_blkend points to end of
!          * old block. old_break points to old page break position for
!          * rewriting a new page. After wrote a new page, old_break proceeds
!          * rewriteVmBytesPerPage bytes.
           */
          old_cur = buffer + SizeOfPageHeaderData;
          old_blkend = buffer + bytesRead;
          old_break = old_cur + rewriteVmBytesPerPage;

!         while (old_blkend >= old_break)
          {
!             char        new_vmbuf[BLCKSZ];
!             char       *new_cur = new_vmbuf;
              bool        empty = true;
              bool        old_lastpart;

!             /* Copy page header in advance */
              memcpy(new_vmbuf, &pageheader, SizeOfPageHeaderData);

!             /* Rewrite the last part of the old page? */
!             old_lastpart = old_lastblk && (old_blkend == old_break);

!             new_cur += SizeOfPageHeaderData;

              /* Process old page bytes one by one, and turn it into new page. */
!             while (old_break > old_cur)
              {
                  uint16        new_vmbits = 0;
                  int            i;

                  /* Generate new format bits while keeping old information */
                  for (i = 0; i < BITS_PER_BYTE; i++)
                  {
!                     uint8        byte = *(uint8 *) old_cur;
!
!                     if (byte & (1 << (BITS_PER_HEAPBLOCK_OLD * i)))
                      {
                          empty = false;
!                         new_vmbits |= 1 << (BITS_PER_HEAPBLOCK * i);
                      }
                  }

                  /* Copy new visibility map bit to new format page */
!                 memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK);

!                 old_cur += BITS_PER_HEAPBLOCK_OLD;
                  new_cur += BITS_PER_HEAPBLOCK;
              }

!             /* If the last part of the old page is empty, skip writing it */
              if (old_lastpart && empty)
                  break;

!             /* Set new checksum for a visibility map page (if enabled) */
!             if (old_cluster.controldata.data_checksum_version != 0 &&
!                 new_cluster.controldata.data_checksum_version != 0)
                  ((PageHeader) new_vmbuf)->pd_checksum =
                      pg_checksum_page(new_vmbuf, new_blkno);

--- 232,290 ----
          /*
           * These old_* variables point to old visibility map page. old_cur
           * points to current position on old page. old_blkend points to end of
!          * old block.  old_break is the end+1 position on the old page for the
!          * data that will be transferred to the current new page.
           */
          old_cur = buffer + SizeOfPageHeaderData;
          old_blkend = buffer + bytesRead;
          old_break = old_cur + rewriteVmBytesPerPage;

!         while (old_break <= old_blkend)
          {
!             char       *new_cur;
              bool        empty = true;
              bool        old_lastpart;

!             /* First, copy old page header to new page */
              memcpy(new_vmbuf, &pageheader, SizeOfPageHeaderData);

!             /* Rewriting the last part of the last old page? */
!             old_lastpart = old_lastblk && (old_break == old_blkend);

!             new_cur = new_vmbuf + SizeOfPageHeaderData;

              /* Process old page bytes one by one, and turn it into new page. */
!             while (old_cur < old_break)
              {
+                 uint8        byte = *(uint8 *) old_cur;
                  uint16        new_vmbits = 0;
                  int            i;

                  /* Generate new format bits while keeping old information */
                  for (i = 0; i < BITS_PER_BYTE; i++)
                  {
!                     if (byte & (1 << i))
                      {
                          empty = false;
!                         new_vmbits |=
!                             VISIBILITYMAP_ALL_VISIBLE << (BITS_PER_HEAPBLOCK * i);
                      }
                  }

                  /* Copy new visibility map bit to new format page */
!                 new_cur[0] = (char) (new_vmbits & 0xFF);
!                 new_cur[1] = (char) (new_vmbits >> 8);

!                 old_cur++;
                  new_cur += BITS_PER_HEAPBLOCK;
              }

!             /* If the last part of the last page is empty, skip writing it */
              if (old_lastpart && empty)
                  break;

!             /* Set new checksum for visibility map page, if enabled */
!             if (new_cluster.controldata.data_checksum_version != 0)
                  ((PageHeader) new_vmbuf)->pd_checksum =
                      pg_checksum_page(new_vmbuf, new_blkno);

*************** rewriteVisibilityMap(const char *fromfil
*** 290,306 ****
                  return getErrorText();
              }

              old_break += rewriteVmBytesPerPage;
              new_blkno++;
          }
      }

!     /* Close files */
      close(dst_fd);
      close(src_fd);

      return NULL;
-
  }

  void
--- 295,313 ----
                  return getErrorText();
              }

+             /* Advance for next new page */
              old_break += rewriteVmBytesPerPage;
              new_blkno++;
          }
      }

!     /* Clean up */
!     pg_free(buffer);
!     pg_free(new_vmbuf);
      close(dst_fd);
      close(src_fd);

      return NULL;
  }

  void

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Andrew Dunstan

Date:

30 September 2016, 16:48:34


On 09/30/2016 12:24 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 09/30/2016 10:25 AM, Tom Lane wrote:
>>> Oh!  How would I enable or use that?
>> 1. Pull the latest version of the module from git.
>> 2. enable it in your buildfarm config file
>> 3. do "run_branches.pl --run-all --verbose --test" and watch the output
> Seems to be some additional prep work needed somewhere ...
>
> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
>
>             


Oh sorry, you also need an entry for that in the config file. Mine has:
   upgrade_install_root => '/home/bf/bfr/root/upgrade',


cheers

andre

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Thomas Kellerer

Date:

30 September 2016, 16:48:41

Masahiko Sawada schrieb am 30.09.2016 um 12:54:
>> I did this on two different computers, one with Windows 10 the other with Windows 7.
>> (only test-databases, so no real issue anyway)
>>
>> In both cases running a "vacuum full" for the table in question fixed the problem and pg_upgrade finished without
problems.
>
> Because vacuum full removes the _vm file, pg_upgrade completed job successfully.
> If you still have the _vm file
> ("d:/Daten/db/pgdata95/base/16410/85358_vm") that lead an error, is it
> possible that you check if there is '\r\n' [0d 0a] character in that
> _vm file or share that _vm file with us?


Yes, I saved one of the clusters.

The file can be downloaded from here: http://www.kellerer.eu/85358_vm

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Tom Lane

Date:

01 October 2016, 18:01:41

Andrew Dunstan <andrew@dunslane.net> writes:
> On 09/30/2016 12:24 PM, Tom Lane wrote:
>> Seems to be some additional prep work needed somewhere ...
>> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.

> Oh sorry, you also need an entry for that in the config file. Mine has:
>     upgrade_install_root => '/home/bf/bfr/root/upgrade',

Yesterday's runs of prairiedog had this turned on.  I'm not sure how
to interpret the output (because there isn't any, ahem), but it does
not appear that the test detected anything wrong.

I was able to demonstrate the problem, and that my patch of yesterday
fixed it, by building a table in 9.5 that was all-visible, deleting a few
widely scattered tuples, and then pg_upgrading.  It wasn't that easy to
exhibit a query failure, though --- my first attempt failed because I'd
done an index-only scan in 9.5 before upgrading, and that had the side
effect of marking the index tuples dead, making the same query in HEAD
produce correct answers despite having invalid vismap state.  If we wanted
a regression test, it seems like we would have to build a custom test
specifically designed to detect this type of problem, and I'm not sure
it's worth it.
        regards, tom lane

Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From

Andrew Dunstan

Date:

02 October 2016, 12:42:48


On 10/01/2016 02:01 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 09/30/2016 12:24 PM, Tom Lane wrote:
>>> Seems to be some additional prep work needed somewhere ...
>>> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51.
>> Oh sorry, you also need an entry for that in the config file. Mine has:
>>      upgrade_install_root => '/home/bf/bfr/root/upgrade',
> Yesterday's runs of prairiedog had this turned on.  I'm not sure how
> to interpret the output (because there isn't any, ahem), but it does
> not appear that the test detected anything wrong.


I'm working on collecting the log files and making the module more 
useful. But you can see pretty much all the logs (lots of them!) after a 
run inside the upgrade directories.

cheers

andrew