Thread: Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Thomas Kellerer <spam_eater@gmx.net> writes: > for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specifc _vm file couldn't be copied. Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new code for 9.6 and hasn't really gotten that much testing. Its error reporting is shamefully bad --- you can't tell which step failed, and I wouldn't even put a lot of faith in the errno being meaningful, considering that it does close() calls before capturing the errno. But what gets my attention in this connection is that it doesn't seem to be taking the trouble to open the files in binary mode. Could that lead to the reported failure? Not sure, but it seems like at the least it could result in corrupted VM files. Has anyone tested vismap upgrades on Windows, and made an effort to validate that the output wasn't garbage? regards, tom lane
Tom Lane wrote: > Thomas Kellerer <spam_eater@gmx.net> writes: > > for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specifc _vm file couldn't be copied. > > Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new > code for 9.6 and hasn't really gotten that much testing. Its error > reporting is shamefully bad --- you can't tell which step failed, and > I wouldn't even put a lot of faith in the errno being meaningful, > considering that it does close() calls before capturing the errno. So we do close() in a bunch of places while closing shop, which calls _close() on Windows; this function sets errno. Then we call getErrorText(), which calls _dosmaperr() on the result of GetLastError(). But the last-error stuff is not set by _close; I suppose GetLastError() returns 0 in that case, which promps _doserrmap to set errno to 0. http://stackoverflow.com/questions/20056851/getlasterror-errno-formatmessagea-and-strerror-s So this wouldn't quite have the effect you say; I think it'd say "Failure while copying ...: Success" instead. However surely we should have errno save/restore. Other than that, I think the _dosmaperr() call should go entirely. Moreover I think getErrorText() as a whole is misconceived and should be removed altogether (why pstrdup the string?). There are very few places in pg_upgrade that require _dosmaperr; I can see only copyFile and linkFile. All the others should just be doing strerror() only, at least according to the manual. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote: > But what gets my attention in this connection is that it doesn't > seem to be taking the trouble to open the files in binary mode. > Could that lead to the reported failure? Not sure, but it seems > like at the least it could result in corrupted VM files. On further thought, it definitely would lead to some sort of hard-to-diagnose error. Assuming there's some bytes that look like \n, Windows read() would expand them to \r\n, which not only produces garbage vismap data but throws off the page boundaries in the file. rewriteVisibilityMap() would not notice that, since it contains exactly no sanity checking on the page headers, but what it would notice is getting a short read on what it'll think is a partial last page. Then it does this: if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ) { close(dst_fd); close(src_fd); return getErrorText(); } The read() won't have set errno, since by its lights there's nothing wrong. So while we know that getErrorText saw errno == EINVAL, there's no way to guess what call that might've been left over from. BTW, just to add to the already long list of mortal sins in this bit of code, I believe it does 100% the wrong thing on big-endian hardware. uint16 new_vmbits = 0; ... /* Copy new visibility map bit to new formatpage */ memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK); That's going to drop the two new bytes down in exactly the wrong order if big-endian. Oh, and for even more fun, this bit will dump core on alignment-picky machines, if the vmbuf local array starts on an odd boundary: ((PageHeader) new_vmbuf)->pd_checksum = pg_checksum_page(new_vmbuf, new_blkno); We might've caught these things earlier if the buildfarm testing included cross-version upgrades, but of course that requires a lot of test infrastructure that's not there ... regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Moreover I think getErrorText() as a whole is misconceived and should be > removed altogether (why pstrdup the string?). Indeed. I think bouncing the error back to the caller is misguided to start with, seeing that the caller is just going to do pg_fatal anyway. We should rewrite these functions to just error out internally, which will make it much easier to provide decent error reporting indicating which call failed. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> I wouldn't even put a lot of faith in the errno being meaningful, >> considering that it does close() calls before capturing the errno. > So we do close() in a bunch of places while closing shop, which calls > _close() on Windows; this function sets errno. But only on failure, no? The close()s usually shouldn't fail, and therefore shouldn't change errno, it's just that you can't trust that 100%. I think likely what's happening is that we're seeing a leftover value from some previous syscall that set GetLastError's result (and, presumably, wasn't fatal so far as pg_upgrade was concerned). regards, tom lane
On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Tom Lane wrote: >>> I wouldn't even put a lot of faith in the errno being meaningful, >>> considering that it does close() calls before capturing the errno. > >> So we do close() in a bunch of places while closing shop, which calls >> _close() on Windows; this function sets errno. > > But only on failure, no? The close()s usually shouldn't fail, and > therefore shouldn't change errno, it's just that you can't trust that > 100%. > > I think likely what's happening is that we're seeing a leftover value from > some previous syscall that set GetLastError's result (and, presumably, > wasn't fatal so far as pg_upgrade was concerned). > It means that read() syscall could not read BLOCKSZ bytes because probably _vm file is corrupted? > if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0) It could be happen that read() syscall stopped to read at where it reads bits representing '\n' characters (0x5c6e) because we don't open file with O_BINARY flag? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hello, At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC1Ztdbp1v-qEnRn=RxaC0M6ZaTb4Cj8RyG+Dvs4MAHSw@mail.gmail.com> > On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > >> Tom Lane wrote: > >>> I wouldn't even put a lot of faith in the errno being meaningful, > >>> considering that it does close() calls before capturing the errno. > > > >> So we do close() in a bunch of places while closing shop, which calls > >> _close() on Windows; this function sets errno. > > > > But only on failure, no? The close()s usually shouldn't fail, and > > therefore shouldn't change errno, it's just that you can't trust that > > 100%. > > > > I think likely what's happening is that we're seeing a leftover value from > > some previous syscall that set GetLastError's result (and, presumably, > > wasn't fatal so far as pg_upgrade was concerned). > > > > It means that read() syscall could not read BLOCKSZ bytes because > probably _vm file is corrupted? > > > if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0) > > It could be happen that read() syscall stopped to read at where it > reads bits representing '\n' characters (0x5c6e) because we don't open > file with O_BINARY flag? Windows behaves stupidly there. fread() and even() read converts '\r\n' into '\n' when text mode so every sequence of [0d 0a] in the reading bytes shortens the data length by 1 byte. I was careless about that.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Sep 30, 2016 at 1:26 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC1Ztdbp1v-qEnRn=RxaC0M6ZaTb4Cj8RyG+Dvs4MAHSw@mail.gmail.com> >> On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> >> Tom Lane wrote: >> >>> I wouldn't even put a lot of faith in the errno being meaningful, >> >>> considering that it does close() calls before capturing the errno. >> > >> >> So we do close() in a bunch of places while closing shop, which calls >> >> _close() on Windows; this function sets errno. >> > >> > But only on failure, no? The close()s usually shouldn't fail, and >> > therefore shouldn't change errno, it's just that you can't trust that >> > 100%. >> > >> > I think likely what's happening is that we're seeing a leftover value from >> > some previous syscall that set GetLastError's result (and, presumably, >> > wasn't fatal so far as pg_upgrade was concerned). >> > >> >> It means that read() syscall could not read BLOCKSZ bytes because >> probably _vm file is corrupted? >> >> > if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0) >> >> It could be happen that read() syscall stopped to read at where it >> reads bits representing '\n' characters (0x5c6e) because we don't open >> file with O_BINARY flag? > > Windows behaves stupidly there. fread() and even() read converts > '\r\n' into '\n' when text mode so every sequence of [0d 0a] in > the reading bytes shortens the data length by 1 byte. > Ah, [0d 0a] is right. After I manually added [0d 0a] into _vm file, I reproduced this bug on Windows. I will post the patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Sep 30, 2016 at 6:40 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: > Tom Lane schrieb am 29.09.2016 um 23:10: >> Thomas Kellerer <spam_eater@gmx.net> writes: >>> for some reason pg_upgrade failed on Windows 10 for me, with an error message that one specifc _vm file couldn't be copied. >> >> Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new >> code for 9.6 and hasn't really gotten that much testing. Its error >> reporting is shamefully bad --- you can't tell which step failed, and >> I wouldn't even put a lot of faith in the errno being meaningful, >> considering that it does close() calls before capturing the errno. >> >> But what gets my attention in this connection is that it doesn't >> seem to be taking the trouble to open the files in binary mode. >> Could that lead to the reported failure? Not sure, but it seems >> like at the least it could result in corrupted VM files. > > I did this on two different computers, one with Windows 10 the other with Windows 7. > (only test-databases, so no real issue anyway) > > In both cases running a "vacuum full" for the table in question fixed the problem and pg_upgrade finished without problems. Because vacuum full removes the _vm file, pg_upgrade completed job successfully. If you still have the _vm file ("d:/Daten/db/pgdata95/base/16410/85358_vm") that lead an error, is it possible that you check if there is '\r\n' [0d 0a] character in that _vm file or share that _vm file with us? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 09/29/2016 05:48 PM, Tom Lane wrote: > We might've caught these things earlier if the buildfarm testing > included cross-version upgrades, but of course that requires a > lot of test infrastructure that's not there ... > > I have done quite a bit of work on this - see <https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm> This actually runs on crake, and it has found problems in the past which we've fixed. However, it requires quite a deal of disk space (4GB or so), and the results are not stable, which is why I haven't enabled it. What I am thinking of doing is making the diff tests fuzzy - if, say, we don't get a diff of more than 2000 lines we won't consider it a failure. Among this module's advantages is that it tests upgrading quite a bit more than the standard pg_upgrade check - all the buildfarm's regression databases, not just the one created by "make check". For example, currently it is failing to upgrade from 9.6 to HEAD with this error: Could not load library "$libdir/hstore_plpython2" ERROR: could not load library "/home/bf/bfr/root/upgrade/HEAD/inst/lib/postgresql/hstore_plpython2.so": /home/bf/bfr/root/upgrade/HEAD/inst/lib/postgresql/hstore_plpython2.so: undefined symbol: _Py_NoneStruct I need to look into that. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 09/29/2016 05:48 PM, Tom Lane wrote: >> We might've caught these things earlier if the buildfarm testing >> included cross-version upgrades, but of course that requires a >> lot of test infrastructure that's not there ... > I have done quite a bit of work on this - see > <https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm> Oh! How would I enable or use that? > This actually runs on crake, and it has found problems in the past which > we've fixed. However, it requires quite a deal of disk space (4GB or > so), and the results are not stable, which is why I haven't enabled it. Fair enough, but right now I'd like to do a one-shot test using a big-endian machine to see whether it catches the apparent endianness problem in rewriteVisibilityMap. I have a feeling that that is something that won't easily be caught by automated checks, because it requires a case where the table pages are in a variety of visibility states, and even if corruption has happened, only index-only-scan queries would notice. regards, tom lane
On 09/30/2016 10:25 AM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 09/29/2016 05:48 PM, Tom Lane wrote: >>> We might've caught these things earlier if the buildfarm testing >>> included cross-version upgrades, but of course that requires a >>> lot of test infrastructure that's not there ... >> I have done quite a bit of work on this - see >> <https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm> > Oh! How would I enable or use that? 1. Pull the latest version of the module from git. 2. enable it in your buildfarm config file 3. do "run_branches.pl --run-all --verbose --test" and watch the output cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 09/30/2016 10:25 AM, Tom Lane wrote: >> Oh! How would I enable or use that? > 1. Pull the latest version of the module from git. > 2. enable it in your buildfarm config file > 3. do "run_branches.pl --run-all --verbose --test" and watch the output Seems to be some additional prep work needed somewhere ... No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. regards, tom lane
On Fri, Sep 30, 2016 at 2:40 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Sep 30, 2016 at 1:26 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Hello, >> >> At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC1Ztdbp1v-qEnRn=RxaC0M6ZaTb4Cj8RyG+Dvs4MAHSw@mail.gmail.com> >>> On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> >> Tom Lane wrote: >>> >>> I wouldn't even put a lot of faith in the errno being meaningful, >>> >>> considering that it does close() calls before capturing the errno. >>> > >>> >> So we do close() in a bunch of places while closing shop, which calls >>> >> _close() on Windows; this function sets errno. >>> > >>> > But only on failure, no? The close()s usually shouldn't fail, and >>> > therefore shouldn't change errno, it's just that you can't trust that >>> > 100%. >>> > >>> > I think likely what's happening is that we're seeing a leftover value from >>> > some previous syscall that set GetLastError's result (and, presumably, >>> > wasn't fatal so far as pg_upgrade was concerned). >>> > >>> >>> It means that read() syscall could not read BLOCKSZ bytes because >>> probably _vm file is corrupted? >>> >>> > if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0) >>> >>> It could be happen that read() syscall stopped to read at where it >>> reads bits representing '\n' characters (0x5c6e) because we don't open >>> file with O_BINARY flag? >> >> Windows behaves stupidly there. fread() and even() read converts >> '\r\n' into '\n' when text mode so every sequence of [0d 0a] in >> the reading bytes shortens the data length by 1 byte. >> > > Ah, [0d 0a] is right. > After I manually added [0d 0a] into _vm file, I reproduced this bug on Windows. > I will post the patch. > Attached patch fixes this issue and doesn't include the improvement for error messaging. Please find attached file. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Masahiko Sawada <sawada.mshk@gmail.com> writes: > +#ifndef WIN32 > if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0) > +#else > + if ((src_fd = open(fromfile, O_RDONLY | O_BINARY)) < 0) > +#endif This is easier with PG_BINARY. Also, both open() calls need this. I'm rather inclined to also stick PG_BINARY into the open() calls in copy_file() as well. That function isn't actively broken since it's not used under Windows, but I think it's highly likely that the current bug arose from copying-and-pasting that code, so leaving it in a state that's unsafe for Windows is just asking for trouble. Attached is the patch I'm currently working with to fix all three of the portability issues here (PG_BINARY, endianness, alignment, and there's some cosmetic changes too). > + pg_log(PG_WARNING, > + "could not read expected bytes: read = %u, expected = %u\n", > + BLCKSZ, bytesRead); I think this is probably better done as part of a wholesale revision of the error reporting in this file. What I have in mind is to redefine copyFile, linkFile, and rewriteVisibilityMap as all being responsible for reporting their own errors and then exiting (so they can just return void). We'd need to pass in the schema name and table name so that they can include that context, so that the reports don't get any less complete than they were before, but that does not seem like a big downside. A variant would be to have them print the error messages but then return a bool success flag, leaving the caller to decide whether it's fatal or not. But that would be more complicated and I see no real reason to think pg_upgrade would ever need it. regards, tom lane diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c index b33f0b4..9ae92a9 100644 *** a/src/bin/pg_upgrade/file.c --- b/src/bin/pg_upgrade/file.c *************** *** 18,25 **** #include <sys/stat.h> #include <fcntl.h> - #define BITS_PER_HEAPBLOCK_OLD 1 - #ifndef WIN32 static int copy_file(const char *fromfile, const char *tofile); --- 18,23 ---- *************** copy_file(const char *srcfile, const cha *** 84,93 **** return -1; } ! if ((src_fd = open(srcfile, O_RDONLY, 0)) < 0) return -1; ! if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) < 0) { save_errno = errno; --- 82,92 ---- return -1; } ! if ((src_fd = open(srcfile, O_RDONLY | PG_BINARY, 0)) < 0) return -1; ! if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, ! S_IRUSR | S_IWUSR)) < 0) { save_errno = errno; *************** copy_file(const char *srcfile, const cha *** 153,183 **** * version, we could refuse to copy visibility maps from the old cluster * to the new cluster; the next VACUUM would recreate them, but at the * price of scanning the entire table. So, instead, we rewrite the old ! * visibility maps in the new format. That way, the all-visible bit ! * remains set for the pages for which it was set previously. The ! * all-frozen bit is never set by this conversion; we leave that to ! * VACUUM. */ const char * rewriteVisibilityMap(const char *fromfile, const char *tofile) { ! int src_fd = 0; ! int dst_fd = 0; ! char buffer[BLCKSZ]; ! ssize_t bytesRead; ssize_t totalBytesRead = 0; ssize_t src_filesize; int rewriteVmBytesPerPage; BlockNumber new_blkno = 0; struct stat statbuf; ! /* Compute we need how many old page bytes to rewrite a new page */ rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2; if ((fromfile == NULL) || (tofile == NULL)) return "Invalid old file or new file"; ! if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0) return getErrorText(); if (fstat(src_fd, &statbuf) != 0) --- 152,181 ---- * version, we could refuse to copy visibility maps from the old cluster * to the new cluster; the next VACUUM would recreate them, but at the * price of scanning the entire table. So, instead, we rewrite the old ! * visibility maps in the new format. That way, the all-visible bits ! * remain set for the pages for which they were set previously. The ! * all-frozen bits are never set by this conversion; we leave that to VACUUM. */ const char * rewriteVisibilityMap(const char *fromfile, const char *tofile) { ! int src_fd; ! int dst_fd; ! char *buffer; ! char *new_vmbuf; ssize_t totalBytesRead = 0; ssize_t src_filesize; int rewriteVmBytesPerPage; BlockNumber new_blkno = 0; struct stat statbuf; ! /* Compute number of old-format bytes per new page */ rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2; if ((fromfile == NULL) || (tofile == NULL)) return "Invalid old file or new file"; ! if ((src_fd = open(fromfile, O_RDONLY | PG_BINARY, 0)) < 0) return getErrorText(); if (fstat(src_fd, &statbuf) != 0) *************** rewriteVisibilityMap(const char *fromfil *** 186,192 **** return getErrorText(); } ! if ((dst_fd = open(tofile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) < 0) { close(src_fd); return getErrorText(); --- 184,191 ---- return getErrorText(); } ! if ((dst_fd = open(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, ! S_IRUSR | S_IWUSR)) < 0) { close(src_fd); return getErrorText(); *************** rewriteVisibilityMap(const char *fromfil *** 196,208 **** src_filesize = statbuf.st_size; /* * Turn each visibility map page into 2 pages one by one. Each new page ! * has the same page header as the old one. If the last section of last ! * page is empty, we skip it, mostly to avoid turning one-page visibility ! * maps for small relations into two pages needlessly. */ while (totalBytesRead < src_filesize) { char *old_cur; char *old_break; char *old_blkend; --- 195,215 ---- src_filesize = statbuf.st_size; /* + * Malloc the work buffers, rather than making them local arrays, to + * ensure adequate alignment. + */ + buffer = (char *) pg_malloc(BLCKSZ); + new_vmbuf = (char *) pg_malloc(BLCKSZ); + + /* * Turn each visibility map page into 2 pages one by one. Each new page ! * has the same page header as the old one. If the last section of the ! * last page is empty, we skip it, mostly to avoid turning one-page ! * visibility maps for small relations into two pages needlessly. */ while (totalBytesRead < src_filesize) { + ssize_t bytesRead; char *old_cur; char *old_break; char *old_blkend; *************** rewriteVisibilityMap(const char *fromfil *** 225,285 **** /* * These old_* variables point to old visibility map page. old_cur * points to current position on old page. old_blkend points to end of ! * old block. old_break points to old page break position for ! * rewriting a new page. After wrote a new page, old_break proceeds ! * rewriteVmBytesPerPage bytes. */ old_cur = buffer + SizeOfPageHeaderData; old_blkend = buffer + bytesRead; old_break = old_cur + rewriteVmBytesPerPage; ! while (old_blkend >= old_break) { ! char new_vmbuf[BLCKSZ]; ! char *new_cur = new_vmbuf; bool empty = true; bool old_lastpart; ! /* Copy page header in advance */ memcpy(new_vmbuf, &pageheader, SizeOfPageHeaderData); ! /* Rewrite the last part of the old page? */ ! old_lastpart = old_lastblk && (old_blkend == old_break); ! new_cur += SizeOfPageHeaderData; /* Process old page bytes one by one, and turn it into new page. */ ! while (old_break > old_cur) { uint16 new_vmbits = 0; int i; /* Generate new format bits while keeping old information */ for (i = 0; i < BITS_PER_BYTE; i++) { ! uint8 byte = *(uint8 *) old_cur; ! ! if (byte & (1 << (BITS_PER_HEAPBLOCK_OLD * i))) { empty = false; ! new_vmbits |= 1 << (BITS_PER_HEAPBLOCK * i); } } /* Copy new visibility map bit to new format page */ ! memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK); ! old_cur += BITS_PER_HEAPBLOCK_OLD; new_cur += BITS_PER_HEAPBLOCK; } ! /* If the last part of the old page is empty, skip writing it */ if (old_lastpart && empty) break; ! /* Set new checksum for a visibility map page (if enabled) */ ! if (old_cluster.controldata.data_checksum_version != 0 && ! new_cluster.controldata.data_checksum_version != 0) ((PageHeader) new_vmbuf)->pd_checksum = pg_checksum_page(new_vmbuf, new_blkno); --- 232,290 ---- /* * These old_* variables point to old visibility map page. old_cur * points to current position on old page. old_blkend points to end of ! * old block. old_break is the end+1 position on the old page for the ! * data that will be transferred to the current new page. */ old_cur = buffer + SizeOfPageHeaderData; old_blkend = buffer + bytesRead; old_break = old_cur + rewriteVmBytesPerPage; ! while (old_break <= old_blkend) { ! char *new_cur; bool empty = true; bool old_lastpart; ! /* First, copy old page header to new page */ memcpy(new_vmbuf, &pageheader, SizeOfPageHeaderData); ! /* Rewriting the last part of the last old page? */ ! old_lastpart = old_lastblk && (old_break == old_blkend); ! new_cur = new_vmbuf + SizeOfPageHeaderData; /* Process old page bytes one by one, and turn it into new page. */ ! while (old_cur < old_break) { + uint8 byte = *(uint8 *) old_cur; uint16 new_vmbits = 0; int i; /* Generate new format bits while keeping old information */ for (i = 0; i < BITS_PER_BYTE; i++) { ! if (byte & (1 << i)) { empty = false; ! new_vmbits |= ! VISIBILITYMAP_ALL_VISIBLE << (BITS_PER_HEAPBLOCK * i); } } /* Copy new visibility map bit to new format page */ ! new_cur[0] = (char) (new_vmbits & 0xFF); ! new_cur[1] = (char) (new_vmbits >> 8); ! old_cur++; new_cur += BITS_PER_HEAPBLOCK; } ! /* If the last part of the last page is empty, skip writing it */ if (old_lastpart && empty) break; ! /* Set new checksum for visibility map page, if enabled */ ! if (new_cluster.controldata.data_checksum_version != 0) ((PageHeader) new_vmbuf)->pd_checksum = pg_checksum_page(new_vmbuf, new_blkno); *************** rewriteVisibilityMap(const char *fromfil *** 290,306 **** return getErrorText(); } old_break += rewriteVmBytesPerPage; new_blkno++; } } ! /* Close files */ close(dst_fd); close(src_fd); return NULL; - } void --- 295,313 ---- return getErrorText(); } + /* Advance for next new page */ old_break += rewriteVmBytesPerPage; new_blkno++; } } ! /* Clean up */ ! pg_free(buffer); ! pg_free(new_vmbuf); close(dst_fd); close(src_fd); return NULL; } void
On 09/30/2016 12:24 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 09/30/2016 10:25 AM, Tom Lane wrote: >>> Oh! How would I enable or use that? >> 1. Pull the latest version of the module from git. >> 2. enable it in your buildfarm config file >> 3. do "run_branches.pl --run-all --verbose --test" and watch the output > Seems to be some additional prep work needed somewhere ... > > No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. > No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. > No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. > > Oh sorry, you also need an entry for that in the config file. Mine has: upgrade_install_root => '/home/bf/bfr/root/upgrade', cheers andre
Masahiko Sawada schrieb am 30.09.2016 um 12:54: >> I did this on two different computers, one with Windows 10 the other with Windows 7. >> (only test-databases, so no real issue anyway) >> >> In both cases running a "vacuum full" for the table in question fixed the problem and pg_upgrade finished without problems. > > Because vacuum full removes the _vm file, pg_upgrade completed job successfully. > If you still have the _vm file > ("d:/Daten/db/pgdata95/base/16410/85358_vm") that lead an error, is it > possible that you check if there is '\r\n' [0d 0a] character in that > _vm file or share that _vm file with us? Yes, I saved one of the clusters. The file can be downloaded from here: http://www.kellerer.eu/85358_vm
Andrew Dunstan <andrew@dunslane.net> writes: > On 09/30/2016 12:24 PM, Tom Lane wrote: >> Seems to be some additional prep work needed somewhere ... >> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. > Oh sorry, you also need an entry for that in the config file. Mine has: > upgrade_install_root => '/home/bf/bfr/root/upgrade', Yesterday's runs of prairiedog had this turned on. I'm not sure how to interpret the output (because there isn't any, ahem), but it does not appear that the test detected anything wrong. I was able to demonstrate the problem, and that my patch of yesterday fixed it, by building a table in 9.5 that was all-visible, deleting a few widely scattered tuples, and then pg_upgrading. It wasn't that easy to exhibit a query failure, though --- my first attempt failed because I'd done an index-only scan in 9.5 before upgrading, and that had the side effect of marking the index tuples dead, making the same query in HEAD produce correct answers despite having invalid vismap state. If we wanted a regression test, it seems like we would have to build a custom test specifically designed to detect this type of problem, and I'm not sure it's worth it. regards, tom lane
On 10/01/2016 02:01 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 09/30/2016 12:24 PM, Tom Lane wrote: >>> Seems to be some additional prep work needed somewhere ... >>> No upgrade_install_root at /home/bfarm/bf-scripts/build-farm-4.17/PGBuild/Modules/TestUpgradeXversion.pm line 51. >> Oh sorry, you also need an entry for that in the config file. Mine has: >> upgrade_install_root => '/home/bf/bfr/root/upgrade', > Yesterday's runs of prairiedog had this turned on. I'm not sure how > to interpret the output (because there isn't any, ahem), but it does > not appear that the test detected anything wrong. I'm working on collecting the log files and making the module more useful. But you can see pretty much all the logs (lots of them!) after a run inside the upgrade directories. cheers andrew