Thread: Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Thomas Munro
Date:
On Thu, Nov 21, 2024 at 11:44 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > This literally looks like something like off_t/size_t would be limited to 2^31 somewhere. off_t is 32 bits on Windows. I'd be quite suspicious of the arithmetic involving 'currpos'. What happens if you change all the off_t in walmethods.c/.h to pgoff_t? The lseek() is suspicious too, and might need to be redirected to _lseeki64().
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Jacob Champion
Date:
On Thu, Nov 21, 2024 at 5:53 AM Thomas Munro <thomas.munro@gmail.com> wrote: > The lseek() is suspicious too, > and might need to be redirected to _lseeki64(). There's a call to ftruncate() in there too. Looks like its Windows definition is also 32-bit: #define ftruncate(a,b) chsize(a,b) --Jacob
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Thomas Munro
Date:
On Fri, Nov 22, 2024 at 8:59 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Thu, Nov 21, 2024 at 5:53 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > The lseek() is suspicious too, > > and might need to be redirected to _lseeki64(). > > There's a call to ftruncate() in there too. Looks like its Windows > definition is also 32-bit: > > #define ftruncate(a,b) chsize(a,b) Ah, yes. I wrote a draft fix for those two and more in the 0002 patch[1] for another project[2]. (I'm still planning on proposing the main idea in that thread -- allowing large files for relations -- but I was talked out of doing a generalised pgoff_t-ification drilling through loads of layers all over our tree just for Windows, so the idea was that Windows would simply not get the large file support, and I dropped that patch from that project. But you can see the functions with that problem that I found, and how I thought we should fix them. Obviously I didn't know we still had more bugs around existing ways to make large files, which various people have tackled over the years, so there are some functions that are 64 bit capable like stat, just not the full set...) As for "could not close ...", I guess dodgy data type maths must be causing tar_close() to return -1 while trying to compute padding or something like that, it's not literally close() that is failing. [1] https://www.postgresql.org/message-id/attachment/146666/0002-Use-pgoff_t-in-system-call-replacements-on-Windows.patch [2] https://www.postgresql.org/message-id/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Thomas Munro
Date:
On Fri, Nov 22, 2024 at 10:55 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > but with attached _lseeki64 dirty patch I've got success and resulting tarball greater than 2^31 too: > > C:\compiledpg\bin>pg_basebackup.exe -U postgres -D "c:\backup7" -F t -P -X stream -c fast --compress=none --create-slot--slot=slot11 > Password: > 18134035/18134035 kB (100%), 1/1 tablespace > C:\compiledpg\bin> > C:\compiledpg\bin>dir c:\backup7\pg_wal.tar > [..] > 11/22/2024 10:37 AM 4,026,778,112 pg_wal.tar > > PoC patch is attached and I'll register CFentry to see how the tests will go (all of this MSVC/MinGW stuff is frighteningto me, of course this will fail on non-Win32...). Great news. This will indeed require solving a few archeology puzzles to back-patch. For lseek(), I think it should be OK to just point it to _lseeki64() in win32_port.h as I showed earlier, because _lseeki64() was present in ancient msvcrt.dll. It's technically possible that an extension might be upset by the change but it'd have to be doing something pretty strange, like using lseek's type, or other such assumptions, not terribly likely. A more cautious change would do that just inside that translation unit to avoid upsetting anything else (or fixing anything else...), but that's probably too cautious, let's just do it in win32_port.h. For ftruncate(), though you didn't see a problem with that, Jacob is quite right that it must be able to corrupt stuff, and I retract that replacement code I showed earlier, it's not really OK to move the file position around. I would like to use _s_chsize() instead, but it arrived in msvcr80.dll so I don't think old MinGW can use it. Here's a new idea (sketch code not tested): static inline int ftruncate(int fd, pgoff_t length) { #if defined(_UCRT) || defined(_MSC_VER) /* MinGW + UCRT and all supported MSVC versions have this. */ errno = _s_chsize(fd, length); return errno == 0 ? 0 : -1; #else /* MinGW + ancient msvcrt.dll has only _chsize, limited by off_t (long). */ if (length > LONG_MAX) { /* A clear error is better than silent corruption. */ errno = EFBIG; return - 1; } return _chsize(fd, length); #endif } I'm not sure if we'd ever know if we broke MinGW + msvcrt.dll in the back branches, ie if any computer actually existing on earth would ever compile and run the second branch, and if so, be used for serious database work and then be able to reach the failure case. I think an honest but disappointing errno is probably OK in that hypothetical case. It's hard to know how far to go when programming ghost computers. Thoughts, anyone? (Obviously an exorcism is overdue in master, working on it...)
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Robert Haas
Date:
On Fri, Nov 22, 2024 at 6:44 PM Thomas Munro <thomas.munro@gmail.com> wrote: > I'm not sure if we'd ever know if we broke MinGW + msvcrt.dll in the > back branches, ie if any computer actually existing on earth would > ever compile and run the second branch, and if so, be used for serious > database work and then be able to reach the failure case. I think an > honest but disappointing errno is probably OK in that hypothetical > case. It's hard to know how far to go when programming ghost > computers. Thoughts, anyone? Hypothetically speaking, suppose we just used _s_chsize everywhere i.e. we deleted the #if/#else/#endif in your example and everything between #else and #endif. I know very little about Windows, but I suppose what would happen is that if you tried to compile a thus-modified version of PostgreSQL on a very old MinGW, it would fail to compile, and if you compiled it on a newer machine and tried to run it on an older one, you'd get the equivalent of a dynamic linker failure when you tried to start the server. At least that's what would happen on Linux or macOS. Would Windows postpone the failure until we actually tried to jump to the nonexistent entrypoint? In any case, one idea would be to adopt this approach and, if we heard of actual failures, we could then decide that the fallback path is needed, and if not, then we're done. Perhaps that's too relaxed an attitude; after all, breaking the entire server in a minor release is not great. However, if you think that systems without msvcr80.dll are extinct in the wild, maybe it's OK. I don't know the history of these things, but a quick Google search suggested that maybe msvcr80.dll was part of the "Microsoft Visual C++ 2005 Redistributable Package". I don't know if that means all Windows shipped after 2005 would have it, but that's a long time ago. > (Obviously an exorcism is overdue in master, working on it...) How does what need to be done compare to the patches from Jakub and Davinder? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Thomas Munro
Date:
On Wed, Dec 4, 2024 at 7:04 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Nov 22, 2024 at 6:44 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > I'm not sure if we'd ever know if we broke MinGW + msvcrt.dll in the > > back branches, ie if any computer actually existing on earth would > > ever compile and run the second branch, and if so, be used for serious > > database work and then be able to reach the failure case. I think an > > honest but disappointing errno is probably OK in that hypothetical > > case. It's hard to know how far to go when programming ghost > > computers. Thoughts, anyone? > > Hypothetically speaking, suppose we just used _s_chsize everywhere > i.e. we deleted the #if/#else/#endif in your example and everything > between #else and #endif. I know very little about Windows, but I > suppose what would happen is that if you tried to compile a > thus-modified version of PostgreSQL on a very old MinGW, it would fail > to compile, and if you compiled it on a newer machine and tried to run > it on an older one, you'd get the equivalent of a dynamic linker > failure when you tried to start the server. At least that's what would > happen on Linux or macOS. Would Windows postpone the failure until we > actually tried to jump to the nonexistent entrypoint? I think it would fail to link, while building the server? > In any case, one idea would be to adopt this approach and, if we heard > of actual failures, we could then decide that the fallback path is > needed, and if not, then we're done. Perhaps that's too relaxed an > attitude; after all, breaking the entire server in a minor release is > not great. As a data point, v17 already made a similar assumption and nobody squawked. But that was a major release. I honestly don't know who is really using that toolchain outside CI-type systems used by people who want more Unix/Windows toolchain harmony. It's a shame that it has caused so many projects so much Windows/Windows disharmony over the years, but it seems things have dramatically improved on that front. AFAIK we don't really have any defined support criteria beyond "keep the BF and CI green" for new development on master. Maybe you're right and we don't need to be so cautious about the back-branches, though it would irk me to leave the tree full of contradictions in terms of code that is guarded by macros. In this case it's a pretty small amount of extra code so it doesn't bother me much to maintain it, as long as it's gone in master. I'll have some feedback on the patch in the next few days. > However, if you think that systems without msvcr80.dll are > extinct in the wild, maybe it's OK. I don't know the history of these > things, but a quick Google search suggested that maybe msvcr80.dll was > part of the "Microsoft Visual C++ 2005 Redistributable Package". I > don't know if that means all Windows shipped after 2005 would have it, > but that's a long time ago. I think it was more complicated in that time period, but is simple again as of the past decade. Something like: all systems to this day have had a msrvcrt.dll from the 90s as part of the OS, and all systems have ucrt since ~2015 (Windows 10) as part of the OS, but in the period in between, each MSVC version had its own library that was *not* shipped with the OS and you had to ship it with your application. Visual Studio users could do that, but the MinGW project didn't like the terms and conditions and stuck to the old thing, and also couldn't use the new ucrt thing for some reason, at then some point the project forked, producing "MinGW-w64" (which we usually just refer to as MinGW around here), and the successor focused on making ucrt work for them, which took a while. That's what I gathered from skim reading, anyway. A lot of stuff happened in the decades between those two runtimes... UTF-8, hello, I mean, that was kind of a big deal to not have! > > (Obviously an exorcism is overdue in master, working on it...) > > How does what need to be done compare to the patches from Jakub and Davinder? Already done last week, but that's only for master: commit 1758d42446161f5dfae9b14791c5640239b86faa Author: Thomas Munro <tmunro@postgresql.org> Date: Wed Nov 27 22:34:11 2024 +1300 Require ucrt if using MinGW.
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Thomas Munro
Date:
Today while trying to figure out why CI didn't pick up a mistake but the build farm did, I learned a couple of interesting new-to-me facts that are relevant to this thread: 1. When meson builds on "Linux-like" systems (by which it means GCC or Clang detected, and MinGW is GCC), it jams -D_FILE_OFFSET_BITS=64 onto the command line[1]. 2. When MinGW's headers see -D_FILE_OFFSET_BITS=64 it defines off_t as large[2], along with related functions and structs. That's a potential hazard to be aware of for our own code that intercepts some but not all standard functions, and maybe different ones in different branches, and it also doesn't match non-meson or Visual Studio builds. We also already shipped recent releases that way so it would be an ABI break to change it, if that matters. Wow, what a lot of variations we have to handle with no coverage. I'm coming around to your proposal Robert. We decide that it is OK to back-patch freely, under a policy along the lines "we try to keep stable branches working on the latest MinGW toolchain version only, as a developer convenience". Even if it creates contradictions in back-branches (using some stuff unguarded, even if other stuff is guarded because it needed to be at the time). I'm not sure what quorum is needed for such a decree, but from a verification point of view, that's the effective reality already. An interested party would need to show up with the resources to maintain another platform variant; so yeah, why not let them do that, if something we back-patch turns out to be a real problem for them? [1] https://github.com/mesonbuild/meson/blob/853634a48da025c59eef70161dba0d150833f60d/mesonbuild/compilers.py#L2292 [2] https://github.com/mingw-w64/mingw-w64/blob/8bcd5fc1a72c0b6da3147bf21a4a494c81d14fae/mingw-w64-headers/crt/_mingw_off_t.h#L24
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
From
Robert Haas
Date:
On Wed, Dec 4, 2024 at 11:23 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Wow, what a lot of variations we have to handle with no coverage. I'm > coming around to your proposal Robert. We decide that it is OK to > back-patch freely, under a policy along the lines "we try to keep > stable branches working on the latest MinGW toolchain version only, as > a developer convenience". Even if it creates contradictions in > back-branches (using some stuff unguarded, even if other stuff is > guarded because it needed to be at the time). I'm not sure what > quorum is needed for such a decree, but from a verification point of > view, that's the effective reality already. An interested party would > need to show up with the resources to maintain another platform > variant; so yeah, why not let them do that, if something we back-patch > turns out to be a real problem for them? I don't want to go too far in the direction of being willing to break things in the back-branches, for obvious reasons. But in a case like this, we definitely have a bug that is affecting users on Windows, and we can't verify whether the fallback code is correct or would ever be used by anyone. If you think that fallback code is reasonably likely to be correct and work, I'm not against including it. But if we really have no idea, then IMHO it's reasonable to ship something that we expect to cause a hard failure on such platforms if they exist (like a missing symbol). That way, if they do exist, we'll be more likely to find about it and therefore more likely to be able to deliver a complete and proper fix, vs. shipping something that might be quite wrong and praying that it isn't. -- Robert Haas EDB: http://www.enterprisedb.com