Thread: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
Hi there, first-time contributor here. I certainly hope I got the patch creation and email workflow right. Let me know if anything can be improved as I`m eager to learn. Regression tests (check) were successful on native Win32 MSVC as well as Debian. Here comes the patch and corresponding commit text. During archive initialization pg_backup_custom.c determines if the file pointer it should read from or write to is seekable. pg_dump uses this information to rewrite the custom output format's TOC enriched with known offsets into the archive on close. pg_restore uses seeking to speed up file operations when searching for specific blocks within the archive. The seekable property of a file pointer is currently checked by invoking ftello and subsequently fseeko. Both calls succeed on Windows platforms if the underlying file descriptor represents a terminal handle or an anonymous or named pipe. Obviously, these type of devices do not support seeking. In the case of pg_dump, this leads to the TOC being appended to the end of the output when attempting to rewrite known offsets. Furthermore, pg_restore may try to seek to known file offsets if the custom format archive's TOC supports it and subsequently fails to locate blocks. This commit improves the detection of the seekable property by checking a descriptor's file type (st_mode) and filtering character special devices and pipes. The current customized implementation of fstat on Windows platforms (_pgfstat64) erroneously marks terminal and pipe handles as regular files (_S_IFREG). This was improved on by utilizing WinAPI functionality (GetFileType) to correctly distinguish and flag descriptors based on their native OS handle's file type. Daniel --- src/bin/pg_dump/pg_backup_archiver.c | 12 +++++ src/include/port/win32_port.h | 6 +++ src/port/win32stat.c | 68 ++++++++++++++++++++-------- 3 files changed, 67 insertions(+), 19 deletions(-)
Attachment
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Daniel Gustafsson
Date:
> On 16 Dec 2022, at 16:09, Daniel Watzinger <daniel.watzinger@gmail.com> wrote: > first-time contributor here. I certainly hope I got the patch > creation and email workflow right. Let me know if anything can be > improved as I`m eager to learn. Welcome! The patch seems to be in binary format or using some form of non-standard encoding? Can you re-send in plain text format? -- Daniel Gustafsson https://vmware.com/
Well, this is embarassing. Sorry for the inconvenience. Some part of my company's network infrastruture must have mangled the attachment. Both mails were sent using a combination of git format-patch and git send-email. However, as this is my first foray into this email-based workflow, I won't rule out a failure on my part. Bear with me and let's try again.
Attachment
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Michael Paquier
Date:
On Sat, Dec 17, 2022 at 12:55:24AM +0100, Daniel Watzinger wrote: > Well, this is embarassing. Sorry for the inconvenience. Some part > of my company's network infrastruture must have mangled the attachment. > Both mails were sent using a combination of git format-patch > and git send-email. However, as this is my first foray into this > email-based workflow, I won't rule out a failure on my part. Bear > with me and let's try again. No problem. You got the email and the patch format rights! We had better make sure that this does not break again 10260c7, and these could not be reproduced with automated tests as they needed a Windows terminal. Isn't this issue like the other commit, where the automated testing cannot reproduce any of that because it requires a terminal? If not, could it be possible to add some tests to have some coverage? The tests of pg_dump in src/bin/pg_dump/t/ invoke the custom format in a few scenarios already, and these are tested in the buildfarm for a couple of years now, without failing, but perhaps we'd need a small tweak to have a reproducible test case for automation? The patch has some formatting problems, see git diff --check for example. This does not prevent looking at the patch. The internal implementation of _pgstat64() is used in quite a few places, so we'd better update this part first, IMO, and then focus on the pg_dump part. Anyway, it looks like you are right here: there is nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32 implementation of fstat(). I am amazed to hear that both ftello64() and fseek64() actually succeed if you use a pipe: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html Could it be something we should try to make more portable by ourselves with a wrapper for these on WIN32? That would not be the first one to accomodate our code with POSIX, and who knows what code could be broken because of that, like external extensions that use fseek64() without knowing it. - if (hFile == INVALID_HANDLE_VALUE || buf == NULL) + if (hFile == INVALID_HANDLE_VALUE || hFile == (HANDLE)-2 || buf == NULL) What's the -2 for? Perhaps this should have a comment? + fileType = GetFileType(hFile); + lastError = GetLastError(); [...] if (fileType == FILE_TYPE_UNKNOWN && lastError != NO_ERROR) { + _dosmaperr(lastError); + return -1; } So, the patched code assumes that all the file types classified as FILE_TYPE_UNKNOWN when GetFileType() does not fail refer to fileno being either stdin, stderr or stdout. Perhaps we had better cross-check that fileno points to one of these three cases in the switch under FILE_TYPE_UNKNOWN? Could there be other cases where we have FILE_TYPE_UNKNOWN but GetFileType() does not fail? Per the documentation of GetFileType, FILE_TYPE_REMOTE is unused: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfiletype Perhaps it would be safer to fail in this case? checkSeek(FILE *fp) { pgoff_t tpos; + struct stat st; + + /* Check if this is a terminal descriptor */ + if (isatty(fileno(fp))) { + return false; + } + + /* Check if this is an unseekable character special device or pipe */ + if ((fstat(fileno(fp), &st) == 0) && (S_ISCHR(st.st_mode) + || S_ISFIFO(st.st_mode))) { + return false; + } Using that without a control of WIN32 is disturbing, but that comes down to if we'd want to tackle that within an extra layer of fseek()/ftello() in the WIN32 port. I am adding Juan in CC, as I am sure he'd have comments to offer on this area of the code. -- Michael
Attachment
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Juan José Santamaría Flecha
Date:
On Thu, Mar 2, 2023 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:
We had better make sure that this does not break again 10260c7, and
these could not be reproduced with automated tests as they needed a
Windows terminal. Isn't this issue like the other commit, where the
automated testing cannot reproduce any of that because it requires a
terminal? If not, could it be possible to add some tests to have some
coverage? The tests of pg_dump in src/bin/pg_dump/t/ invoke the
custom format in a few scenarios already, and these are tested in the
buildfarm for a couple of years now, without failing, but perhaps we'd
need a small tweak to have a reproducible test case for automation?
I've been able to manually reproduce the problem with:
pg_dump --format=custom > custom.dump
pg_restore --file=toc.txt --list custom.dump
pg_dump --format=custom | pg_restore --file=toc.dump --use-list=toc.txt
The error I get is:
pg_restore: error: unsupported version (0.7) in file header
I'm not really sure how to integrate this in a tap test.
The internal implementation of _pgstat64() is used in quite a few
places, so we'd better update this part first, IMO, and then focus on
the pg_dump part. Anyway, it looks like you are right here: there is
nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32
implementation of fstat().
I am amazed to hear that both ftello64() and fseek64() actually
succeed if you use a pipe:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
Could it be something we should try to make more portable by ourselves
with a wrapper for these on WIN32? That would not be the first one to
accomodate our code with POSIX, and who knows what code could be broken
because of that, like external extensions that use fseek64() without
knowing it.
The error is reproducible in versions previous to win32stat.c, so that might work as bug fix.
- if (hFile == INVALID_HANDLE_VALUE || buf == NULL)
+ if (hFile == INVALID_HANDLE_VALUE || hFile == (HANDLE)-2 || buf == NULL)
What's the -2 for? Perhaps this should have a comment?
There's a note on _get_osfhandle() [1] about when -2 is returned, but a comment seems appropriate.
+ fileType = GetFileType(hFile);
+ lastError = GetLastError();
[...]
if (fileType == FILE_TYPE_UNKNOWN && lastError != NO_ERROR) {
+ _dosmaperr(lastError);
+ return -1;
}
So, the patched code assumes that all the file types classified as
FILE_TYPE_UNKNOWN when GetFileType() does not fail refer to fileno
being either stdin, stderr or stdout. Perhaps we had better
cross-check that fileno points to one of these three cases in the
switch under FILE_TYPE_UNKNOWN? Could there be other cases where we
have FILE_TYPE_UNKNOWN but GetFileType() does not fail?
I don't think we should set st_mode for FILE_TYPE_UNKNOWN.
Per the documentation of GetFileType, FILE_TYPE_REMOTE is unused:
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfiletype
Perhaps it would be safer to fail in this case?
+1, we don't know what that might involve.
Regards,
Juan José Santamaría Flecha
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Juan José Santamaría Flecha
Date:
On Tue, Mar 7, 2023 at 1:36 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:
On Thu, Mar 2, 2023 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:
The internal implementation of _pgstat64() is used in quite a fewplaces, so we'd better update this part first, IMO, and then focus on
the pg_dump part. Anyway, it looks like you are right here: there is
nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32
implementation of fstat().
I am amazed to hear that both ftello64() and fseek64() actually
succeed if you use a pipe:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
Could it be something we should try to make more portable by ourselves
with a wrapper for these on WIN32? That would not be the first one to
accomodate our code with POSIX, and who knows what code could be broken
because of that, like external extensions that use fseek64() without
knowing it.The error is reproducible in versions previous to win32stat.c, so that might work as bug fix.
I've broken the patch in two:
1. fixes the detection of unseekable files in checkSeek(), using logic that hopefully is backpatchable,
2. the improvements on file type detection for stat() proposed by the OP.
Regards,
Juan José Santamaría Flecha
Attachment
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Michael Paquier
Date:
On Fri, Mar 10, 2023 at 12:12:37AM +0100, Juan José Santamaría Flecha wrote: > I've broken the patch in two: > 1. fixes the detection of unseekable files in checkSeek(), using logic that > hopefully is backpatchable, > 2. the improvements on file type detection for stat() proposed by the OP. I am OK with 0002, so I'll try to get this part backpatched down to where the implementation of stat() has been added. I am not completely sure that 0001 is the right way forward, though, particularly with the long-term picture.. In the backend, we have one caller of fseeko() as of read_binary_file(), so we would never pass down a pipe to that. However, there could be a risk of some silent breakages on Windows if some new code relies on that? There is a total of 11 callers of fseeko() in pg_dump, so rather than relying on checkSeek() to see if it actually works, I'd like to think that we should have a central policy to make this code more bullet-proof in the future. -- Michael
Attachment
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Juan José Santamaría Flecha
Date:
On Fri, Mar 10, 2023 at 2:37 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Mar 10, 2023 at 12:12:37AM +0100, Juan José Santamaría Flecha wrote:
> I've broken the patch in two:
> 1. fixes the detection of unseekable files in checkSeek(), using logic that
> hopefully is backpatchable,
> 2. the improvements on file type detection for stat() proposed by the OP.
I am OK with 0002, so I'll try to get this part backpatched down to
where the implementation of stat() has been added. I am not
completely sure that 0001 is the right way forward, though,
particularly with the long-term picture.. In the backend, we have one
caller of fseeko() as of read_binary_file(), so we would never pass
down a pipe to that. However, there could be a risk of some silent
breakages on Windows if some new code relies on that?
There is a total of 11 callers of fseeko() in pg_dump, so rather than
relying on checkSeek() to see if it actually works, I'd like to think
that we should have a central policy to make this code more
bullet-proof in the future.
WFM, making fseek() behaviour more resilient seems like a good improvement overall.
Should we open a new thread to make that part more visible?
Regards,
Juan José Santamaría Flecha
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Michael Paquier
Date:
On Mon, Mar 13, 2023 at 05:49:41PM +0100, Juan José Santamaría Flecha wrote: > WFM, making fseek() behaviour more resilient seems like a good improvement > overall. I have not looked in details, but my guess would be to add a win32seek.c similar to win32stat.c with a port of fseek() that's more resilient to the definitions in POSIX. > Should we open a new thread to make that part more visible? Yes, perhaps it makes sense to do so to attract the correct audience, There may be a few things we are missing. When it comes to pg_dump, both fixes are required, still it seems to me that adjusting the fstat() port and the fseek() ports are two different bugs, as they influence different parts of the code base when taken individually (aka this fseek() port for WIN32 would need fstat() to properly detect a pipe, as far as I understand). Meanwhile, I'll go apply and backpatch 0001 to fix the first bug at hand with the fstat() port, if there are no objections. -- Michael
Attachment
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Daniel Watzinger
Date:
I'm sorry I couldn't contribute to the discussion in time. The fix of the fstat() Win32 port looks good to me. I agree that there's a need for multiple fseek() ports to address the shortcomings of the MSVC functionality.
The documentation event states that "on devices incapable of seeking, the return value is undefined". A simple wrapper using GetFileType() or the new fstat(), to filter non-seekable devices before delegation, will probably suffice.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fseek-fseeki64?view=msvc-170
Regarding test automation and regression testing, there's a programmatic way to simulate how the "pipe operator" of cmd.exe and other shells works using CreateProcess and manual "piping" by means of various WinAPI functionality. This is actually how the bug was discovered in the first case. However, existing tests are probably platform-agnostic.
Regarding test automation and regression testing, there's a programmatic way to simulate how the "pipe operator" of cmd.exe and other shells works using CreateProcess and manual "piping" by means of various WinAPI functionality. This is actually how the bug was discovered in the first case. However, existing tests are probably platform-agnostic.
--
Daniel
On Tue, Mar 14, 2023 at 12:41 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Mar 13, 2023 at 05:49:41PM +0100, Juan José Santamaría Flecha wrote:
> WFM, making fseek() behaviour more resilient seems like a good improvement
> overall.
I have not looked in details, but my guess would be to add a
win32seek.c similar to win32stat.c with a port of fseek() that's more
resilient to the definitions in POSIX.
> Should we open a new thread to make that part more visible?
Yes, perhaps it makes sense to do so to attract the correct audience,
There may be a few things we are missing.
When it comes to pg_dump, both fixes are required, still it seems to
me that adjusting the fstat() port and the fseek() ports are two
different bugs, as they influence different parts of the code base
when taken individually (aka this fseek() port for WIN32 would need
fstat() to properly detect a pipe, as far as I understand).
Meanwhile, I'll go apply and backpatch 0001 to fix the first bug at
hand with the fstat() port, if there are no objections.
--
Michael
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Juan José Santamaría Flecha
Date:
Please, don't top post.
On Tue, Mar 14, 2023 at 12:30 PM Daniel Watzinger <daniel.watzinger@gmail.com> wrote:
I'm sorry I couldn't contribute to the discussion in time. The fix of the fstat() Win32 port looks good to me. I agree that there's a need for multiple fseek() ports to address the shortcomings of the MSVC functionality.The documentation event states that "on devices incapable of seeking, the return value is undefined". A simple wrapper using GetFileType() or the new fstat(), to filter non-seekable devices before delegation, will probably suffice.
I have just posted a patch to enforce the detection of unseekable streams in the fseek() calls [1], please feel free to review it.
Regards,
Juan José Santamaría Flecha
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
From
Michael Paquier
Date:
On Tue, Mar 14, 2023 at 01:47:09PM +0100, Juan José Santamaría Flecha wrote: > I have just posted a patch to enforce the detection of unseekable streams > in the fseek() calls [1], please feel free to review it. Thanks. I have been able to get around 0001 to fix _pgfstat64() and applied it down to v14 where this code has been introduced. Now to the part about fseek() and ftello().. -- Michael