Thread: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Daniel Watzinger
Date:
Hi there,

first-time contributor here. I certainly hope I got the patch
creation and email workflow right. Let me know if anything can be
improved as I`m eager to learn. Regression tests (check) were 
successful on native Win32 MSVC as well as Debian. Here comes the 
patch and corresponding commit text.

During archive initialization pg_backup_custom.c determines if the file
pointer it should read from or write to is seekable. pg_dump
uses this information to rewrite the custom output format's TOC
enriched with known offsets into the archive on close. pg_restore uses
seeking to speed up file operations when searching for specific
blocks within the archive.

The seekable property of a file pointer is currently checked by
invoking ftello and subsequently fseeko. Both calls succeed
on Windows platforms if the underlying file descriptor represents a
terminal handle or an anonymous or named pipe. Obviously, these type
of devices do not support seeking. In the case of pg_dump, this
leads to the TOC being appended to the end of the output when attempting
to rewrite known offsets. Furthermore, pg_restore may try to seek to
known file offsets if the custom format archive's TOC supports it
and subsequently fails to locate blocks.

This commit improves the detection of the seekable property by checking
a descriptor's file type (st_mode) and filtering character special
devices and pipes. The current customized implementation of fstat on
Windows platforms (_pgfstat64) erroneously marks terminal and pipe
handles as regular files (_S_IFREG). This was improved on by
utilizing WinAPI functionality (GetFileType) to correctly distinguish
and flag descriptors based on their native OS handle's file type.

Daniel

---
 src/bin/pg_dump/pg_backup_archiver.c | 12 +++++
 src/include/port/win32_port.h        |  6 +++
 src/port/win32stat.c                 | 68 ++++++++++++++++++++--------
 3 files changed, 67 insertions(+), 19 deletions(-)

Attachment

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Daniel Gustafsson
Date:
> On 16 Dec 2022, at 16:09, Daniel Watzinger <daniel.watzinger@gmail.com> wrote:

> first-time contributor here. I certainly hope I got the patch
> creation and email workflow right. Let me know if anything can be
> improved as I`m eager to learn.

Welcome!  The patch seems to be in binary format or using some form of
non-standard encoding? Can you re-send in plain text format?

--
Daniel Gustafsson        https://vmware.com/




pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Daniel Watzinger
Date:
Well, this is embarassing. Sorry for the inconvenience. Some part
of my company's network infrastruture must have mangled the attachment.
Both mails were sent using a combination of git format-patch 
and git send-email. However, as this is my first foray into this 
email-based workflow, I won't rule out a failure on my part. Bear
with me and let's try again.
Attachment

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Michael Paquier
Date:
On Sat, Dec 17, 2022 at 12:55:24AM +0100, Daniel Watzinger wrote:
> Well, this is embarassing. Sorry for the inconvenience. Some part
> of my company's network infrastruture must have mangled the attachment.
> Both mails were sent using a combination of git format-patch
> and git send-email. However, as this is my first foray into this
> email-based workflow, I won't rule out a failure on my part. Bear
> with me and let's try again.

No problem.  You got the email and the patch format rights!

We had better make sure that this does not break again 10260c7, and
these could not be reproduced with automated tests as they needed a
Windows terminal.  Isn't this issue like the other commit, where the
automated testing cannot reproduce any of that because it requires a
terminal?  If not, could it be possible to add some tests to have some
coverage?  The tests of pg_dump in src/bin/pg_dump/t/ invoke the
custom format in a few scenarios already, and these are tested in the
buildfarm for a couple of years now, without failing, but perhaps we'd
need a small tweak to have a reproducible test case for automation?

The patch has some formatting problems, see git diff --check for
example.  This does not prevent looking at the patch.

The internal implementation of _pgstat64() is used in quite a few
places, so we'd better update this part first, IMO, and then focus on
the pg_dump part.  Anyway, it looks like you are right here: there is
nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32
implementation of fstat().

I am amazed to hear that both ftello64() and fseek64() actually
succeed if you use a pipe:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
Could it be something we should try to make more portable by ourselves
with a wrapper for these on WIN32?  That would not be the first one to
accomodate our code with POSIX, and who knows what code could be broken
because of that, like external extensions that use fseek64() without
knowing it.

-   if (hFile == INVALID_HANDLE_VALUE || buf == NULL)
+   if (hFile == INVALID_HANDLE_VALUE || hFile == (HANDLE)-2 || buf == NULL)
What's the -2 for?  Perhaps this should have a comment?

+   fileType = GetFileType(hFile);
+   lastError = GetLastError();
[...]
   if (fileType == FILE_TYPE_UNKNOWN && lastError != NO_ERROR) {
+       _dosmaperr(lastError);
+       return -1;
    }
So, the patched code assumes that all the file types classified as
FILE_TYPE_UNKNOWN when GetFileType() does not fail refer to fileno
being either stdin, stderr or stdout.  Perhaps we had better
cross-check that fileno points to one of these three cases in the
switch under FILE_TYPE_UNKNOWN?  Could there be other cases where we
have FILE_TYPE_UNKNOWN but GetFileType() does not fail?

Per the documentation of GetFileType, FILE_TYPE_REMOTE is unused:
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfiletype
Perhaps it would be safer to fail in this case?

 checkSeek(FILE *fp)
 {
    pgoff_t     tpos;
+   struct stat st;
+
+   /* Check if this is a terminal descriptor */
+   if (isatty(fileno(fp))) {
+       return false;
+   }
+
+   /* Check if this is an unseekable character special device or pipe */
+   if ((fstat(fileno(fp), &st) == 0) && (S_ISCHR(st.st_mode)
+       || S_ISFIFO(st.st_mode))) {
+       return false;
+   }
Using that without a control of WIN32 is disturbing, but that comes
down to if we'd want to tackle that within an extra layer of
fseek()/ftello() in the WIN32 port.

I am adding Juan in CC, as I am sure he'd have comments to offer on
this area of the code.
--
Michael

Attachment

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Juan José Santamaría Flecha
Date:

On Thu, Mar 2, 2023 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:

We had better make sure that this does not break again 10260c7, and
these could not be reproduced with automated tests as they needed a
Windows terminal.  Isn't this issue like the other commit, where the
automated testing cannot reproduce any of that because it requires a
terminal?  If not, could it be possible to add some tests to have some
coverage?  The tests of pg_dump in src/bin/pg_dump/t/ invoke the
custom format in a few scenarios already, and these are tested in the
buildfarm for a couple of years now, without failing, but perhaps we'd
need a small tweak to have a reproducible test case for automation?

I've been able to manually reproduce the problem with:

pg_dump --format=custom > custom.dump
pg_restore --file=toc.txt --list custom.dump
pg_dump --format=custom | pg_restore --file=toc.dump --use-list=toc.txt

The error I get is:

pg_restore: error: unsupported version (0.7) in file header

I'm not really sure how to integrate this in a tap test.
 
The internal implementation of _pgstat64() is used in quite a few
places, so we'd better update this part first, IMO, and then focus on
the pg_dump part.  Anyway, it looks like you are right here: there is
nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32
implementation of fstat().

I am amazed to hear that both ftello64() and fseek64() actually
succeed if you use a pipe:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
Could it be something we should try to make more portable by ourselves
with a wrapper for these on WIN32?  That would not be the first one to
accomodate our code with POSIX, and who knows what code could be broken
because of that, like external extensions that use fseek64() without
knowing it.

The error is reproducible in versions previous to win32stat.c, so that might work as bug fix.
 
-   if (hFile == INVALID_HANDLE_VALUE || buf == NULL)
+   if (hFile == INVALID_HANDLE_VALUE || hFile == (HANDLE)-2 || buf == NULL)
What's the -2 for?  Perhaps this should have a comment?

 There's a note on _get_osfhandle() [1] about when -2 is returned, but a comment seems appropriate. 

+   fileType = GetFileType(hFile);
+   lastError = GetLastError();
[...]
   if (fileType == FILE_TYPE_UNKNOWN && lastError != NO_ERROR) {
+       _dosmaperr(lastError);
+       return -1;
    }
So, the patched code assumes that all the file types classified as
FILE_TYPE_UNKNOWN when GetFileType() does not fail refer to fileno
being either stdin, stderr or stdout.  Perhaps we had better
cross-check that fileno points to one of these three cases in the
switch under FILE_TYPE_UNKNOWN?  Could there be other cases where we
have FILE_TYPE_UNKNOWN but GetFileType() does not fail?

I don't think we should set st_mode for FILE_TYPE_UNKNOWN.
 
Per the documentation of GetFileType, FILE_TYPE_REMOTE is unused:
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfiletype
Perhaps it would be safer to fail in this case?

+1, we don't know what that might involve. 
 

Regards,

Juan José Santamaría Flecha

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Juan José Santamaría Flecha
Date:

On Tue, Mar 7, 2023 at 1:36 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:

On Thu, Mar 2, 2023 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:

The internal implementation of _pgstat64() is used in quite a few
places, so we'd better update this part first, IMO, and then focus on
the pg_dump part.  Anyway, it looks like you are right here: there is
nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32
implementation of fstat().

I am amazed to hear that both ftello64() and fseek64() actually
succeed if you use a pipe:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
Could it be something we should try to make more portable by ourselves
with a wrapper for these on WIN32?  That would not be the first one to
accomodate our code with POSIX, and who knows what code could be broken
because of that, like external extensions that use fseek64() without
knowing it.

The error is reproducible in versions previous to win32stat.c, so that might work as bug fix.

I've broken the patch in two:
1. fixes the detection of unseekable files in checkSeek(), using logic that hopefully is backpatchable,
2. the improvements on file type detection for stat() proposed by the OP.

Regards,

Juan José Santamaría Flecha
Attachment

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Michael Paquier
Date:
On Fri, Mar 10, 2023 at 12:12:37AM +0100, Juan José Santamaría Flecha wrote:
> I've broken the patch in two:
> 1. fixes the detection of unseekable files in checkSeek(), using logic that
> hopefully is backpatchable,
> 2. the improvements on file type detection for stat() proposed by the OP.

I am OK with 0002, so I'll try to get this part backpatched down to
where the implementation of stat() has been added.  I am not
completely sure that 0001 is the right way forward, though,
particularly with the long-term picture..  In the backend, we have one
caller of fseeko() as of read_binary_file(), so we would never pass
down a pipe to that.  However, there could be a risk of some silent
breakages on Windows if some new code relies on that?

There is a total of 11 callers of fseeko() in pg_dump, so rather than
relying on checkSeek() to see if it actually works, I'd like to think
that we should have a central policy to make this code more
bullet-proof in the future.
--
Michael

Attachment

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Juan José Santamaría Flecha
Date:

On Fri, Mar 10, 2023 at 2:37 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Mar 10, 2023 at 12:12:37AM +0100, Juan José Santamaría Flecha wrote:
> I've broken the patch in two:
> 1. fixes the detection of unseekable files in checkSeek(), using logic that
> hopefully is backpatchable,
> 2. the improvements on file type detection for stat() proposed by the OP.

I am OK with 0002, so I'll try to get this part backpatched down to
where the implementation of stat() has been added.  I am not
completely sure that 0001 is the right way forward, though,
particularly with the long-term picture..  In the backend, we have one
caller of fseeko() as of read_binary_file(), so we would never pass
down a pipe to that.  However, there could be a risk of some silent
breakages on Windows if some new code relies on that?

There is a total of 11 callers of fseeko() in pg_dump, so rather than
relying on checkSeek() to see if it actually works, I'd like to think
that we should have a central policy to make this code more
bullet-proof in the future.

WFM, making fseek() behaviour more resilient seems like a good improvement overall.

Should we open a new thread to make that part more visible?

Regards,

Juan José Santamaría Flecha

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Michael Paquier
Date:
On Mon, Mar 13, 2023 at 05:49:41PM +0100, Juan José Santamaría Flecha wrote:
> WFM, making fseek() behaviour more resilient seems like a good improvement
> overall.

I have not looked in details, but my guess would be to add a
win32seek.c similar to win32stat.c with a port of fseek() that's more
resilient to the definitions in POSIX.

> Should we open a new thread to make that part more visible?

Yes, perhaps it makes sense to do so to attract the correct audience,
There may be a few things we are missing.

When it comes to pg_dump, both fixes are required, still it seems to
me that adjusting the fstat() port and the fseek() ports are two
different bugs, as they influence different parts of the code base
when taken individually (aka this fseek() port for WIN32 would need
fstat() to properly detect a pipe, as far as I understand).

Meanwhile, I'll go apply and backpatch 0001 to fix the first bug at
hand with the fstat() port, if there are no objections.
--
Michael

Attachment

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Daniel Watzinger
Date:
I'm sorry I couldn't contribute to the discussion in time. The fix of the fstat() Win32 port looks good to me. I agree that there's a need for multiple fseek() ports to address the shortcomings of the MSVC functionality.

The documentation event states that "on devices incapable of seeking, the return value is undefined". A simple wrapper using GetFileType() or the new fstat(), to filter non-seekable devices before delegation, will probably suffice.

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fseek-fseeki64?view=msvc-170

Regarding test automation and regression testing, there's a programmatic way to simulate how the "pipe operator" of cmd.exe and other shells works using CreateProcess and manual "piping" by means of various WinAPI functionality. This is actually how the bug was discovered in the first case. However, existing tests are probably platform-agnostic.

--
Daniel

On Tue, Mar 14, 2023 at 12:41 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Mar 13, 2023 at 05:49:41PM +0100, Juan José Santamaría Flecha wrote:
> WFM, making fseek() behaviour more resilient seems like a good improvement
> overall.

I have not looked in details, but my guess would be to add a
win32seek.c similar to win32stat.c with a port of fseek() that's more
resilient to the definitions in POSIX.

> Should we open a new thread to make that part more visible?

Yes, perhaps it makes sense to do so to attract the correct audience,
There may be a few things we are missing.

When it comes to pg_dump, both fixes are required, still it seems to
me that adjusting the fstat() port and the fseek() ports are two
different bugs, as they influence different parts of the code base
when taken individually (aka this fseek() port for WIN32 would need
fstat() to properly detect a pipe, as far as I understand).

Meanwhile, I'll go apply and backpatch 0001 to fix the first bug at
hand with the fstat() port, if there are no objections.
--
Michael

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Juan José Santamaría Flecha
Date:
Please, don't top post.

On Tue, Mar 14, 2023 at 12:30 PM Daniel Watzinger <daniel.watzinger@gmail.com> wrote:
I'm sorry I couldn't contribute to the discussion in time. The fix of the fstat() Win32 port looks good to me. I agree that there's a need for multiple fseek() ports to address the shortcomings of the MSVC functionality.

The documentation event states that "on devices incapable of seeking, the return value is undefined". A simple wrapper using GetFileType() or the new fstat(), to filter non-seekable devices before delegation, will probably suffice.


I have just posted a patch to enforce the detection of unseekable streams in the fseek() calls [1], please feel free to review it.


Regards,

Juan José Santamaría Flecha

Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

From
Michael Paquier
Date:
On Tue, Mar 14, 2023 at 01:47:09PM +0100, Juan José Santamaría Flecha wrote:
> I have just posted a patch to enforce the detection of unseekable streams
> in the fseek() calls [1], please feel free to review it.

Thanks.  I have been able to get around 0001 to fix _pgfstat64() and
applied it down to v14 where this code has been introduced.  Now to
the part about fseek() and ftello()..
--
Michael

Attachment