Thread: stat() on Windows might cause error if target file is larger than4GB

stat() on Windows might cause error if target file is larger than4GB

From
"Higuchi, Daisuke"
Date:
Hi, 

This mail is about following bug report post: 
https://www.postgresql.org/message-id/flat/153442391458.1505.9181095584291689853%40wrigleys.postgresql.org

When pg_dump has '--format=directory' option and the dump file size become 
4GB over, the strange error message 'Unknown error' will be output. 

This is because _stat64i32() is used for stat() on Windows, I think. 
Seeing following URL, _stat64i32() could use 32 bit, it means 4GB is max size. 
https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx

When I create the simple application to use stat() on VS2013, 
stat() could not deal with 4GB file and failed with errno=132.
I think the pg_dump occurs EOVERFLOW, but Windows' errno.h does not have 
this errno, so 'Unknown error' is output. 

So, pgwin32_safestat() should be changed to solve this problem. 
Do you have any idea or comments?

Regards, 
Daisuke Higuchi




Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
On Mon, Sep 10, 2018 at 02:01:53AM +0000, Higuchi, Daisuke wrote:
> This mail is about following bug report post:
> https://www.postgresql.org/message-id/flat/153442391458.1505.9181095584291689853%40wrigleys.postgresql.org

Thanks for following up on this thread.  I marked that as one of my
TODOs but could not move around to look at it seriously.

> This is because _stat64i32() is used for stat() on Windows, I think.
> Seeing following URL, _stat64i32() could use 32 bit, it means 4GB is max
> size.
> https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx
>
> When I create the simple application to use stat() on VS2013,
> stat() could not deal with 4GB file and failed with errno=132.

I don't quite follow the logic here, we map stat() to pgwin32_safestat()
when compiling with MSVC.

> I think the pg_dump occurs EOVERFLOW, but Windows' errno.h does not have
> this errno, so 'Unknown error' is output.

The error mapping happens in win32error.c, so we'd want to update this
table to make sure that EOVERFLOW is the errno reported is correct.

> So, pgwin32_safestat() should be changed to solve this problem.
> Do you have any idea or comments?

Yes, the fix needs to happen there.  It seems to me that what we are
looking for here is to complete the calculation of st_size with
nFileSizeHigh, so as we are able to compile a correct 64-bit integer
size, allowing support for larger files on Win32, which is something
that fsync() support for pg_dump has actually changed by opening stat()
to be called for files with a size larger than 4GB.  (I need badly to
setup a Windows machine...)
--
Michael

Attachment

Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
Higuchi-san,

On Mon, Sep 10, 2018 at 03:44:54PM +0900, Michael Paquier wrote:
> Yes, the fix needs to happen there.  It seems to me that what we are
> looking for here is to complete the calculation of st_size with
> nFileSizeHigh, so as we are able to compile a correct 64-bit integer
> size, allowing support for larger files on Win32, which is something
> that fsync() support for pg_dump has actually changed by opening stat()
> to be called for files with a size larger than 4GB.  (I need badly to
> setup a Windows machine...)

Does something like the patch attached help?  This makes sure that
st_size is set correctly for files with a size larger than 4GB.
Unfortunately I have not been able to test it, I have tried for some
time today to deploy a Win10 VM with VS 2017 (community version), but I
cannot get past some errors because of a set of missing files when
trying to compile, the one build complains about is Platform.targets,
which goes missing even if vc_redist is installed for all the C++
deliverables.  The user experience has gotten worse lately, at least for
me..
--
Michael

Attachment

RE: stat() on Windows might cause error if target file is largerthan 4GB

From
"Higuchi, Daisuke"
Date:
Michael-san, 

From: Michael Paquier [mailto:michael@paquier.xyz]
>Does something like the patch attached help?
>This makes sure that st_size is set correctly for files with a size larger than 4GB.

Thank you for creating patch, but this does not solve current problem. 
Of cause setting wrong size to st_size is problem, 
I think the cause of this problem is stat()'s return value (=-1). 

In pgwin32_safestat(), if stat() try to deal with files with a size larger than 4GB, 
the return value is -1. So, pgwin32_safestat() exits before calculating buf->st_size. 

----
pgwin32_safestat(const char *path, struct stat *buf)
{
        int            r;
        WIN32_FILE_ATTRIBUTE_DATA attr;

        r = stat(path, buf);
        if (r < 0)
        {
...
                return r;
        }
...
        buf->st_size = attr.nFileSizeLow;

        return 0;
}
----

So, attached patch help me and strange message disappeared, 
but I ignore the impact of this for others now. 

Regards, 
Daisuke, Higuchi


Attachment

Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
On Tue, Sep 11, 2018 at 10:36:51AM +0000, Higuchi, Daisuke wrote:
> So, attached patch help me and strange message disappeared,
> but I ignore the impact of this for others now.

@@ -386,7 +386,6 @@ pgwin32_safestat(const char *path, struct stat *buf)
            return -1;
        }

-       return r;
    }
Simply ignoring errors is not a solution, and makes things worse.

At the end, I have finally been able to put my hands on a Windows VM
which uses VS2015, and I am able to see the problem.  In short, Windows
definition of stat() is an utter mess as this documentation page, which
is the latest one available, nicely summarizes:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=vs-2017

It is possible to get away with the error by using _stat64(), which
returns as result a _stat64 structure.  Still, it has one difference
with the native result returned by stat() (which maps to _stat64i32) as
st_size is a 32-bit integer in _stat64i32, and a 64-bit integer with
_stat64.  This mess is mixed also with the fact that pgwin32_safestat
relies on a result stored in _stat, so we'd lose the 32 high bits from
the size if we only do a direct mapping, which is bad.

Getting full support for stat() with files larger than 4GB would be the
nicest solution, and requires roughly the following things I think:
- Use _stat64 in pgwin32_safestat.
- Enforce the return result stat to map with _stat64, so as st_size gets
the correct 64-bit size.

Postgres could live in a better world if Windows decided that stat() is
able to handle properly files bigger than 4GB with x64, but as
backward-compatibility matters a lot for Redmond's folks, it is hard to
believe that this is going to ever change.  I cannot blame the
compatibility argument either.
--
Michael

Attachment

Re: stat() on Windows might cause error if target file is larger than 4GB

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> At the end, I have finally been able to put my hands on a Windows VM
> which uses VS2015, and I am able to see the problem.  In short, Windows
> definition of stat() is an utter mess as this documentation page, which
> is the latest one available, nicely summarizes:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=vs-2017

Egad.

> It is possible to get away with the error by using _stat64(), which
> returns as result a _stat64 structure.  Still, it has one difference
> with the native result returned by stat() (which maps to _stat64i32) as
> st_size is a 32-bit integer in _stat64i32, and a 64-bit integer with
> _stat64.  This mess is mixed also with the fact that pgwin32_safestat
> relies on a result stored in _stat, so we'd lose the 32 high bits from
> the size if we only do a direct mapping, which is bad.

Could we fix things so that Postgres thinks that struct stat is
struct __stat64?  That is, act as though that variant is the native
stat structure?

            regards, tom lane


Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
On Tue, Sep 11, 2018 at 11:27:08PM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> It is possible to get away with the error by using _stat64(), which
>> returns as result a _stat64 structure.  Still, it has one difference
>> with the native result returned by stat() (which maps to _stat64i32) as
>> st_size is a 32-bit integer in _stat64i32, and a 64-bit integer with
>> _stat64.  This mess is mixed also with the fact that pgwin32_safestat
>> relies on a result stored in _stat, so we'd lose the 32 high bits from
>> the size if we only do a direct mapping, which is bad.
>
> Could we fix things so that Postgres thinks that struct stat is
> struct __stat64?  That is, act as though that variant is the native
> stat structure?

That's exactly what I would like to do and what I meant two paragraphs
above as that's the only solution I think would be clean, and this would
take care of st_size nicely.  I have not tested (yet), but some tweaks
in win32_port.h could be enough before mapping lstat to stat?
--
Michael

Attachment

Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
On Wed, Sep 12, 2018 at 12:47:31PM +0900, Michael Paquier wrote:
> That's exactly what I would like to do and what I meant two paragraphs
> above as that's the only solution I think would be clean, and this would
> take care of st_size nicely.  I have not tested (yet), but some tweaks
> in win32_port.h could be enough before mapping lstat to stat?

FWIW, I have been digging into this one and as "struct stat" is already
an existing structure when it comes to MSVC, enforcing a mapping of
__stat64 to that is proving to be tedious in one of the port headers.
Another solution would be to modify pgwin32_safestat so as it directly
uses _stat64, and then fill in the results from __stat64 directly to
_stat field by field.  One thing which would be bad is that
_stat.st_size is 4 bytes, which would cause the eight high bytes of
__stat64.st_size to be lost, hence if working on a file larger than 4GB
we would send an incorrect size back to the caller, which is worse than
the OVERFLOW we have now. We had better be careful with MinGW as well,
and cygwin does not take this path.  Perhaps somebody has a smart idea?
--
Michael

Attachment

Re: stat() on Windows might cause error if target file is larger than 4GB

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Sep 12, 2018 at 12:47:31PM +0900, Michael Paquier wrote:
>> That's exactly what I would like to do and what I meant two paragraphs
>> above as that's the only solution I think would be clean, and this would
>> take care of st_size nicely.  I have not tested (yet), but some tweaks
>> in win32_port.h could be enough before mapping lstat to stat?

> FWIW, I have been digging into this one and as "struct stat" is already
> an existing structure when it comes to MSVC, enforcing a mapping of
> __stat64 to that is proving to be tedious in one of the port headers.

Yeah, I was afraid of that.  We could invent a typedef "pg_struct_stat"
that maps to the right thing, but using that everywhere would be a mighty
invasive change :-(.  And we can't just "#define stat __stat64" because
that would break the calls to stat().  Or wait, maybe we could do that
and also have a one-liner function named __stat64 that would catch the
calls and redirect to _stat64?

            regards, tom lane


Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
On Wed, Sep 12, 2018 at 08:17:05PM -0400, Tom Lane wrote:
> Yeah, I was afraid of that.  We could invent a typedef "pg_struct_stat"
> that maps to the right thing, but using that everywhere would be a mighty
> invasive change :-(.

There are 130 places where "struct stat " is used, so I'd rather avoid
that.

> And we can't just "#define stat __stat64" because
> that would break the calls to stat().  Or wait, maybe we could do that
> and also have a one-liner function named __stat64 that would catch the
> calls and redirect to _stat64?

I don't think I grab your point here.  "#define stat __stat64" cannot
exist from the start so how would you do that?
--
Michael

Attachment

Re: stat() on Windows might cause error if target file is larger than 4GB

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Sep 12, 2018 at 08:17:05PM -0400, Tom Lane wrote:
>> And we can't just "#define stat __stat64" because
>> that would break the calls to stat().  Or wait, maybe we could do that
>> and also have a one-liner function named __stat64 that would catch the
>> calls and redirect to _stat64?

> I don't think I grab your point here.  "#define stat __stat64" cannot
> exist from the start so how would you do that?

What I was vaguely imagining is that win32_port.h could #include
whichever Windows header defines these functions and structs, and
then do

#define stat __stat64

static inline ... __stat64(...) { return _stat64(...); }

What would need testing is whether the #define has nasty side-effects
even if we've already included the system header.  I don't think it'd
hurt, eg, local variables named "stat"; though people might be surprised
when examining things in a debugger.

            regards, tom lane


Re: stat() on Windows might cause error if target file is larger than 4GB

From
Robert Haas
Date:
On Thu, Sep 13, 2018 at 10:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What I was vaguely imagining is that win32_port.h could #include
> whichever Windows header defines these functions and structs, and
> then do
>
> #define stat __stat64
>
> static inline ... __stat64(...) { return _stat64(...); }
>
> What would need testing is whether the #define has nasty side-effects
> even if we've already included the system header.  I don't think it'd
> hurt, eg, local variables named "stat"; though people might be surprised
> when examining things in a debugger.

This, to me, seems way too clever.  Replacing 'struct stat' with
something else everywhere in the code is more churn, but far less
likely to have surprising consequences down the road.  Or so I would
think, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Michael Paquier
Date:
On Thu, Sep 13, 2018 at 02:23:47PM -0400, Robert Haas wrote:
> This, to me, seems way too clever.  Replacing 'struct stat' with
> something else everywhere in the code is more churn, but far less
> likely to have surprising consequences down the road.  Or so I would
> think, anyway.

I don't have the mind set to work on that today (enough Windows-ism for
the day), but I would rather use the clever solution because, as far as
I know, we want a back-patchable solution so things should not be
invasive.
--
Michael

Attachment

Re: stat() on Windows might cause error if target file is larger than 4GB

From
Robert Haas
Date:
On Thu, Sep 13, 2018 at 10:05 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Sep 13, 2018 at 02:23:47PM -0400, Robert Haas wrote:
>> This, to me, seems way too clever.  Replacing 'struct stat' with
>> something else everywhere in the code is more churn, but far less
>> likely to have surprising consequences down the road.  Or so I would
>> think, anyway.
>
> I don't have the mind set to work on that today (enough Windows-ism for
> the day), but I would rather use the clever solution because, as far as
> I know, we want a back-patchable solution so things should not be
> invasive.

I do not think that using #define to play clever tricks like this can
reasonably be classified as non-invasive.  Non-invasive doesn't mean
it touches a small number of lines; it means it's unlikely to break
stuff.  Otherwise,

#define continue break

would qualify as non-invasive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: stat() on Windows might cause error if target file is larger than 4GB

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I do not think that using #define to play clever tricks like this can
> reasonably be classified as non-invasive.  Non-invasive doesn't mean
> it touches a small number of lines; it means it's unlikely to break
> stuff.  Otherwise,
> #define continue break
> would qualify as non-invasive.

This argument would hold more water if it weren't that "stat" is already
a macro in our Windows port:

#ifndef UNSAFE_STAT_OK
extern int    pgwin32_safestat(const char *path, struct stat *buf);
#define stat(a,b) pgwin32_safestat(a,b)
#endif

Admittedly, a macro with params will fire in fewer places than one
without, but claiming that the current situation is entirely surprise-free
seems wrong.

I also think that you're underestimating the value of continuing to spell
"struct stat" in the standard way.  People know what that is, if they've
done any Unix programming before, whereas "pg_struct_stat" requires some
learning.

More, I can just about guarantee that even if we make the substitution
today, new occurrences of "struct stat" will sneak in via patches, because
not everybody will remember this PG-ism all the time.  Yeah, probably the
buildfarm will find those mistakes, but maybe not quickly or reliably ---
I think it'd only show up as a warning not an error, which isn't going to
be something we'd notice easily.

So I'm not buying that "#define stat" is so evil it should be rejected
out of hand.  It may be that it doesn't work for some reason, but we
should at least test it.

            regards, tom lane


Re: stat() on Windows might cause error if target file is largerthan 4GB

From
Alvaro Herrera
Date:
On 2018-Sep-13, Tom Lane wrote:

> What I was vaguely imagining is that win32_port.h could #include
> whichever Windows header defines these functions and structs, and
> then do
> 
> #define stat __stat64
> 
> static inline ... __stat64(...) { return _stat64(...); }
> 
> What would need testing is whether the #define has nasty side-effects
> even if we've already included the system header.  I don't think it'd
> hurt, eg, local variables named "stat"; though people might be surprised
> when examining things in a debugger.

Did anybody test this idea?  It seems we let this problem slip unfixed,
which means Windows users cannot use pg_dump -Fd (incl. parallel dump)
when output files are large.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: stat() on Windows might cause error if target file is larger than 4GB

From
Juan José Santamaría Flecha
Date:

On Sat, May 9, 2020 at 2:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2018-Sep-13, Tom Lane wrote:

> What I was vaguely imagining is that win32_port.h could #include
> whichever Windows header defines these functions and structs, and
> then do
>
> #define stat __stat64
>
> static inline ... __stat64(...) { return _stat64(...); }
>
> What would need testing is whether the #define has nasty side-effects
> even if we've already included the system header.  I don't think it'd
> hurt, eg, local variables named "stat"; though people might be surprised
> when examining things in a debugger.

Did anybody test this idea?  It seems we let this problem slip unfixed,
which means Windows users cannot use pg_dump -Fd (incl. parallel dump)
when output files are large.

This issue gets reported from time to time as bug, it also affects COPY. There is an open item for so:


Regards,

Juan José Santamaría Flecha