Thread: Replace open mode with PG_BINARY_R/W/A macros

Replace open mode with PG_BINARY_R/W/A macros

From
Japin Li
Date:
Hi, hackers

I found we defined PG_BINARY_R/W/A macros for opening files, however,
there are some places use the constant strings.  IMO we should use
those macros instead of constant strings.  Here is a patch for it.
Any thoughts?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.


Attachment

Re: Replace open mode with PG_BINARY_R/W/A macros

From
Tom Lane
Date:
Japin Li <japinli@hotmail.com> writes:
> I found we defined PG_BINARY_R/W/A macros for opening files, however,
> there are some places use the constant strings.  IMO we should use
> those macros instead of constant strings.  Here is a patch for it.
> Any thoughts?

A lot of these changes look wrong to me: they are substituting "rb" for
"r", etc, in places that mean to read text files.  You have to think
about the Windows semantics.

If you think any of those changes are correct, then they are bug fixes
that need to be considered separately from cosmetic tidying.

            regards, tom lane



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Japin Li
Date:
On Mon, 18 Apr 2022 at 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Japin Li <japinli@hotmail.com> writes:
>> I found we defined PG_BINARY_R/W/A macros for opening files, however,
>> there are some places use the constant strings.  IMO we should use
>> those macros instead of constant strings.  Here is a patch for it.
>> Any thoughts?
>
> A lot of these changes look wrong to me: they are substituting "rb" for
> "r", etc, in places that mean to read text files.  You have to think
> about the Windows semantics.
>

I do this substituting, since the comment says it can be used for opening
text files.  Maybe I misunderstand the comment.

    /*
     *  NOTE:  this is also used for opening text files.
     *  WIN32 treats Control-Z as EOF in files opened in text mode.
     *  Therefore, we open files in binary mode on Win32 so we can read
     *  literal control-Z.  The other affect is that we see CRLF, but
     *  that is OK because we can already handle those cleanly.
     */
    #if defined(WIN32) || defined(__CYGWIN__)
    #define PG_BINARY   O_BINARY
    #define PG_BINARY_A "ab"
    #define PG_BINARY_R "rb"
    #define PG_BINARY_W "wb"
    #else
    #define PG_BINARY   0
    #define PG_BINARY_A "a"
    #define PG_BINARY_R "r"
    #define PG_BINARY_W "w"
    #endif

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Michael Paquier
Date:
On Tue, Apr 19, 2022 at 01:29:18PM +0800, Japin Li wrote:
> On Mon, 18 Apr 2022 at 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Japin Li <japinli@hotmail.com> writes:
>>> I found we defined PG_BINARY_R/W/A macros for opening files, however,
>>> there are some places use the constant strings.  IMO we should use
>>> those macros instead of constant strings.  Here is a patch for it.
>>> Any thoughts?
>>
>> A lot of these changes look wrong to me: they are substituting "rb" for
>> "r", etc, in places that mean to read text files.  You have to think
>> about the Windows semantics.

This reminded me of the business from a couple of years ago in
pgwin32_open() to enforce the text mode in the frontend if O_BINARY is
not specified.

> I do this substituting, since the comment says it can be used for opening
> text files.  Maybe I misunderstand the comment.

'b' is normally ignored on POSIX platforms (per the Linux man page for
fopen), but your patch has as effect to silently switch to binary mode
on Windows all those code paths.  See _setmode() in pgwin32_open(),
that changes the behavior of CRLF when reading or writing such files,
as described here:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170

The change in adminpack.c would be actually as 'b' should be ignored
on non-WIN32, but Tom's point is to not take lightly all the others.
--
Michael

Attachment

Re: Replace open mode with PG_BINARY_R/W/A macros

From
Tom Lane
Date:
Japin Li <japinli@hotmail.com> writes:
> On Mon, 18 Apr 2022 at 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A lot of these changes look wrong to me: they are substituting "rb" for
>> "r", etc, in places that mean to read text files.  You have to think
>> about the Windows semantics.

> I do this substituting, since the comment says it can be used for opening
> text files.  Maybe I misunderstand the comment.

I think the comment's at best misleading.  See e.g. 66f8687a8.
It might be okay to use "rb" to read a text file when there
is actually \r-stripping logic present, but you need to check
that.  Using "wb" to write a text file is flat wrong.

            regards, tom lane



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Japin Li
Date:
On Tue, 19 Apr 2022 at 14:14, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Apr 19, 2022 at 01:29:18PM +0800, Japin Li wrote:
>> On Mon, 18 Apr 2022 at 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Japin Li <japinli@hotmail.com> writes:
>>>> I found we defined PG_BINARY_R/W/A macros for opening files, however,
>>>> there are some places use the constant strings.  IMO we should use
>>>> those macros instead of constant strings.  Here is a patch for it.
>>>> Any thoughts?
>>>
>>> A lot of these changes look wrong to me: they are substituting "rb" for
>>> "r", etc, in places that mean to read text files.  You have to think
>>> about the Windows semantics.
>
> This reminded me of the business from a couple of years ago in
> pgwin32_open() to enforce the text mode in the frontend if O_BINARY is
> not specified.
>
>> I do this substituting, since the comment says it can be used for opening
>> text files.  Maybe I misunderstand the comment.
>
> 'b' is normally ignored on POSIX platforms (per the Linux man page for
> fopen), but your patch has as effect to silently switch to binary mode
> on Windows all those code paths.  See _setmode() in pgwin32_open(),
> that changes the behavior of CRLF when reading or writing such files,
> as described here:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170
>
> The change in adminpack.c would be actually as 'b' should be ignored
> on non-WIN32, but Tom's point is to not take lightly all the others.

Oh, I understand your points.  Thanks for the explanation.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Japin Li
Date:
On Tue, 19 Apr 2022 at 14:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Japin Li <japinli@hotmail.com> writes:
>> On Mon, 18 Apr 2022 at 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> A lot of these changes look wrong to me: they are substituting "rb" for
>>> "r", etc, in places that mean to read text files.  You have to think
>>> about the Windows semantics.
>
>> I do this substituting, since the comment says it can be used for opening
>> text files.  Maybe I misunderstand the comment.
>
> I think the comment's at best misleading.  See e.g. 66f8687a8.
> It might be okay to use "rb" to read a text file when there
> is actually \r-stripping logic present, but you need to check
> that.  Using "wb" to write a text file is flat wrong.
>

Thanks for the detail explanation.  Should we remove the misleading comment?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Tom Lane
Date:
Japin Li <japinli@hotmail.com> writes:
> On Tue, 19 Apr 2022 at 14:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the comment's at best misleading.  See e.g. 66f8687a8.
>> It might be okay to use "rb" to read a text file when there
>> is actually \r-stripping logic present, but you need to check
>> that.  Using "wb" to write a text file is flat wrong.

> Thanks for the detail explanation.  Should we remove the misleading comment?

We should rewrite it, not just remove it.  But I'm not 100% sure
what to say instead.  I wonder whether the comment's claims about
control-Z processing still apply on modern Windows.

Another question is whether we actually like the current shape of
the code.  I can see at least two different directions we might
prefer to the status quo:

* Invent '#define PG_TEXT_R "r"' and so on, and use those in the
calls that currently use plain "r" etc, establishing a project
policy that you should use one of these six macros and never the
underlying strings directly.  This perhaps has some advantages
in greppability and clarity of intent, but I can't help wondering
if it's mostly obsessive-compulsiveness.

* In the other direction, decide that the PG_BINARY_X macros are
offering no benefit at all and just rip 'em out, writing "rb" and
so on in their place.  POSIX specifies that the character "b" has
no effect on Unix-oid systems, and it has said that for thirty years
now, so we do not really need the platform dependency that presently
exists in the macro definitions.  The presence or absence of "b"
would serve fine as an indicator of intent, and there would be one
less PG-specific coding convention to remember.

Or maybe it's fine as-is.  Any sort of wide-ranging change like this
creates hazards for back-patching, so we shouldn't do it lightly.

            regards, tom lane



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Japin Li
Date:
On Tue, 19 Apr 2022 at 22:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Japin Li <japinli@hotmail.com> writes:
>> On Tue, 19 Apr 2022 at 14:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think the comment's at best misleading.  See e.g. 66f8687a8.
>>> It might be okay to use "rb" to read a text file when there
>>> is actually \r-stripping logic present, but you need to check
>>> that.  Using "wb" to write a text file is flat wrong.
>
>> Thanks for the detail explanation.  Should we remove the misleading comment?
>
> We should rewrite it, not just remove it.  But I'm not 100% sure
> what to say instead.  I wonder whether the comment's claims about
> control-Z processing still apply on modern Windows.
>

It might be true [1].

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170

> Another question is whether we actually like the current shape of
> the code.  I can see at least two different directions we might
> prefer to the status quo:
>
> * Invent '#define PG_TEXT_R "r"' and so on, and use those in the
> calls that currently use plain "r" etc, establishing a project
> policy that you should use one of these six macros and never the
> underlying strings directly.  This perhaps has some advantages
> in greppability and clarity of intent, but I can't help wondering
> if it's mostly obsessive-compulsiveness.
>
> * In the other direction, decide that the PG_BINARY_X macros are
> offering no benefit at all and just rip 'em out, writing "rb" and
> so on in their place.  POSIX specifies that the character "b" has
> no effect on Unix-oid systems, and it has said that for thirty years
> now, so we do not really need the platform dependency that presently
> exists in the macro definitions.  The presence or absence of "b"
> would serve fine as an indicator of intent, and there would be one
> less PG-specific coding convention to remember.
>

I'm incline the second direction if we need to change this.

> Or maybe it's fine as-is.  Any sort of wide-ranging change like this
> creates hazards for back-patching, so we shouldn't do it lightly.
>

Agreed.  Thanks again for the explanation.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Peter Eisentraut
Date:
On 19.04.22 16:21, Tom Lane wrote:
> * In the other direction, decide that the PG_BINARY_X macros are
> offering no benefit at all and just rip 'em out, writing "rb" and
> so on in their place.  POSIX specifies that the character "b" has
> no effect on Unix-oid systems, and it has said that for thirty years
> now, so we do not really need the platform dependency that presently
> exists in the macro definitions.  The presence or absence of "b"
> would serve fine as an indicator of intent, and there would be one
> less PG-specific coding convention to remember.

I can only imagine that there must have been some Unix systems that did 
not understand the "binary" APIs required for Windows.  (For example, 
neither the Linux nor the macOS open(2) man page mentions O_BINARY.) 
Otherwise, these macros don't make any sense, because then you could 
just write the thing directly on all platforms.



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 19.04.22 16:21, Tom Lane wrote:
>> * In the other direction, decide that the PG_BINARY_X macros are
>> offering no benefit at all and just rip 'em out, writing "rb" and
>> so on in their place.  POSIX specifies that the character "b" has
>> no effect on Unix-oid systems, and it has said that for thirty years
>> now, so we do not really need the platform dependency that presently
>> exists in the macro definitions.  The presence or absence of "b"
>> would serve fine as an indicator of intent, and there would be one
>> less PG-specific coding convention to remember.

> I can only imagine that there must have been some Unix systems that did 
> not understand the "binary" APIs required for Windows.  (For example, 
> neither the Linux nor the macOS open(2) man page mentions O_BINARY.) 
> Otherwise, these macros don't make any sense, because then you could 
> just write the thing directly on all platforms.

PG_BINARY is useful for open().  It's the PG_BINARY_R/W/A macros for
fopen() that are redundant per POSIX.  Possibly someone generalized
inappropriately; or maybe long ago we supported some platform that
rejected the "b" option?

            regards, tom lane



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Peter Eisentraut
Date:
On 20.04.22 22:29, Tom Lane wrote:
> PG_BINARY is useful for open().  It's the PG_BINARY_R/W/A macros for
> fopen() that are redundant per POSIX.  Possibly someone generalized
> inappropriately; or maybe long ago we supported some platform that
> rejected the "b" option?

I think the latter was the case.  I doubt it's still a problem.

I see some of the new code in pg_basebackup uses "wb" directly.  It 
would probably be good to fix that to be consistent one way or the 
other.  I vote for getting rid of the macros.




Re: Replace open mode with PG_BINARY_R/W/A macros

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 20.04.22 22:29, Tom Lane wrote:
>> PG_BINARY is useful for open().  It's the PG_BINARY_R/W/A macros for
>> fopen() that are redundant per POSIX.  Possibly someone generalized
>> inappropriately; or maybe long ago we supported some platform that
>> rejected the "b" option?

> I think the latter was the case.  I doubt it's still a problem.

We could find that out with little effort, at least for machines in the
buildfarm, by modifying c.h to use the form with "b" always.

> I see some of the new code in pg_basebackup uses "wb" directly.  It 
> would probably be good to fix that to be consistent one way or the 
> other.  I vote for getting rid of the macros.

Yeah, I suspect there have been other inconsistencies for years :-(

            regards, tom lane



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Japin Li
Date:
On Fri, 22 Apr 2022 at 04:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> On 20.04.22 22:29, Tom Lane wrote:
>>> PG_BINARY is useful for open().  It's the PG_BINARY_R/W/A macros for
>>> fopen() that are redundant per POSIX.  Possibly someone generalized
>>> inappropriately; or maybe long ago we supported some platform that
>>> rejected the "b" option?
>
>> I think the latter was the case.  I doubt it's still a problem.
>
> We could find that out with little effort, at least for machines in the
> buildfarm, by modifying c.h to use the form with "b" always.
>

I think we should also consider the popen() (see: OpenPipeStream() function),
on the Windows, it can use "b", however,  for linux, it might be not right.
So, modifying c.h to use the form with "b" isn't always right.

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/popen-wpopen?view=msvc-170

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Replace open mode with PG_BINARY_R/W/A macros

From
Tom Lane
Date:
Japin Li <japinli@hotmail.com> writes:
> I think we should also consider the popen() (see: OpenPipeStream() function),
> on the Windows, it can use "b", however,  for linux, it might be not right.

Oh, ugh ... POSIX says for popen():

    The behavior of popen() is specified for values of mode of r and
    w. Other modes such as rb and wb might be supported by specific
    implementations, but these would not be portable features. Note
    that historical implementations of popen() only check to see if
    the first character of mode is r. Thus, a mode of robert the robot
    would be treated as mode r, and a mode of anything else would be
    treated as mode w.

Maybe it's best to leave well enough alone here.

            regards, tom lane