Thread: [HACKERS] Re: [BUGS] BUG #14634: On Windows pg_basebackup should write tar tostdout in binary mode

[Adding -hackers mailing list]

On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
The following bug has been logged on the website:

Bug reference:      14634
Logged by:          Henry Boehlert
Email address:      henry_boehlert@agilent.com
PostgreSQL version: 9.6.2
Operating system:   Windows Server 2012 R2 6.3.9600
Description:

Executing command pg_basebackup -D -F t writes its output to stdout, which
is open in text mode, causing LF to be converted to CR LF thus corrupting
the resulting archive.

To write the tar to stdout, on Windows stdout's mode should be temporarily
switched to binary.

https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx

Thanks for reporting the issue. 
With the attached patch, I was able to extract the tar file that gets
generated when the tar file is written into stdout. I tested the
the compressed tar also.

This bug needs to be fixed in back branches also.

Regards,
Hari Babu
Fujitsu Australia
Attachment
On 3 May 2017 at 12:32, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> [Adding -hackers mailing list]
>
> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      14634
>> Logged by:          Henry Boehlert
>> Email address:      henry_boehlert@agilent.com
>> PostgreSQL version: 9.6.2
>> Operating system:   Windows Server 2012 R2 6.3.9600
>> Description:
>>
>> Executing command pg_basebackup -D -F t writes its output to stdout, which
>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>> the resulting archive.
>>
>> To write the tar to stdout, on Windows stdout's mode should be temporarily
>> switched to binary.
>>
>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>
>
> Thanks for reporting the issue.
> With the attached patch, I was able to extract the tar file that gets
> generated when the tar file is written into stdout. I tested the
> the compressed tar also.
>
> This bug needs to be fixed in back branches also.

We should do the same for pg_dump in -Fc mode.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services





On Wed, May 3, 2017 at 3:20 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 3 May 2017 at 12:32, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> [Adding -hackers mailing list]
>
> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      14634
>> Logged by:          Henry Boehlert
>> Email address:      henry_boehlert@agilent.com
>> PostgreSQL version: 9.6.2
>> Operating system:   Windows Server 2012 R2 6.3.9600
>> Description:
>>
>> Executing command pg_basebackup -D -F t writes its output to stdout, which
>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>> the resulting archive.
>>
>> To write the tar to stdout, on Windows stdout's mode should be temporarily
>> switched to binary.
>>
>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>
>
> Thanks for reporting the issue.
> With the attached patch, I was able to extract the tar file that gets
> generated when the tar file is written into stdout. I tested the
> the compressed tar also.
>
> This bug needs to be fixed in back branches also.

We should do the same for pg_dump in -Fc mode.

There are no "CR LF" characters that are present in the dump file
that is created with custom format.

what is the problem do you see in custom format that needs similar
handling like pg_basebackup?

Regards,
Hari Babu
Fujitsu Australia
Hi Craig,

On Wed, May 3, 2017 at 10:50 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 3 May 2017 at 12:32, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> [Adding -hackers mailing list]
>>
>> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      14634
>>> Logged by:          Henry Boehlert
>>> Email address:      henry_boehlert@agilent.com
>>> PostgreSQL version: 9.6.2
>>> Operating system:   Windows Server 2012 R2 6.3.9600
>>> Description:
>>>
>>> Executing command pg_basebackup -D -F t writes its output to stdout, which
>>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>>> the resulting archive.
>>>
>>> To write the tar to stdout, on Windows stdout's mode should be temporarily
>>> switched to binary.
>>>
>>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>>
>>
>> Thanks for reporting the issue.
>> With the attached patch, I was able to extract the tar file that gets
>> generated when the tar file is written into stdout. I tested the
>> the compressed tar also.
>>
>> This bug needs to be fixed in back branches also.
>
> We should do the same for pg_dump in -Fc mode.

Did you meant -Fp mode. I think we are already setting stdout file to
binary mode if the output format is custom. Please refer to the
following code in parseArchiveFormat() and _allocAH() respectively

static ArchiveFormat
parseArchiveFormat(const char *format, ArchiveMode *mode)
{   ...............   ...............   else if (pg_strcasecmp(format, "c") == 0)       archiveFormat = archCustom;
elseif (pg_strcasecmp(format, "custom") == 0)       archiveFormat = archCustom;
 
   else if (pg_strcasecmp(format, "p") == 0)       archiveFormat = archNull;   else if (pg_strcasecmp(format, "plain")
==0)       archiveFormat = archNull;   ...............   ...............
 
}

static ArchiveHandle *
_allocAH(const char *FileSpec, const ArchiveFormat fmt,        const int compression, bool dosync, ArchiveMode mode,
   SetupWorkerPtrType setupWorkerPtr)
 
{
   ...............   ...............
#ifdef WIN32   if (fmt != archNull &&       (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))   {       if (mode ==
archModeWrite)          setmode(fileno(stdout), O_BINARY);       else           setmode(fileno(stdin), O_BINARY);   }
 
#endif   ..................   ..................
}

Please confirm.

Meanwhile, I have unit tested the patch submitted by Hari upthread on
postgresql v10 and it works fine. Below are the steps that i have
followed to test Hari's patch.

Without patch:
==============
C:\Users\ashu\postgresql\TMP\test\bin>.\pg_basebackup.exe -D - -F t -X
none > base.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL seg
ments are copied through other means to complete the backup

C:\Users\ashu\postgresql\TMP\test\bin>tar -xf base.tar
tar: Skipping to next header
tar: Exiting with failure status due to previous errors


With patch:
===========
C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\pg_basebackup.exe
-D - -F t -X none > base.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL seg
ments are copied through other means to complete the backup

C:\Users\ashu\postgresql\TMP\test\bin>cp base.tar ..\basebakup

C:\Users\ashu\postgresql\TMP\test\basebakup>tar -xf base.tar

C:\Users\ashu\postgresql\TMP\test\basebakup>ls
PG_VERSION    pg_commit_ts   pg_multixact  pg_stat      pg_wal
backup_label  pg_dynshmem    pg_notify     pg_stat_tmp  pg_xact
base          pg_hba.conf    pg_replslot   pg_subtrans  postgresql.auto.conf
base.tar      pg_ident.conf  pg_serial     pg_tblspc    postgresql.conf
global        pg_logical     pg_snapshots  pg_twophase  tablespace_map

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com





On Wed, May 3, 2017 at 10:44 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Craig,

On Wed, May 3, 2017 at 10:50 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 3 May 2017 at 12:32, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> [Adding -hackers mailing list]
>>
>> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      14634
>>> Logged by:          Henry Boehlert
>>> Email address:      henry_boehlert@agilent.com
>>> PostgreSQL version: 9.6.2
>>> Operating system:   Windows Server 2012 R2 6.3.9600
>>> Description:
>>>
>>> Executing command pg_basebackup -D -F t writes its output to stdout, which
>>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>>> the resulting archive.
>>>
>>> To write the tar to stdout, on Windows stdout's mode should be temporarily
>>> switched to binary.
>>>
>>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>>
>>
>> Thanks for reporting the issue.
>> With the attached patch, I was able to extract the tar file that gets
>> generated when the tar file is written into stdout. I tested the
>> the compressed tar also.
>>
>> This bug needs to be fixed in back branches also.
>
> We should do the same for pg_dump in -Fc mode.

Did you meant -Fp mode. I think we are already setting stdout file to
binary mode if the output format is custom. Please refer to the
following code in parseArchiveFormat() and _allocAH() respectively

static ArchiveFormat
parseArchiveFormat(const char *format, ArchiveMode *mode)
{
    ...............
    ...............
    else if (pg_strcasecmp(format, "c") == 0)
        archiveFormat = archCustom;
    else if (pg_strcasecmp(format, "custom") == 0)
        archiveFormat = archCustom;

    else if (pg_strcasecmp(format, "p") == 0)
        archiveFormat = archNull;
    else if (pg_strcasecmp(format, "plain") == 0)
        archiveFormat = archNull;
    ...............
    ...............
}

static ArchiveHandle *
_allocAH(const char *FileSpec, const ArchiveFormat fmt,
         const int compression, bool dosync, ArchiveMode mode,
         SetupWorkerPtrType setupWorkerPtr)
{

    ...............
    ...............
#ifdef WIN32
    if (fmt != archNull &&
        (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
    {
        if (mode == archModeWrite)
            setmode(fileno(stdout), O_BINARY);
        else
            setmode(fileno(stdin), O_BINARY);
    }
#endif
    ..................
    ..................
}

Please confirm. 

I also think it is the plain text mode. There is no problem with text
mode file that contains the CR LF characters.

Meanwhile, I have unit tested the patch submitted by Hari upthread on
postgresql v10 and it works fine. Below are the steps that i have
followed to test Hari's patch.

Without patch:
==============
C:\Users\ashu\postgresql\TMP\test\bin>.\pg_basebackup.exe -D - -F t -X
none > base.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL seg
ments are copied through other means to complete the backup

C:\Users\ashu\postgresql\TMP\test\bin>tar -xf base.tar
tar: Skipping to next header
tar: Exiting with failure status due to previous errors


With patch:
===========
C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\pg_basebackup.exe
-D - -F t -X none > base.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL seg
ments are copied through other means to complete the backup

C:\Users\ashu\postgresql\TMP\test\bin>cp base.tar ..\basebakup

C:\Users\ashu\postgresql\TMP\test\basebakup>tar -xf base.tar

C:\Users\ashu\postgresql\TMP\test\basebakup>ls
PG_VERSION    pg_commit_ts   pg_multixact  pg_stat      pg_wal
backup_label  pg_dynshmem    pg_notify     pg_stat_tmp  pg_xact
base          pg_hba.conf    pg_replslot   pg_subtrans  postgresql.auto.conf
base.tar      pg_ident.conf  pg_serial     pg_tblspc    postgresql.conf
global        pg_logical     pg_snapshots  pg_twophase  tablespace_map


Thanks for the tests to verify the patch.

 
Regards,
Hari Babu
Fujitsu Australia
On 05/03/2017 07:32 AM, Haribabu Kommi wrote:
> [Adding -hackers mailing list]
>
> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:      14634
>> Logged by:          Henry Boehlert
>> Email address:      henry_boehlert@agilent.com
>> PostgreSQL version: 9.6.2
>> Operating system:   Windows Server 2012 R2 6.3.9600
>> Description:
>>
>> Executing command pg_basebackup -D -F t writes its output to stdout, which
>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>> the resulting archive.
>>
>> To write the tar to stdout, on Windows stdout's mode should be temporarily
>> switched to binary.
>>
>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>>
>
> Thanks for reporting the issue.
> With the attached patch, I was able to extract the tar file that gets
> generated when the tar file is written into stdout. I tested the
> the compressed tar also.
>
> This bug needs to be fixed in back branches also.

Seems reasonable. One question:

In the patch, you used "_setmode" function, while the calls in 
src/bin/pg_dump/pg_backup_archiver.c use "setmode". There are a few 
places in the backend that also use "_setmode". What's the difference? 
Should we change some of them to be consistent?

- Heikki






On Fri, Jul 14, 2017 at 2:54 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 05/03/2017 07:32 AM, Haribabu Kommi wrote:
[Adding -hackers mailing list]

On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:

The following bug has been logged on the website:

Bug reference:      14634
Logged by:          Henry Boehlert
Email address:      henry_boehlert@agilent.com
PostgreSQL version: 9.6.2
Operating system:   Windows Server 2012 R2 6.3.9600
Description:

Executing command pg_basebackup -D -F t writes its output to stdout, which
is open in text mode, causing LF to be converted to CR LF thus corrupting
the resulting archive.

To write the tar to stdout, on Windows stdout's mode should be temporarily
switched to binary.

https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx


Thanks for reporting the issue.
With the attached patch, I was able to extract the tar file that gets
generated when the tar file is written into stdout. I tested the
the compressed tar also.

This bug needs to be fixed in back branches also.

Seems reasonable. One question:

In the patch, you used "_setmode" function, while the calls in src/bin/pg_dump/pg_backup_archiver.c use "setmode". There are a few places in the backend that also use "_setmode". What's the difference? Should we change some of them to be consistent?

Actually there is no functional difference between these two functions.
one is a POSIX variant and another one is ISO C++ variant [1]. The support
of POSIX variant is deprecated in windows, because of this reason we should
use the _setmode instead of setmode.

I attached the patch to change the pg_dump code to use _setmode function
instead of _setmode to be consistent with other functions.



Regards,
Hari Babu
Fujitsu Australia
Attachment
On 07/14/2017 05:27 AM, Haribabu Kommi wrote:
> On Fri, Jul 14, 2017 at 2:54 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
>> On 05/03/2017 07:32 AM, Haribabu Kommi wrote:
>>
>>> [Adding -hackers mailing list]
>>>
>>> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehlert@agilent.com> wrote:
>>>
>>>> Executing command pg_basebackup -D -F t writes its output to stdout,
>>>> which
>>>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>>>> the resulting archive.
>>>>
>>>> To write the tar to stdout, on Windows stdout's mode should be
>>>> temporarily
>>>> switched to binary.
>>>>
>>>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>>>
>>> Thanks for reporting the issue.
>>> With the attached patch, I was able to extract the tar file that gets
>>> generated when the tar file is written into stdout. I tested the
>>> the compressed tar also.
>>>
>>> This bug needs to be fixed in back branches also.
>>
>> Seems reasonable. One question:
>>
>> In the patch, you used "_setmode" function, while the calls in
>> src/bin/pg_dump/pg_backup_archiver.c use "setmode". There are a few
>> places in the backend that also use "_setmode". What's the difference?
>> Should we change some of them to be consistent?
>
> Actually there is no functional difference between these two functions.
> one is a POSIX variant and another one is ISO C++ variant [1]. The support
> of POSIX variant is deprecated in windows, because of this reason we should
> use the _setmode instead of setmode.
>
> I attached the patch to change the pg_dump code to use _setmode function
> instead of _setmode to be consistent with other functions.

Ok, committed and backpatched both patches. Thanks!

- Heikki