Thread: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

[HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Kuntal Ghosh
Date:
Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

I'm referring the following part of pg_dump code:

    /*
     * On Windows, we need to use binary mode to read/write non-text archive
     * formats.  Force stdin/stdout into binary mode if that is what we are
     * using.
     */
#ifdef WIN32
    if (fmt != archNull &&
        (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
    {
        if (mode == archModeWrite)
            setmode(fileno(stdout), O_BINARY);
        else
            setmode(fileno(stdin), O_BINARY);
    }
#endif

For plain-text format, fmt is set to archNull. In that case, the
binary mode will not be forced(I think). To fix this, I've attached a
patch which adds one extra check in the 'if condition' to check the
compression level. PFA.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Kuntal Ghosh
Date:
On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> Hello,
> In Windows, if one needs to take a dump in plain text format (this is
> the default option, or can be specified using -Fp) with some level of
> compression (-Z[0-9]), an output file has to
> be specified. Otherwise, if the output is redirected to stdout, it'll
> create a corrupted dump (cmd is set to ASCII mode, so it'll put
> carriage returns in the file).
To reproduce the issue, please use the following command in windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Craig Ringer
Date:
On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
>> Hello,
>> In Windows, if one needs to take a dump in plain text format (this is
>> the default option, or can be specified using -Fp) with some level of
>> compression (-Z[0-9]), an output file has to
>> be specified. Otherwise, if the output is redirected to stdout, it'll
>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>> carriage returns in the file).
> To reproduce the issue, please use the following command in windows cmd:
>
> pg_dump -Z 9 test > E:\test_xu.backup
> pg_dump -Fp -Z 9 test > E:\test_xu.backup

This is a known problem. It is not specific to PostgreSQL, it affects
any software that attempts to use stdin/stdout on Windows via cmd,
where it is not 8-bit clean.

We don't just refuse to run with stdout as a destination because it's
perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
as I know, tell whether it's being invoked by cmd or something else.

If you have concrete ideas on how to improve this they'd be welcomed.
Is there anywhere you expected to find info in the docs? Do you know
of a way to detect in Windows if the output stream is not 8-bit clean
from within the application program? ... other?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Kuntal Ghosh
Date:
On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>> <kuntalghosh.2007@gmail.com> wrote:
>>> Hello,
>>> In Windows, if one needs to take a dump in plain text format (this is
>>> the default option, or can be specified using -Fp) with some level of
>>> compression (-Z[0-9]), an output file has to
>>> be specified. Otherwise, if the output is redirected to stdout, it'll
>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>>> carriage returns in the file).
>> To reproduce the issue, please use the following command in windows cmd:
>>
>> pg_dump -Z 9 test > E:\test_xu.backup
>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>
> This is a known problem. It is not specific to PostgreSQL, it affects
> any software that attempts to use stdin/stdout on Windows via cmd,
> where it is not 8-bit clean.
>
> We don't just refuse to run with stdout as a destination because it's
> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
> as I know, tell whether it's being invoked by cmd or something else.
ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
mode, cmd pokes its nose and does CR / LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is it a wrong approach?

> If you have concrete ideas on how to improve this they'd be welcomed.
> Is there anywhere you expected to find info in the docs? Do you know
> of a way to detect in Windows if the output stream is not 8-bit clean
> from within the application program? ... other?
Actually, I'm not that familiar with windows environment. But, I
couldn't find any note to user in pg_dump documentation regarding the
issue. In cmd, if someone needs a plain-text dump in compressed
format, they should specify the output file, otherwise they may run
into the above problem. However, if a dump is corrupted due to the
above issue, a fix for that is provided in [1]. Should we include this
in the documentation?



[1] http://www.gzip.org/
Use fixgz.c to remove the extra CR (carriage return) bytes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Kuntal Ghosh
Date:
On Fri, Mar 24, 2017 at 2:17 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>>> <kuntalghosh.2007@gmail.com> wrote:
>>>> Hello,
>>>> In Windows, if one needs to take a dump in plain text format (this is
>>>> the default option, or can be specified using -Fp) with some level of
>>>> compression (-Z[0-9]), an output file has to
>>>> be specified. Otherwise, if the output is redirected to stdout, it'll
>>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>>>> carriage returns in the file).
>>> To reproduce the issue, please use the following command in windows cmd:
>>>
>>> pg_dump -Z 9 test > E:\test_xu.backup
>>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>>
>> This is a known problem. It is not specific to PostgreSQL, it affects
>> any software that attempts to use stdin/stdout on Windows via cmd,
>> where it is not 8-bit clean.
>>
>> We don't just refuse to run with stdout as a destination because it's
>> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
>> as I know, tell whether it's being invoked by cmd or something else.
> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
> whenever we want compression on a plain-text dump file, we can set the
> stdout mode to O_BINARY. Is it a wrong approach?
With the help from Ashutosh Sharma, I tested this in Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: BUG: pg_dump generates corrupted gzip file in Windows

From
Robert Haas
Date:
On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
>> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
>> whenever we want compression on a plain-text dump file, we can set the
>> stdout mode to O_BINARY. Is it a wrong approach?
> With the help from Ashutosh Sharma, I tested this in Windows
> environment. Sadly, it still doesn't work. :( IMHO, we should document
> the issue somewhere.

Why not?  I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case?  And if it does work
for the -Fc case, then why doesn't it also work for -Z9?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Ashutosh Sharma
Date:
Hi,

On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
>>> whenever we want compression on a plain-text dump file, we can set the
>>> stdout mode to O_BINARY. Is it a wrong approach?
>> With the help from Ashutosh Sharma, I tested this in Windows
>> environment. Sadly, it still doesn't work. :( IMHO, we should document
>> the issue somewhere.
>
> Why not?  I mean, if there's code there to force the output into
> binary mode, does that not work for the -Fc case?  And if it does work
> for the -Fc case, then why doesn't it also work for -Z9?
>

I have re-verified the patch with the help of my colleague 'Neha
Sharma' on Windows server 2008 R2 machine and the fix looks to be
working fine. With the help of attached patch, i could see that when
pg_dump is ran in default mode (i.e. -Fp mode) with compression level
> 0 and output redirected to stdout, the dump file generated is not a
corrupted file and can be easily decompressed with the help of gzip
command and also it can be easily restored without any issues. Thanks.

Please note that earlier i was not able to reproduce the issue on my
windows setup and had to borrow Neha's machine to validate the fix.
Sorry, for the delayed response.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Amit Kapila
Date:
On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Hi,
>
> On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
>> <kuntalghosh.2007@gmail.com> wrote:
>>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
>>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
>>>> whenever we want compression on a plain-text dump file, we can set the
>>>> stdout mode to O_BINARY. Is it a wrong approach?
>>> With the help from Ashutosh Sharma, I tested this in Windows
>>> environment. Sadly, it still doesn't work. :( IMHO, we should document
>>> the issue somewhere.
>>
>> Why not?  I mean, if there's code there to force the output into
>> binary mode, does that not work for the -Fc case?  And if it does work
>> for the -Fc case, then why doesn't it also work for -Z9?
>>
>
> I have re-verified the patch with the help of my colleague 'Neha
> Sharma' on Windows server 2008 R2 machine and the fix looks to be
> working fine. With the help of attached patch,
>

Did you intended to attach the patch to this e-mail or are you
referring to Kuntal's patch up thread?  If later, then it is better to
mention the link of mail which has a patch that you have verified.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Ashutosh Sharma
Date:
Hi,

On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Hi,
>>
>> On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
>>> <kuntalghosh.2007@gmail.com> wrote:
>>>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
>>>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
>>>>> whenever we want compression on a plain-text dump file, we can set the
>>>>> stdout mode to O_BINARY. Is it a wrong approach?
>>>> With the help from Ashutosh Sharma, I tested this in Windows
>>>> environment. Sadly, it still doesn't work. :( IMHO, we should document
>>>> the issue somewhere.
>>>
>>> Why not?  I mean, if there's code there to force the output into
>>> binary mode, does that not work for the -Fc case?  And if it does work
>>> for the -Fc case, then why doesn't it also work for -Z9?
>>>
>>
>> I have re-verified the patch with the help of my colleague 'Neha
>> Sharma' on Windows server 2008 R2 machine and the fix looks to be
>> working fine. With the help of attached patch,
>>
>
> Did you intended to attach the patch to this e-mail or are you
> referring to Kuntal's patch up thread?  If later, then it is better to
> mention the link of mail which has a patch that you have verified.
>

I am referring to Kuntal's patch upthread.  The link for the email
having the patch is,

https://www.postgresql.org/message-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw%40mail.gmail.com

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Tom Lane
Date:
Ashutosh Sharma <ashu.coek88@gmail.com> writes:
> On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>> On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Why not?  I mean, if there's code there to force the output into
>>>> binary mode, does that not work for the -Fc case?  And if it does work
>>>> for the -Fc case, then why doesn't it also work for -Z9?

>>> I have re-verified the patch with the help of my colleague 'Neha
>>> Sharma' on Windows server 2008 R2 machine and the fix looks to be
>>> working fine. With the help of attached patch,

>> Did you intended to attach the patch to this e-mail or are you
>> referring to Kuntal's patch up thread?  If later, then it is better to
>> mention the link of mail which has a patch that you have verified.

> I am referring to Kuntal's patch upthread.  The link for the email
> having the patch is,
> https://www.postgresql.org/message-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw%40mail.gmail.com

That patch looks reasonable to me, will push.
        regards, tom lane



Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

From
Tom Lane
Date:
I wrote:
> That patch looks reasonable to me, will push.

On closer inspection, the patch did contain a bug: it tested for
compression being active with "compression > 0", which is the wrong
thing --- everyplace else in pg_dump tests with "compression != 0".
This is important because Z_DEFAULT_COMPRESSION is -1.  The net
effect is that the patch would have appeared to fix the bug when
writing "-Z1".."-Z9", but not when writing just "-Z".  Possibly
this explains why Kuntal couldn't confirm success at one point
upthread.

Pushed with that correction and some wordsmithing on the comment.
        regards, tom lane