Thread: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows
Hello, In Windows, if one needs to take a dump in plain text format (this is the default option, or can be specified using -Fp) with some level of compression (-Z[0-9]), an output file has to be specified. Otherwise, if the output is redirected to stdout, it'll create a corrupted dump (cmd is set to ASCII mode, so it'll put carriage returns in the file). I'm referring the following part of pg_dump code: /* * On Windows, we need to use binary mode to read/write non-text archive * formats. Force stdin/stdout into binary mode if that is what we are * using. */ #ifdef WIN32 if (fmt != archNull && (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0)) { if (mode == archModeWrite) setmode(fileno(stdout), O_BINARY); else setmode(fileno(stdin), O_BINARY); } #endif For plain-text format, fmt is set to archNull. In that case, the binary mode will not be forced(I think). To fix this, I've attached a patch which adds one extra check in the 'if condition' to check the compression level. PFA. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > Hello, > In Windows, if one needs to take a dump in plain text format (this is > the default option, or can be specified using -Fp) with some level of > compression (-Z[0-9]), an output file has to > be specified. Otherwise, if the output is redirected to stdout, it'll > create a corrupted dump (cmd is set to ASCII mode, so it'll put > carriage returns in the file). To reproduce the issue, please use the following command in windows cmd: pg_dump -Z 9 test > E:\test_xu.backup pg_dump -Fp -Z 9 test > E:\test_xu.backup -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh > <kuntalghosh.2007@gmail.com> wrote: >> Hello, >> In Windows, if one needs to take a dump in plain text format (this is >> the default option, or can be specified using -Fp) with some level of >> compression (-Z[0-9]), an output file has to >> be specified. Otherwise, if the output is redirected to stdout, it'll >> create a corrupted dump (cmd is set to ASCII mode, so it'll put >> carriage returns in the file). > To reproduce the issue, please use the following command in windows cmd: > > pg_dump -Z 9 test > E:\test_xu.backup > pg_dump -Fp -Z 9 test > E:\test_xu.backup This is a known problem. It is not specific to PostgreSQL, it affects any software that attempts to use stdin/stdout on Windows via cmd, where it is not 8-bit clean. We don't just refuse to run with stdout as a destination because it's perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far as I know, tell whether it's being invoked by cmd or something else. If you have concrete ideas on how to improve this they'd be welcomed. Is there anywhere you expected to find info in the docs? Do you know of a way to detect in Windows if the output stream is not 8-bit clean from within the application program? ... other? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: >> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh >> <kuntalghosh.2007@gmail.com> wrote: >>> Hello, >>> In Windows, if one needs to take a dump in plain text format (this is >>> the default option, or can be specified using -Fp) with some level of >>> compression (-Z[0-9]), an output file has to >>> be specified. Otherwise, if the output is redirected to stdout, it'll >>> create a corrupted dump (cmd is set to ASCII mode, so it'll put >>> carriage returns in the file). >> To reproduce the issue, please use the following command in windows cmd: >> >> pg_dump -Z 9 test > E:\test_xu.backup >> pg_dump -Fp -Z 9 test > E:\test_xu.backup > > This is a known problem. It is not specific to PostgreSQL, it affects > any software that attempts to use stdin/stdout on Windows via cmd, > where it is not 8-bit clean. > > We don't just refuse to run with stdout as a destination because it's > perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far > as I know, tell whether it's being invoked by cmd or something else. ASAICU, if we use binary mode, output is stored bit by bit. In ASCII mode, cmd pokes its nose and does CR / LF conversions on its own. So, whenever we want compression on a plain-text dump file, we can set the stdout mode to O_BINARY. Is it a wrong approach? > If you have concrete ideas on how to improve this they'd be welcomed. > Is there anywhere you expected to find info in the docs? Do you know > of a way to detect in Windows if the output stream is not 8-bit clean > from within the application program? ... other? Actually, I'm not that familiar with windows environment. But, I couldn't find any note to user in pg_dump documentation regarding the issue. In cmd, if someone needs a plain-text dump in compressed format, they should specify the output file, otherwise they may run into the above problem. However, if a dump is corrupted due to the above issue, a fix for that is provided in [1]. Should we include this in the documentation? [1] http://www.gzip.org/ Use fixgz.c to remove the extra CR (carriage return) bytes. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On Fri, Mar 24, 2017 at 2:17 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: >>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh >>> <kuntalghosh.2007@gmail.com> wrote: >>>> Hello, >>>> In Windows, if one needs to take a dump in plain text format (this is >>>> the default option, or can be specified using -Fp) with some level of >>>> compression (-Z[0-9]), an output file has to >>>> be specified. Otherwise, if the output is redirected to stdout, it'll >>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put >>>> carriage returns in the file). >>> To reproduce the issue, please use the following command in windows cmd: >>> >>> pg_dump -Z 9 test > E:\test_xu.backup >>> pg_dump -Fp -Z 9 test > E:\test_xu.backup >> >> This is a known problem. It is not specific to PostgreSQL, it affects >> any software that attempts to use stdin/stdout on Windows via cmd, >> where it is not 8-bit clean. >> >> We don't just refuse to run with stdout as a destination because it's >> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far >> as I know, tell whether it's being invoked by cmd or something else. > ASAICU, if we use binary mode, output is stored bit by bit. In ASCII > mode, cmd pokes its nose and does CR / LF conversions on its own. So, > whenever we want compression on a plain-text dump file, we can set the > stdout mode to O_BINARY. Is it a wrong approach? With the help from Ashutosh Sharma, I tested this in Windows environment. Sadly, it still doesn't work. :( IMHO, we should document the issue somewhere. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: >> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII >> mode, cmd pokes its nose and does CR / LF conversions on its own. So, >> whenever we want compression on a plain-text dump file, we can set the >> stdout mode to O_BINARY. Is it a wrong approach? > With the help from Ashutosh Sharma, I tested this in Windows > environment. Sadly, it still doesn't work. :( IMHO, we should document > the issue somewhere. Why not? I mean, if there's code there to force the output into binary mode, does that not work for the -Fc case? And if it does work for the -Fc case, then why doesn't it also work for -Z9? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh > <kuntalghosh.2007@gmail.com> wrote: >>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII >>> mode, cmd pokes its nose and does CR / LF conversions on its own. So, >>> whenever we want compression on a plain-text dump file, we can set the >>> stdout mode to O_BINARY. Is it a wrong approach? >> With the help from Ashutosh Sharma, I tested this in Windows >> environment. Sadly, it still doesn't work. :( IMHO, we should document >> the issue somewhere. > > Why not? I mean, if there's code there to force the output into > binary mode, does that not work for the -Fc case? And if it does work > for the -Fc case, then why doesn't it also work for -Z9? > I have re-verified the patch with the help of my colleague 'Neha Sharma' on Windows server 2008 R2 machine and the fix looks to be working fine. With the help of attached patch, i could see that when pg_dump is ran in default mode (i.e. -Fp mode) with compression level > 0 and output redirected to stdout, the dump file generated is not a corrupted file and can be easily decompressed with the help of gzip command and also it can be easily restored without any issues. Thanks. Please note that earlier i was not able to reproduce the issue on my windows setup and had to borrow Neha's machine to validate the fix. Sorry, for the delayed response. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh >> <kuntalghosh.2007@gmail.com> wrote: >>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII >>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So, >>>> whenever we want compression on a plain-text dump file, we can set the >>>> stdout mode to O_BINARY. Is it a wrong approach? >>> With the help from Ashutosh Sharma, I tested this in Windows >>> environment. Sadly, it still doesn't work. :( IMHO, we should document >>> the issue somewhere. >> >> Why not? I mean, if there's code there to force the output into >> binary mode, does that not work for the -Fc case? And if it does work >> for the -Fc case, then why doesn't it also work for -Z9? >> > > I have re-verified the patch with the help of my colleague 'Neha > Sharma' on Windows server 2008 R2 machine and the fix looks to be > working fine. With the help of attached patch, > Did you intended to attach the patch to this e-mail or are you referring to Kuntal's patch up thread? If later, then it is better to mention the link of mail which has a patch that you have verified. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> Hi, >> >> On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh >>> <kuntalghosh.2007@gmail.com> wrote: >>>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII >>>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So, >>>>> whenever we want compression on a plain-text dump file, we can set the >>>>> stdout mode to O_BINARY. Is it a wrong approach? >>>> With the help from Ashutosh Sharma, I tested this in Windows >>>> environment. Sadly, it still doesn't work. :( IMHO, we should document >>>> the issue somewhere. >>> >>> Why not? I mean, if there's code there to force the output into >>> binary mode, does that not work for the -Fc case? And if it does work >>> for the -Fc case, then why doesn't it also work for -Z9? >>> >> >> I have re-verified the patch with the help of my colleague 'Neha >> Sharma' on Windows server 2008 R2 machine and the fix looks to be >> working fine. With the help of attached patch, >> > > Did you intended to attach the patch to this e-mail or are you > referring to Kuntal's patch up thread? If later, then it is better to > mention the link of mail which has a patch that you have verified. > I am referring to Kuntal's patch upthread. The link for the email having the patch is, https://www.postgresql.org/message-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw%40mail.gmail.com -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
Ashutosh Sharma <ashu.coek88@gmail.com> writes: > On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> Why not? I mean, if there's code there to force the output into >>>> binary mode, does that not work for the -Fc case? And if it does work >>>> for the -Fc case, then why doesn't it also work for -Z9? >>> I have re-verified the patch with the help of my colleague 'Neha >>> Sharma' on Windows server 2008 R2 machine and the fix looks to be >>> working fine. With the help of attached patch, >> Did you intended to attach the patch to this e-mail or are you >> referring to Kuntal's patch up thread? If later, then it is better to >> mention the link of mail which has a patch that you have verified. > I am referring to Kuntal's patch upthread. The link for the email > having the patch is, > https://www.postgresql.org/message-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw%40mail.gmail.com That patch looks reasonable to me, will push. regards, tom lane
I wrote: > That patch looks reasonable to me, will push. On closer inspection, the patch did contain a bug: it tested for compression being active with "compression > 0", which is the wrong thing --- everyplace else in pg_dump tests with "compression != 0". This is important because Z_DEFAULT_COMPRESSION is -1. The net effect is that the patch would have appeared to fix the bug when writing "-Z1".."-Z9", but not when writing just "-Z". Possibly this explains why Kuntal couldn't confirm success at one point upthread. Pushed with that correction and some wordsmithing on the comment. regards, tom lane