Re: pg_upgrade diffs on WIndows - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: pg_upgrade diffs on WIndows
Date
Msg-id 5047B47A.1020706@dunslane.net
In response to Re: pg_upgrade diffs on WIndows  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: pg_upgrade diffs on WIndows
List pgsql-hackers
On 09/05/2012 03:50 PM, Andrew Dunstan wrote:
>
> On 09/05/2012 03:40 PM, Bruce Momjian wrote:
>> On Wed, Sep  5, 2012 at 03:17:40PM -0400, Andrew Dunstan wrote:
>>>> The PG_BINARY_W change has only been verified on a non-buildfarm
>>>> setup on my laptop (Mingw)
>>>>
>>>> Note that while it does look like there's a bug either in
>>>> pg_upgrade or pg_dumpall, it's probably mostly harmless (adding
>>>> some spurious CRs to function code bodies on Windows). I'd feel
>>>> happier if it didn't, and happier still if I knew for sure the
>>>> ultimate origin. Your pg_dumpall discovery above is interesting. I
>>>> might have time later on today to delve into all this. I'm out of
>>>> contact for the next few hours.
>>>
>>> OK, I now have a complete handle on what's going on here, and
>>> withdraw my earlier statement that I am confused on this issue :-)
>>>
>>> First, one lot of CRs is produced because the pg_upgrade test script
>>> calls pg_dumpall without -f and redirects that to a file, which
>>> Windows kindly opens in text mode. The solution to that is to change
>>> the test script to use pg_dumpall -f instead.
>>>
>>> The second lot of CRs (seen in the second dump file in the diff I
>>> previously sent) is produced by pg_upgrade writing its output in
>>> text mode, which turns LF into CRLF. The solution to that is the
>>> patch to dump.c I posted, which, as Bruce observed, does the same
>>> thing that pg_dumpall does. Arguably, it should also open the input
>>> file in binary, so that if there really is a CRLF in the dump it
>>> won't be eaten.
>> So, right now we are only adding \r to function bodies, which is mostly
>> harmless, but what if a function body has strings with embedded
>> newlines?  What about creating a table with newlines in its identifiers:
>>
>> CREATE TABLE "a
>> b" ("c
>> d" int);
>>
>> If \r is added in there, it would be a data corruption problem. Can you
>> test that?
>
> These are among the reasons why I am suggesting opening the file in 
> binary mode. You're right, that would be data corruption.
>
> I can set up a check, but it will take a bit of time.


As expected, we get a difference in field names. Here's the extract from 
the dumps diff (* again represents CR):

    ***************
    *** 5220,5228 ****
      --
      CREATE TABLE hasnewline (
    !     "x
      y" integer,
    !     "a
      b" text
      );
    --- 5220,5228 ----
      --
      CREATE TABLE hasnewline (
    !     "x*
      y" integer,
    !     "a*
      b" text
      );

If we open the input and output files in binary mode in pg_upgrade's 
dump.c, this difference disappears.
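
For anyone who wants to see the effect in isolation, here is a minimal 
sketch of the idea, not the actual patch to dump.c. The names 
copy_file_binary, count_cr, SKETCH_BINARY_R and SKETCH_BINARY_W are 
made up for this illustration; the two macros just stand in for the 
PG_BINARY_R and PG_BINARY_W macros from the tree so that the file 
compiles on its own. Opening both ends with "rb"/"wb" means the C 
runtime on Windows does no LF/CRLF translation in either direction.

```c
#include <stdio.h>

/*
 * Assumption: local stand-ins for PostgreSQL's PG_BINARY_R / PG_BINARY_W,
 * defined here only so the sketch builds outside the source tree.
 */
#define SKETCH_BINARY_R "rb"
#define SKETCH_BINARY_W "wb"

/*
 * Copy src to dst byte for byte, with no newline translation.
 * Returns the number of bytes copied, or -1 on any error.
 */
long
copy_file_binary(const char *src, const char *dst)
{
    FILE       *in;
    FILE       *out;
    char        buf[4096];
    size_t      n;
    long        total = 0;
    int         failed = 0;

    in = fopen(src, SKETCH_BINARY_R);
    if (in == NULL)
        return -1;

    out = fopen(dst, SKETCH_BINARY_W);
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            failed = 1;
            break;
        }
        total += (long) n;
    }

    if (ferror(in))
        failed = 1;
    fclose(in);
    if (fclose(out) != 0)
        failed = 1;

    return failed ? -1 : total;
}

/*
 * Count carriage-return bytes in a file, reading it in binary mode so
 * that any CRs actually stored on disk are visible.
 * Returns the count, or -1 if the file cannot be opened.
 */
long
count_cr(const char *path)
{
    FILE       *fp = fopen(path, SKETCH_BINARY_R);
    int         c;
    long        count = 0;

    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF)
    {
        if (c == '\r')
            count++;
    }
    fclose(fp);
    return count;
}
```

Running count_cr over the two dump files is a cheap way to confirm 
whether any CRs were introduced; with text-mode opens ("r"/"w") on 
Windows, the copy would be expected to gain one CR per LF on output.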

Given this, I think we have no choice but to apply the patch, all the 
way back to 9.0 in fact.

cheers

andrew




