Re: Upcoming PG re-releases - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Upcoming PG re-releases
Date
Msg-id 200512062025.jB6KPDK02212@candle.pha.pa.us
In response to Re: Upcoming PG re-releases  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Upcoming PG re-releases  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I have added your suggestions to the 8.1.X release notes.
> > 
> > Did you read the followup discussion?  Recommending -c without a large
> > warning seems a very bad idea.
> 
> Well, I said it would remove invalid sequences.  What else should we
> say?
> 
>     This will remove invalid character sequences.
> 
> I saw no clear solution that allowed sequences to be corrected.

The release note text is:
Some users are having problems loading <literal>UTF8</> data into 8.1.X. This is because previous versions allowed
invalid <literal>UTF8</> sequences to be entered into the database, and this release properly accepts only valid
<literal>UTF8</> sequences. One way to correct a dumpfile is to use <command>iconv -c -f UTF-8 -t UTF-8</>. This
will remove invalid character sequences. <command>iconv</> reads the entire input file into memory so it might be
necessary to <command>split</> the dump into multiple smaller files for processing.
 

One nice solution would be if iconv would report the lines with errors
and you could correct them, but I see no way to do that.  The only thing
you could do is to diff the old and new files to see the problems.  Is
that helpful?  Here is new text I have used:
Some users are having problems loading <literal>UTF8</> data into 8.1.X. This is because previous versions allowed
invalid <literal>UTF8</> sequences to be entered into the database, and this release properly accepts only valid
<literal>UTF8</> sequences. One way to correct a dumpfile is to use <command>iconv -c -f UTF-8 -t UTF-8 -o
cleanfile.sql dumpfile.sql</>. The <literal>-c</> option removes invalid character sequences. A diff of the two files
will show the sequences that are invalid. <command>iconv</> reads the entire input file into memory so it might be
necessary to <command>split</> the dump into multiple smaller files for processing.
 

It highlights the 'diff' idea.
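
As a rough illustration of what "report the lines with errors" could look like, here is a minimal sketch in Python. It is not part of the release notes and not something iconv does. The function name invalid_utf8_lines and the file name dumpfile.sql are hypothetical, and Python's UTF-8 decoder may accept or reject slightly different sequences than the 8.1.X server does, so treat the output as a hint about where to look rather than a definitive list.

```python
import sys


def invalid_utf8_lines(path):
    """Yield (line_number, raw_bytes) for each line that fails UTF-8 decoding.

    The file is read in binary mode one line at a time, so the whole dump
    never has to fit in memory. Splitting on newline bytes is safe because
    0x0A never appears inside a valid multibyte UTF-8 sequence.
    """
    with open(path, "rb") as dump:
        for lineno, raw in enumerate(dump, start=1):
            try:
                raw.decode("utf-8")  # strict decoding raises on invalid bytes
            except UnicodeDecodeError:
                yield lineno, raw


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_utf8.py dumpfile.sql
    for lineno, raw in invalid_utf8_lines(sys.argv[1]):
        print(lineno, repr(raw))
```

The reported line numbers can then be compared against the diff of the original and cleaned files to decide whether each sequence should be corrected by hand or simply dropped.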

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

