Upcoming PG re-releases - Mailing list pgsql-hackers

From Gregory Maxwell
Subject Upcoming PG re-releases
Msg-id e692861c0512040919x56c7b18fva497a198e4195707@mail.gmail.com
In response to Re: Upcoming PG re-releases  (Neil Conway <neilc@samurai.com>)
Responses Re: Upcoming PG re-releases  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On 12/4/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Paul Lindner <lindner@inuus.com> writes:
> > On Sun, Dec 04, 2005 at 11:34:16AM -0500, Tom Lane wrote:
> >> Paul Lindner <lindner@inuus.com> writes:
> >>> iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql
> >>
> >> Is that really a one-size-fits-all solution?  Especially with -c?
>
> > I'd say yes, and the -c flag is needed so iconv strips out the
> > invalid characters.
>
> That's exactly what's bothering me about it.  If we recommend that
> we had better put a large THIS WILL DESTROY YOUR DATA warning first.
> The problem is that the data is not "invalid" from the user's point
> of view --- more likely, it's in some non-UTF8 encoding --- and so
> just throwing away some of the characters is unlikely to make people
> happy.

Nor is it even guaranteed to make the data load: if the column has a
unique constraint and removing the non-UTF8 characters makes two rows
identical that were distinct before, the load will fail on a unique
violation.
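A minimal Python sketch of both problems, assuming the offending bytes are Latin-1 text stored in a database declared as UTF8 (the encoding is an illustrative assumption). Here `bytes.decode(..., errors="ignore")` stands in for `iconv -c`: it is not iconv itself, but it likewise silently drops byte sequences that are not valid UTF-8.

```python
# Two distinct values as they might appear in a dump: Latin-1 bytes
# (assumed source encoding) that are not valid UTF-8.
rows = [b"caf\xe9", b"caf\xe8"]  # "café" and "cafè" in Latin-1

# Stripping invalid bytes, roughly what `iconv -c` does.
stripped = [r.decode("utf-8", errors="ignore") for r in rows]

# Decoding from the (assumed) real source encoding instead.
converted = [r.decode("latin-1") for r in rows]

print(stripped)   # ['caf', 'caf']
print(converted)  # ['café', 'cafè']

# Under a UNIQUE constraint the stripped values now collide.
print(len(set(stripped)) < len(rows))  # True
```

The stripped values are both shorter and no longer distinct, which is exactly the pair of failures described above: lost characters, and a reload that can abort on a unique constraint.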

The way to preserve the data is to change the column's type to bytea.
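One reason raw bytes are the safe choice: the dump does not record which legacy encoding the bad rows were written in, and the same bytes can mean different things under different guesses. A small Python sketch, where Latin-1 and cp1252 are just two plausible candidates chosen for illustration:

```python
# The same stored bytes read under two plausible legacy encodings
# (both are assumptions; nothing in the data says which is right).
raw = b"price: \x80 5"

as_latin1 = raw.decode("latin-1")  # 0x80 becomes the C1 control U+0080
as_cp1252 = raw.decode("cp1252")   # 0x80 becomes the euro sign U+20AC

print(repr(as_latin1))  # 'price: \x80 5'
print(repr(as_cp1252))  # 'price: € 5'

# Keeping the value as raw bytes (what a bytea column stores) loses
# nothing: either decoding can still be applied later.
assert as_latin1.encode("latin-1") == raw
assert as_cp1252.encode("cp1252") == raw
```

Storing the bytes unchanged defers the decision until someone who knows the data's origin can pick the correct conversion.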

