Re: Upcoming PG re-releases - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Upcoming PG re-releases
Date
Msg-id 20051204185505.GB698@svana.org
Whole thread Raw
In response to Upcoming PG re-releases  (Gregory Maxwell <gmaxwell@gmail.com>)
List pgsql-hackers
On Sun, Dec 04, 2005 at 12:19:32PM -0500, Gregory Maxwell wrote:
> > That's exactly what's bothering me about it.  If we recommend that
> > we had better put a large THIS WILL DESTROY YOUR DATA warning first.
> > The problem is that the data is not "invalid" from the user's point
> > of view --- more likely, it's in some non-UTF8 encoding --- and so
> > just throwing away some of the characters is unlikely to make people
> > happy.
>
> Nor is it even guarenteed to make the data load: If the column is
> unique constrained and the removal of the non-UTF characters makes two
> rows have the same data where they didn't before...
>
> The way to preserve the data is to switch the column to be a bytea.

Additionally, it's hard to suggest anything better without specific
knowledge of the characters that are incorrect and how they got there.

The ideal solution would be a way for people to identify problem data
*before* they dump so they have an opportunity to fix it. Something
like a module they can load and say:

select val from table where not utf8_validate(val);

This would allow people to examine the data while the system is still
running and fix it. Maybe we can code something up in plpgsql? Slow as
molasses but you'll be able to run it anywhere.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [PATCHES] snprintf() argument reordering not working
Next
From: Tom Lane
Date:
Subject: Re: [PATCHES] snprintf() argument reordering not working