Thread: Re: Ach. Now, Really, really final contact list

Re: Ach. Now, Really, really final contact list

From
"Magnus Hagander"
Date:
> > Something is clearly broken in the encodings of this file :-(
> 
> Which is why I suggested using HTML entities instead of UTF8 chars which
> are likely to get mangled.

UTF8 works just fine, provided you actually encode your files in it. Entities are primarily a workaround for encodings
that can't represent all characters.

But if you don't deal with the files as utf8, then it'll certainly be a problem.
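To make the entities-versus-UTF8 point concrete, here is a minimal Python sketch (Python is only an illustration choice; nobody in this thread used it) that writes the same name both ways. The entity form is pure ASCII, which is why it survives encodings that can't represent the character. The UTF8 form is only correct if everything that touches the file treats it as UTF8.

```python
import html

name = "César Villanueva"

# Option 1: keep the real character and store the file as UTF-8.
utf8_bytes = name.encode("utf-8")

# Option 2: replace each non-ASCII character with a numeric character
# reference. The result is plain ASCII, so it survives any ASCII-compatible
# recoding, but it only means "é" to something that decodes HTML entities.
entity_bytes = name.encode("ascii", errors="xmlcharrefreplace")

print(utf8_bytes)    # b'C\xc3\xa9sar Villanueva'
print(entity_bytes)  # b'C&#233;sar Villanueva'

# Both forms carry the same text once each is decoded the right way.
print(utf8_bytes.decode("utf-8") == html.unescape(entity_bytes.decode("ascii")))  # True
```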


> > César Villanueva 
> > 
> > Just doesn't seem right to me.
> > 
> > Also, did you intend to change Alvaro's name? Looks to me like something's
> > fishy there?
> > <dd>Ã<81>lvaro Herrera
> > 
> > I've put the file up on
> > http://magnus-master.pgadmin.org/about/press/contact
> > 
> > There, Alvaro's name looks correct, but not César...
> 
> It may look good to you, on your browser, but it looks broken on mine.

Interesting. Since the codes are actually incorrect I'm not surprised. But just out of interest - what's your browser?


/Magnus



Re: Ach. Now, Really, really final contact list

From
Alvaro Herrera
Date:
Magnus Hagander wrote:

> > Which is why I suggested using HTML entities instead of UTF8 chars which
> > are likely to get mangled.
> 
> UTF8 works just fine, provided you actually encode your files in it.
> Entities are primarily a workaround for encodings that can't represent
> all characters.
> 
> But if you don't deal with the files as utf8, then it'll certainly be
> a problem.

My point.  These days I mostly use UTF8, but presumably the file got
mangled somewhere down the line.  Remember that any decent email program
is supposed to recode the file according to the headers and the local
encoding of the reader -- and it works fine most of the time.  But for
these kinds of things it is bound to fail at some point, which is why I
prefer to avoid UTF8.  I mean, why use the unreliable solution when
the reliable one is just as easy?

> > It may look good to you, on your browser, but it looks broken on mine.
> 
> Interesting. Since the codes are actually incorrect I'm not surprised.
> But just out of interest - what's your browser?

Epiphany -- Gecko-based GNOME browser.  It is detecting the file as UTF8
(probably because the server says so) but it's showing the wrong chars
for the three names.  If I change it to latin1, it shows something even
more bogus.  (My guess is that it was utf8 originally and was passed
through a latin1->utf8 filter somewhere.)  If so, iconv -f utf8 -t latin1
should end up with a valid utf8 file (silly, eh?)
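The double-conversion guess can be checked with a short Python sketch. This is an illustration under one assumption: the file was valid UTF8 and then went through a latin1->utf8 conversion exactly once. Undoing it is the same trick as iconv -f utf8 -t latin1: decode as UTF8, write the code points back out as latin1 bytes, and those bytes are the original UTF8.

```python
original = "Álvaro Herrera"
utf8_bytes = original.encode("utf-8")  # b'\xc3\x81lvaro Herrera'

# The suspected accident: UTF-8 bytes read as latin1 and re-encoded as
# UTF-8. Each byte of the two-byte "Á" becomes a character of its own.
mangled = utf8_bytes.decode("latin-1").encode("utf-8")

# The repair, equivalent to `iconv -f utf8 -t latin1`: decode the mangled
# file as UTF-8, then write each code point back out as one latin1 byte.
# This raises UnicodeEncodeError if the text contains characters outside
# latin1, i.e. if it was not double-encoded in exactly this way.
repaired = mangled.decode("utf-8").encode("latin-1")

print(repr(mangled.decode("utf-8")))  # 'Ã\x81lvaro Herrera'
print(repaired.decode("utf-8"))       # Álvaro Herrera
```

The 'Ã\x81' in the mangled string is the same thing the quoted <dd> line shows as Ã<81>. The repair is only safe if every row was double-encoded; running it over text that was already correct would break that text instead.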

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Ach. Now, Really, really final contact list

From
Magnus Hagander
Date:
On Fri, Feb 01, 2008 at 11:04:58AM -0300, Alvaro Herrera wrote:
> Magnus Hagander wrote:
> 
> > > Which is why I suggested using HTML entities instead of UTF8 chars which
> > > are likely to get mangled.
> > 
> > UTF8 works just fine, provided you actually encode your files in it.
> > Entities are primarily a workaround for encodings that can't represent
> > all characters.
> > 
> > But if you don't deal with the files as utf8, then it'll certainly be
> > a problem.
> 
> My point.  These days I mostly use UTF8, but presumably the file got
> mangled somewhere down the line.  Remember that any decent email program
> is supposed to recode the file according to the headers and the local
> encoding of the reader -- and it works fine most of the time.  But for
> these kinds of things it is bound to fail at some point, which is why I
> prefer to avoid UTF8.  I mean, why use the unreliable solution when
> the reliable one is just as easy?

UTF8 really isn't unreliable if you know what you're doing. And if you
don't, you can easily mess up entities as well. (We had a problem with that
just this week on the translations for pginstaller, for example.)
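One common way entities go wrong is double escaping. The thread doesn't say what the pginstaller problem actually was, so this Python sketch is only a generic illustration: text that already contains an entity gets escaped again, and the browser then shows the entity literally instead of the character.

```python
import html

# A name that was already converted to an entity once...
already_escaped = "C&eacute;sar Villanueva"

# ...then passed through an escaping step again (hypothetically, a template
# that escapes everything it is given). The "&" of the entity is escaped too.
double_escaped = html.escape(already_escaped)

print(double_escaped)                 # C&amp;eacute;sar Villanueva
# A browser decodes entities once, so the reader sees the entity itself:
print(html.unescape(double_escaped))  # C&eacute;sar Villanueva
```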

Also, how is an email program decent if it recodes an attachment? It should
recode it if it's included as text, but not if it's an attachment.
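For the attachment question, here is a hedged Python sketch using the standard email package. It attaches the file as application/octet-stream, which MIMEApplication base64-encodes by default, so the attachment declares no charset for a client to convert. The file name and body text are made up for illustration. The sketch only shows that the bytes round-trip through MIME. Whether a particular mail client leaves attachments alone is up to that client.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical contents of the contact list file.
contact_html = "<dd>Álvaro Herrera</dd>\n".encode("utf-8")

msg = MIMEMultipart()
msg["Subject"] = "Contact list"
# Inline text part: declared as UTF-8 text, which a client may re-encode
# for display in the reader's local charset.
msg.attach(MIMEText("Contact list attached.", "plain", "utf-8"))
# Attachment: opaque bytes with base64 transfer encoding and no charset.
part = MIMEApplication(contact_html)  # defaults to application/octet-stream
part.add_header("Content-Disposition", "attachment", filename="contact.html")
msg.attach(part)

wire = msg.as_bytes()

# Receiving side: parse the message and pull the attachment back out.
received = message_from_bytes(wire)
attachment = next(
    p for p in received.walk() if p.get_content_disposition() == "attachment"
)
print(attachment.get_content_type())                        # application/octet-stream
print(attachment.get_payload(decode=True) == contact_html)  # True
```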


> > > It may look good to you, on your browser, but it looks broken on mine.
> > 
> > Interesting. Since the codes are actually incorrect I'm not surprised.
> > But just out of interest - what's your browser?
> 
> Epiphany -- Gecko-based GNOME browser.  It is detecting the file as UTF8
> (probably because the server says so) but it's showing the wrong chars
> for the three names.  If I change it to latin1, it shows something even

Interesting. I'm on ffox on Win ATM. Will have to check with ffox on Linux
later.


> more bogus.  (My guess is that it was utf8 originally and was passed
> through a latin1->utf8 filter somewhere).  If so, iconv -f utf8 -t
> latin1 should end up with a valid utf8 file (silly, eh?)

Yeah, it could be that it's doubly converted.

But - Josh, didn't this stuff come out of a database? Fixing it at the
source is probably a lot better, or we'll have to do it over and over again...

//Magnus


Re: Ach. Now, Really, really final contact list

From
Josh Berkus
Date:
Alvaro,

> My point.  These days I mostly use UTF8, but presumably the file got
> mangled somewhere down the line.  Remember that any decent email program
> is supposed to recode the file according to the headers and the local
> encoding of the reader -- and it works fine most of the time.  But for
> these kinds of things it is bound to fail at some point, which is why I
> prefer to avoid UTF8.  I mean, why use the unreliable solution when
> the reliable one is just as easy?

Let me regenerate that file, and I'll zip it before sending it in, which 
should prevent browser/MUA mangling.
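Zipping helps because an archive is opaque binary data that nothing should try to recode. If something does alter it anyway, the archive fails to open or its CRC check fails, so the damage is visible instead of the names changing silently. A minimal Python sketch of that round trip with the standard zipfile module (the file name and contents are illustrative assumptions):

```python
import io
import zipfile

# Hypothetical contact list contents, stored as UTF-8 bytes.
contact_html = "<dd>César Villanueva</dd>\n<dd>Álvaro Herrera</dd>\n".encode("utf-8")

# Sender: pack the file into an in-memory zip archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("contact.html", contact_html)
archive = buf.getvalue()

# Receiver: unpack it and verify the stored CRCs.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    unpacked = zf.read("contact.html")
    crc_ok = zf.testzip() is None  # testzip() returns None when every CRC matches

print(unpacked == contact_html, crc_ok)  # True True
```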

--Josh