Thread: Czech2ASCII with --mb=Latin2

Czech2ASCII with --mb=Latin2

From
Robert
Date:
Hi,

  I have a database in Latin2 encoding (Czech stuff) and Latin2/Win1250
on-the-fly recoding with 'set client_encoding' works smoothly. Now, when
I set client encoding to SQL_ASCII, accented characters are converted to
(hexa) codes. Is there any (simple) way to make this recoding convert
accented characters to just the chars themselves but without accents?
Thanks in advance.

- Robert

P.S. Moreover, non-Czech speakers tend to search the database with
words without accents, so it would be useful to make this conversion work
in the other direction too: name LIKE 'ceske%' would also return names
starting with the accented version.

P.S.2 I could do this quite easily in Perl on the application level, but
don't want to start programming before I'm sure there's no standard
postgres solution.
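
[Editor's sketch] The application-level approach mentioned in P.S.2 might look roughly like the following. It is written in Python rather than Perl, and it assumes the text has already been decoded from Latin2 into Unicode strings. The names strip_accents and prefix_match are hypothetical helpers, not part of PostgreSQL or any client library. The idea is to decompose each character into a base letter plus combining marks, drop the marks, and compare the accent-free forms.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining diacritical marks removed."""
    # NFD splits e.g. 'č' into 'c' followed by a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_match(names, prefix):
    """Accent-insensitive, case-insensitive stand-in for name LIKE 'prefix%'."""
    key = strip_accents(prefix).lower()
    return [n for n in names if strip_accents(n).lower().startswith(key)]


if __name__ == "__main__":
    print(strip_accents("české"))  # ceske
    print(prefix_match(["České Budějovice", "Brno", "Ceske drahy"], "ceske"))
```

Filtering in the application means every candidate row has to be fetched first, so this only suits small result sets; it is meant to show the matching rule, not a tuned solution.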



Re: [GENERAL] Czech2ASCII with --mb=Latin2

From
Karel Zak - Zakkr
Date:

On Wed, 15 Dec 1999, Robert wrote:

> Hi,
>
>   I have a database in Latin2 encoding (Czech stuff) and Latin2/Win1250
> on-the-fly recoding with 'set client_encoding' works smoothly. Now, when
> I set client encoding to SQL_ASCII, accented characters are converted to
> (hexa) codes. Is there any (simple) way to make this recoding convert
> accented characters to just the chars themselves but without accents?
> Thanks in advance.
>
> - Robert

 Ahoj :-)

 If I remember correctly, PgSQL has no routine for this (IMHO it is
language-specific, and writing a general routine for all languages,
encodings, etc. is a problem). But you can easily write this in C or Tcl.

                        Karel
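
[Editor's sketch] A routine of the kind Karel suggests would probably come down to a 256-entry byte translation table for Latin2. The following is a minimal illustration of that idea in Python rather than C or Tcl; build_latin2_table is a hypothetical name, and the table is derived from Unicode decomposition data instead of being typed in by hand. Bytes with no accent-free ASCII base letter are left unchanged.

```python
import unicodedata


def build_latin2_table() -> bytes:
    """Build a 256-byte table mapping Latin2 accented letters to ASCII bases."""
    table = bytearray(range(256))  # identity mapping by default
    for byte in range(0x80, 0x100):
        char = bytes([byte]).decode("iso8859_2")
        # The first code point of the canonical decomposition is the base letter.
        base = unicodedata.normalize("NFD", char)[0]
        if ord(base) < 128:
            table[byte] = ord(base)
    return bytes(table)


LATIN2_TO_ASCII = build_latin2_table()

if __name__ == "__main__":
    raw = "české".encode("iso8859_2")
    print(raw.translate(LATIN2_TO_ASCII))  # b'ceske'
```

Because Latin2 is a single-byte encoding, the table stays small and the conversion is one lookup per byte; the same table could be emitted as a static array for a C function.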


Re: [GENERAL] Czech2ASCII with --mb=Latin2

From
Peter Eisentraut
Date:
On 1999-12-15, Robert mentioned:

>   I have a database in Latin2 encoding (Czech stuff) and Latin2/Win1250
> on-the-fly recoding with 'set client_encoding' works smoothly. Now, when
> I set client encoding to SQL_ASCII, accented characters are converted to
> (hexa) codes. Is there any (simple) way to make this recoding convert
> accented characters to just the chars themselves but without accents?

I think this sort of thing has been the dream of many folks using
internationalized software, but it's not that easy. Perhaps one could
write a function that does this sort of conversion, which would have to
keep a gigantic table internally.

However, perhaps in your language it's customary to just leave off the
diacritic marks if they're not available, but in other languages such as
Swedish or German there are rules about converting those to sequences of
other letters. And if you start encoding rules of natural languages into
software, oh boy ...

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
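
[Editor's sketch] Peter's point about language-specific rules can be illustrated with a small per-language override table layered on top of plain accent stripping. The RULES dictionary and the transliterate function are hypothetical; the German entries follow the common ae/oe/ue/ss convention, and the Czech entry is empty on the assumption that dropping the diacritics is acceptable there. Characters that have neither an override nor a decomposition are passed through unchanged.

```python
import unicodedata

# Illustrative overrides only; real conventions vary by language and context.
RULES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"},
    "cs": {},  # assumed: Czech simply drops the diacritics
}


def transliterate(text: str, lang: str) -> str:
    """Apply language-specific replacements, then strip remaining accents."""
    table = RULES.get(lang, {})
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("Müller", "de"))  # Mueller
    print(transliterate("Müller", "cs"))  # Muller
```

The same input gives different output depending on the language tag, which is why a single encoding-level conversion cannot be right for every language at once.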



************

