Thread: Re: From latin9 to sql_ascii??
--- Jaime Casanova <systemguards@yahoo.com> wrote:
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jaime Casanova <systemguards@yahoo.com> writes:
>>> => select to_ascii('Jiménez');
>>> will retrieve 'Jimenez'; at least it works on
>>> Latin1 encoding.
>>>
>>> Why does it not work on Latin9?
>>
>> Probably because it hasn't got a table for Latin9.
>>
>> Feel free to contribute one --- see
>> src/backend/utils/adt/ascii.c.

This page shows the differences between Latin1 & Latin9:
http://www.cs.tut.fi/~jkorpela/latin9.html

The diffs are:

164: the euro symbol (sql_ascii = 'E')???
166: an S with a symbol above (sql_ascii = 'S')
168: the same but lower case (sql_ascii = 's')
180: a Z with a symbol above (sql_ascii = 'Z')
184: the same but lower case (sql_ascii = 'z')
188: an O merged with an E (sql_ascii = '')???
189: the same but lower case (sql_ascii = '')???
190: a Y with two dots above (sql_ascii = 'Y')

Comments?

regards,
Jaime Casanova
Jaime Casanova <systemguards@yahoo.com> writes:
> Why does it not work on Latin9?

>> Probably because it hasn't got a table for Latin9.
>>
>> Feel free to contribute one --- see
>> src/backend/utils/adt/ascii.c.

> This page shows the differences between Latin1 &
> Latin9:
> http://www.cs.tut.fi/~jkorpela/latin9.html

> The diffs are:

> 164: the euro symbol (sql_ascii = 'E')???
> 166: an S with a symbol above (sql_ascii = 'S')
> 168: the same but lower case (sql_ascii = 's')
> 180: a Z with a symbol above (sql_ascii = 'Z')
> 184: the same but lower case (sql_ascii = 'z')
> 188: an O merged with an E (sql_ascii = '')???
> 189: the same but lower case (sql_ascii = '')???
> 190: a Y with two dots above (sql_ascii = 'Y')

> Comments?

Works for me. Anyone feel this is too big a change to push into 8.0?
Strictly speaking it's a new feature, but it sure looks harmless from
here.

Personally I'd say that the euro symbol should map to ' ' not 'E', but
am not set on that.

regards, tom lane
Jaime Casanova wrote:
> 188: an O merged with an E (sql_ascii = '')???
> 189: the same but lower case (sql_ascii = '')???

'OE' and 'oe', most likely, but someone more familiar with French
typography might correct me.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
--- Peter Eisentraut <peter_e@gmx.net> wrote:
> Jaime Casanova wrote:
> > 188: an O merged with an E (sql_ascii = '')???
> > 189: the same but lower case (sql_ascii = '')???
>
> 'OE' and 'oe', most likely, but someone more
> familiar with French
> typography might correct me.

Something like that. I really don't know how to convert that to
sql_ascii. Maybe just a blank, as Tom suggested for the euro symbol.

regards,
Jaime Casanova
On Fri, Dec 17, 2004 at 11:09:17PM +0100, Peter Eisentraut wrote:
> Jaime Casanova wrote:
> > 188: an O merged with an E (sql_ascii = '')???
> > 189: the same but lower case (sql_ascii = '')???
>
> 'OE' and 'oe', most likely, but someone more familiar with French
> typography might correct me.

OE and oe would be correct, but we can't do that with the current code.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jaime Casanova <systemguards@yahoo.com> writes:
> > Why does it not work on Latin9?
>
> >> Probably because it hasn't got a table for Latin9.
> >>
> >> Feel free to contribute one --- see
> >> src/backend/utils/adt/ascii.c.
>
> > This page shows the differences between Latin1 &
> > Latin9:
> > http://www.cs.tut.fi/~jkorpela/latin9.html
>
> > The diffs are:
>
> > 164: the euro symbol (sql_ascii = 'E')???
> > 166: an S with a symbol above (sql_ascii = 'S')
> > 168: the same but lower case (sql_ascii = 's')
> > 180: a Z with a symbol above (sql_ascii = 'Z')
> > 184: the same but lower case (sql_ascii = 'z')
> > 188: an O merged with an E (sql_ascii = '')???
> > 189: the same but lower case (sql_ascii = '')???
> > 190: a Y with two dots above (sql_ascii = 'Y')
>
> > Comments?
>
> Works for me. Anyone feel this is too big a change
> to push into 8.0?
> Strictly speaking it's a new feature, but it sure
> looks harmless from here.

You guys have the code, you guys have the power.
I don't think it can cause any problem. :)

> Personally I'd say that the euro symbol should map
> to ' ' not 'E', but am not set on that.

Maybe someone who uses the euro symbol can comment? If not, then as you
said, we can just map that symbol to ' '.

Here's the *fixed* patch; it's up to you which one to use.

regards,
Jaime Casanova

*** src/backend/utils/adt/ascii.c.orig 2004-08-29 00:06:49.000000000 -0500
--- src/backend/utils/adt/ascii.c 2004-12-17 23:02:01.000000000 -0500
***************
*** 53,58 ****
--- 53,66 ----
          ascii = " A L LS \"SSTZ-ZZ a,l'ls ,sstz\"zzRAAAALCCCEEEEIIDDNNOOOOxRUUUUYTBraaaalccceeeeiiddnnoooo/ruuuuyt.";
          range = RANGE_160;
      }
+     else if (enc == PG_LATIN9)
+     {
+         /*
+          * ISO-8859-15 <range: 160 -- 255>
+          */
+         ascii = "  cL YS sCa  -R     Zu .z   EeY?AAAAAAACEEEEIIII NOOOOOxOUUUUYTBaaaaaaaceeeeiiii nooooo/ouuuuyty";
+         range = RANGE_160;
+     }
      else if (enc == PG_WIN1250)
      {
          /*
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Fri, Dec 17, 2004 at 11:09:17PM +0100, Peter Eisentraut wrote:
>> 'OE' and 'oe', most likely, but someone more familiar with French
>> typography might correct me.

> OE and oe would be correct, but we can't do that with the current code.

More to the point, there are no such characters in 7-bit ASCII.

I think Alvaro might be suggesting that to_ascii() should expand these
to the two-character sequences "OE" and "oe", but ISTM that opens a can
of worms better left sealed. There are a *lot* of characters that have
translations of differing levels of plausibility into ASCII. I'm okay
with dropping accent marks but I'm not sure about doing more than that.

regards, tom lane
Jaime Casanova <systemguards@yahoo.com> writes:
> Here's the *fixed* patch it's up to you wich one to
> use.

I applied this one.

regards, tom lane