Thread: ERROR: translation failed from server encoding to wchar_t
I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : "translation failed from server encoding to wchar_t" My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). In 8.1 with tsearch2 it worked perfectly ... Thanks for you help, ilan
http://pastebin.ca/845670 This url provides a testcase ... fisrt pass : => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; ERROR: translation failed from server encoding to wchar_t second pass : => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 The to_tsvector seems to accept the row at random ... On Jan 7, 9:16 pm, ila...@gmail.com wrote: > I am using tsearch2 with pgsql 8.2.5 and get the following error when > calling to_tsvector : > "translation failed from server encoding to wchar_t" > > My database is UTF8 encoded and the data sent to to_tsvector comes > from a bytea column converted to text with > encode(COLUMN, 'escape'). > > In 8.1 with tsearch2 it worked perfectly ... > > Thanks for you help, > > ilan
http://pastebin.ca/845696 added schema ... On Jan 7, 10:21 pm, ila...@gmail.com wrote: > http://pastebin.ca/845670 > > This url provides a testcase ... > > fisrt pass : > => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', > encode(messageblk, 'escape')) where messageblk_idnr = 12949; > ERROR: translation failed from server encoding to wchar_t > > second pass : > => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', > encode(messageblk, 'escape')) where messageblk_idnr = 12949; > UPDATE 1 > > The to_tsvector seems to accept the row at random ... > > On Jan 7, 9:16 pm, ila...@gmail.com wrote: > > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > calling to_tsvector : > > "translation failed from server encoding to wchar_t" > > > My database is UTF8 encoded and the data sent to to_tsvector comes > > from a bytea column converted to text with > > encode(COLUMN, 'escape'). > > > In 8.1 with tsearch2 it worked perfectly ... > > > Thanks for you help, > > > ilan
Found something interesting with this testcase. update fails after SELECT query. Can anyone confirm this ??? dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 dbname=> select * from dbmail_messageblks where messageblk_idnr = 12949;messageblk_idnr | physmessage_id | messageblk | blocksize | is_header | idxfti -----------------+---------------- +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +-----------+----------- +------------------------------------------------------------------------------------------------- 12949 | 6319 | l'\351quipe de Casinos-park a bien re\347u votre messsage. \012\012Vous aurez une r\351ponse d\350s que l'un de nos responsables aura pris connaissance de votre envoi. \012\012cordialement\012\012l'\351quipe de casinos-park. \012\012====================\012\012The team of Casinos-park received your messsage.\012\012You will have an answer as soon as one of our persons in charge takes note of your sending. \012\012Best regards \012\012The team of casinos-park.\012 | 398 | 0 | '=':3 'e':5 'h':11 'i':2,10 'k':12 'l':1 'o':7 'p':9,16 'r':8 's':6 't':13 'u':15 'y':14 'rk':4 (1 row) dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; ERROR: translation failed from server encoding to wchar_t dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 On Jan 7, 10:21 pm, ila...@gmail.com wrote: > http://pastebin.ca/845670 > > This url provides a testcase ... > > fisrt pass : > => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', > encode(messageblk, 'escape')) where messageblk_idnr = 12949; > ERROR: translation failed from server encoding to wchar_t > > second pass : > => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', > encode(messageblk, 'escape')) where messageblk_idnr = 12949; > UPDATE 1 > > The to_tsvector seems to accept the row at random ... > > On Jan 7, 9:16 pm, ila...@gmail.com wrote: > > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > calling to_tsvector : > > "translation failed from server encoding to wchar_t" > > > My database is UTF8 encoded and the data sent to to_tsvector comes > > from a bytea column converted to text with > > encode(COLUMN, 'escape'). > > > In 8.1 with tsearch2 it worked perfectly ... > > > Thanks for you help, > > > ilan
ilanco@gmail.com writes: > I am using tsearch2 with pgsql 8.2.5 and get the following error when > calling to_tsvector : > "translation failed from server encoding to wchar_t" > My database is UTF8 encoded and the data sent to to_tsvector comes > from a bytea column converted to text with > encode(COLUMN, 'escape'). Two likely theories: 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes some other encoding. 2. The encode() is yielding something that isn't valid UTF-8. PG 8.3 contains checks that should complain about both of these scenarios, but IIRC 8.2 does not. regards, tom lane
On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote: > ila...@gmail.com writes: > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > calling to_tsvector : > > "translation failed from server encoding to wchar_t" > > My database is UTF8 encoded and the data sent to to_tsvector comes > > from a bytea column converted to text with > > encode(COLUMN, 'escape'). > > Two likely theories: > > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes > some other encoding. > > 2. The encode() is yielding something that isn't valid UTF-8. > > PG 8.3 contains checks that should complain about both of these > scenarios, but IIRC 8.2 does not. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majord...@postgresql.org so that your > message can get through to the mailing list cleanly Dear Tom, Thanks for your reply. This is the output of `locale` on my system : # locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= As for your second scenario I guess you are right, it's possible encode does not return all UTF8 characters. But to_tsvector() succeeds and fails at random with this kind of characters... So how can I sanitize output from encode before I pipe it to to_tsvector() ? Regards, Ilan
On Jan 8, 10:43 am, ila...@gmail.com wrote: > On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote: > > > > > ila...@gmail.com writes: > > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > > calling to_tsvector : > > > "translation failed from server encoding to wchar_t" > > > My database is UTF8 encoded and the data sent to to_tsvector comes > > > from a bytea column converted to text with > > > encode(COLUMN, 'escape'). > > > Two likely theories: > > > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes > > some other encoding. > > > 2. The encode() is yielding something that isn't valid UTF-8. > > > PG 8.3 contains checks that should complain about both of these > > scenarios, but IIRC 8.2 does not. > > > regards, tom lane > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majord...@postgresql.org so that your > > message can get through to the mailing list cleanly > > Dear Tom, > > Thanks for your reply. > This is the output of `locale` on my system : > # locale > LANG=en_US.UTF-8 > LC_CTYPE="en_US.UTF-8" > LC_NUMERIC="en_US.UTF-8" > LC_TIME="en_US.UTF-8" > LC_COLLATE="en_US.UTF-8" > LC_MONETARY="en_US.UTF-8" > LC_MESSAGES="en_US.UTF-8" > LC_PAPER="en_US.UTF-8" > LC_NAME="en_US.UTF-8" > LC_ADDRESS="en_US.UTF-8" > LC_TELEPHONE="en_US.UTF-8" > LC_MEASUREMENT="en_US.UTF-8" > LC_IDENTIFICATION="en_US.UTF-8" > LC_ALL= > > As for your second scenario I guess you are right, it's possible > encode does not return all UTF8 characters. > But to_tsvector() succeeds and fails at random with this kind of > characters... > So how can I sanitize output from encode before I pipe it to > to_tsvector() ? > > Regards, > > Ilan Tom, To get around the non-UTF8 chars I used following function : CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS "trigger" AS $$ DECLARE BEGIN RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr; BEGIN NEW.idxFTI := to_tsvector('simple', encode($x$E$x$|| NEW.messageblk, 'escape')); RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr; RETURN NEW; EXCEPTION WHEN character_not_in_repertoireTHEN RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %', NEW.messageblk_idnr; NEW.idxFTI := to_tsvector('simple', 'character_not_in_repertoire: This email contains illegal characters.'); RETURN NEW; END; END; $$ LANGUAGE plpgsql; Hope this helps others with DBmail and tsearch2 on postgres 8.2 Thanks for your help Tom, ilan
ilanco@gmail.com wrote: > NEW.idxFTI := to_tsvector('simple', encode($x$E$x$|| > NEW.messageblk, 'escape')); > > I strongly doubt that this does what you think it does - I would check the results if I were you. The $x$E$x$ should almost certainly not be there - if you are trying to get E'foo' behaviour, that is purely for literals. All you are doing here is to prepend a literal 'E' to your value. cheers andrew