Thread: invalid multibyte character for locale
Hi, I am testdriving postgresql 8.0.1. On importing my old dump which works fine on 7.4.6 I am getting errors like this one: CREATE INDEX foobar_uvalue_key ON foobar USING btree (upper((value)::text)); ERROR: invalid multibyte character for locale HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding. pg_controldatas output (this is an EM64T machine (debian pure64), but the issue also comes on 32bit Linux): pg_control version number: 74 Catalog version number: 200411041 Database system identifier: 4763589314759867390 Database cluster state: in production pg_control last modified: Wed Feb 23 22:07:43 2005 Current log file ID: 1 Next log file segment: 214 Latest checkpoint location: 1/D56163E8 Prior checkpoint location: 1/D56163A0 Latest checkpoint's REDO location: 1/D56163E8 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 3704 Latest checkpoint's NextOID: 50861184 Time of latest checkpoint: Wed Feb 23 21:40:25 2005 Database block size: 8192 Blocks per segment of large relation: 131072 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: C LC_CTYPE: C Should I be using de_DE or de_DE.UTF_8 locale? C locale always worked fine for us (with 7.3 and 7.4). Any hints? Regards, Bjoern
I forgot to say that I am using UNICODE for database encoding (initdb -E UNICODE and createdb -E UNICODE). Bjoern Metzdorf wrote: > Hi, > > I am testdriving postgresql 8.0.1. > > On importing my old dump which works fine on 7.4.6 I am getting errors > like this one: > > CREATE INDEX foobar_uvalue_key ON foobar USING btree > (upper((value)::text)); > ERROR: invalid multibyte character for locale > HINT: The server's LC_CTYPE locale is probably incompatible with the > database encoding. > > pg_controldatas output (this is an EM64T machine (debian pure64), but > the issue also comes on 32bit Linux): > > pg_control version number: 74 > Catalog version number: 200411041 > Database system identifier: 4763589314759867390 > Database cluster state: in production > pg_control last modified: Wed Feb 23 22:07:43 2005 > Current log file ID: 1 > Next log file segment: 214 > Latest checkpoint location: 1/D56163E8 > Prior checkpoint location: 1/D56163A0 > Latest checkpoint's REDO location: 1/D56163E8 > Latest checkpoint's UNDO location: 0/0 > Latest checkpoint's TimeLineID: 1 > Latest checkpoint's NextXID: 3704 > Latest checkpoint's NextOID: 50861184 > Time of latest checkpoint: Wed Feb 23 21:40:25 2005 > Database block size: 8192 > Blocks per segment of large relation: 131072 > Bytes per WAL segment: 16777216 > Maximum length of identifiers: 64 > Maximum number of function arguments: 32 > Date/time type storage: floating-point numbers > Maximum length of locale name: 128 > LC_COLLATE: C > LC_CTYPE: C > > Should I be using de_DE or de_DE.UTF_8 locale? C locale always worked > fine for us (with 7.3 and 7.4). > > Any hints? > > Regards, > Bjoern >
Bjoern Metzdorf <bm@turtle-entertainment.de> writes: > CREATE INDEX foobar_uvalue_key ON foobar USING btree > (upper((value)::text)); > ERROR: invalid multibyte character for locale > HINT: The server's LC_CTYPE locale is probably incompatible with the > database encoding. > I forgot to say that I am using UNICODE for database encoding (initdb -E > UNICODE and createdb -E UNICODE). Well, in that case your encoding is indeed at variance with your locale setting ;-). Perhaps you should declare the encoding as SQL_ASCII. The immediate problem is that mbstowcs() is being called and it evidently doesn't know what to do in C locale. SQL_ASCII (or any single-byte encoding) would bypass this code path and avoid the error. regards, tom lane
Tom Lane wrote: > Well, in that case your encoding is indeed at variance with your locale > setting ;-). Perhaps you should declare the encoding as SQL_ASCII. > The immediate problem is that mbstowcs() is being called and it > evidently doesn't know what to do in C locale. SQL_ASCII (or any > single-byte encoding) would bypass this code path and avoid the > error. The problem is that we need UNICODE encoding in our database. We make heavy use of UTF-8 (our website is multilingual). I now tried with locale set to de_DE.UTF-8, but CREATE INDEX foobar_uvalue_key ON foobar USING btree (upper((value)::text)); still gives me ERROR: invalid multibyte character for locale. With 7.3 and 7.4 this is working fine. May I assume that 7.3 and 7.4 behaviour is buggy? Is 8.0 just stricter or is this just a side effect of your fix for multibyte upper/lower problem for locale != C? If 7.3 and 7.4 behaviour is intended, is there a way to let 8.0 behave the same? Regards, Bjoern
Bjoern Metzdorf <bm@turtle-entertainment.de> writes: > Is 8.0 just stricter or is this just a side effect > of your fix for multibyte upper/lower problem for locale != C? Both those statements are true. > If 7.3 and 7.4 behaviour is intended, is there a way to let 8.0 behave > the same? I don't know what behavior you thought you were getting from upper/lower on UTF-8 data in 7.4, but it was surely not correct. If you want to duplicate that misbehavior, try SQL_ASCII with C locale. This does not stop you from storing UTF-8 in your database, mind you --- it just loses validation of encoding sequences and conversion to other schemes. But having said that, upper() should work if the locale matches the encoding. You might take the trouble to trace down exactly what data value it's barfing on. regards, tom lane
I ended up using SQL_ASCHII and could see lower works fine but on extended chars it does nothing. SO if that's what your shooting for it worked ok for me. I would have preferred to have my encoding Unicode , but don't want to figure out why the odbc driver was not able to insert extended chars. I could insert the extended chars fine in pgadmin and also with .net driver with a encoding = Unicode, but we have lots of asp that will use the odbc driver so I had to go with SQL_ASCHII to avoid getting errors. Joel Fradkin Wazagua, Inc. 2520 Trailmate Dr Sarasota, Florida 34243 Tel. 941-753-7111 ext 305 jfradkin@wazagua.com www.wazagua.com Powered by Wazagua Providing you with the latest Web-based technology & advanced tools. C 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc This email message is for the use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and delete and destroy all copies of the original message, including attachments. -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane Sent: Thursday, February 24, 2005 10:16 AM To: Bjoern Metzdorf Cc: Pgsql-Admin (E-mail) Subject: Re: [ADMIN] invalid multibyte character for locale Bjoern Metzdorf <bm@turtle-entertainment.de> writes: > Is 8.0 just stricter or is this just a side effect > of your fix for multibyte upper/lower problem for locale != C? Both those statements are true. > If 7.3 and 7.4 behaviour is intended, is there a way to let 8.0 behave > the same? I don't know what behavior you thought you were getting from upper/lower on UTF-8 data in 7.4, but it was surely not correct. If you want to duplicate that misbehavior, try SQL_ASCII with C locale. This does not stop you from storing UTF-8 in your database, mind you --- it just loses validation of encoding sequences and conversion to other schemes. But having said that, upper() should work if the locale matches the encoding. You might take the trouble to trace down exactly what data value it's barfing on. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
Tom Lane wrote: > I don't know what behavior you thought you were getting from upper/lower > on UTF-8 data in 7.4, but it was surely not correct. If you want to > duplicate that misbehavior, try SQL_ASCII with C locale. This does not > stop you from storing UTF-8 in your database, mind you --- it just > loses validation of encoding sequences and conversion to other schemes. > But having said that, upper() should work if the locale matches the > encoding. You might take the trouble to trace down exactly what data > value it's barfing on. I want to keep UNICODE encoding in any case. So you say, that 7.x just did not cope at all with multibyte chars and upper() and lower() spit out what the C functions toupper and tolower spit out? I also want to stay with locale C, because of the speed. I have different languages, not only one specific, so changing the locale would not help at all. I assume I could just remove #define USE_WIDE_UPPER_LOWER from oracle_compat.c to emulate the old behaviour. But a cleaner fix would be to check if we are using UNICODE and locale is C or POSIX and only then skip USE_WIDE_UPPER_LOWER. Comments? Regards, Bjoern
Here is my problem. I have a function that is triggered on insert. For simplicity's sake, lets say the function looks like this: CREATE OR REPLACE FUNCTION dostuff_on_insert() RETURNS TRIGGER AS ' DECLARE lockrows RECORD; BEGIN select into lockrows * from table1 where pkey_id = NEW.pkey_id for update on table1; update table1 set active = false where NEW.pkey_id = pkey_id and active; NEW.active := true; END; 'language 'plpgsql'; I have two inserts, lets say insert A and insert B. A new explicit transaction block is started with the intent of executing insert A. begin; insert into table1 (stuff) VALUES (morestuff); At this time another terminal is opened up and insert B is executed in the same fasion: begin; insert into table1 (stuff) VALUES (different_more_stuff); In my two open terminals insert A has completed and insert B is waiting for insert A's transaction to be committed, before it can move on. I commit insert A and check to see how many active row's I have for that ID (there should be 1, the new row). commit; select * from table1; I find that there is one active row. Everything is fine at this point. Now, I commit insert B, that has just finished, because insert A has been committed. I expect to see 1 active row, because the update contained in the function has not been executed, and has therefore not grabbed a snapshot of the table yet. I expect that the new row from insert A will be updated as well. commit; select * from table1; To my surprise, I see 2 active rows. What i'm assuming is happening with the transaction must be flawed. Does the function handle a transaction outside of the one the insert is using? Just trying to figure out what exactly is going on and why. Thanks in advance for the insight. If it would be easier to understand by having me paste what is happening directly from the terminals, let me know. Kris
What transaction level are you using? Evgeny. Kris Kiger wrote: > Here is my problem. I have a function that is triggered on insert. For > simplicity's sake, lets say the function looks like this: > > CREATE OR REPLACE FUNCTION dostuff_on_insert() RETURNS TRIGGER AS ' > DECLARE lockrows RECORD; > BEGIN > select into lockrows * from table1 where pkey_id = NEW.pkey_id for > update on table1; > update table1 set active = false where NEW.pkey_id = pkey_id and active; > NEW.active := true; > END; > 'language 'plpgsql'; > > I have two inserts, lets say insert A and insert B. A new explicit > transaction block is started with the intent of executing insert A. > > begin; > insert into table1 (stuff) VALUES (morestuff); > > > At this time another terminal is opened up and insert B is executed in > the same fasion: > > begin; > insert into table1 (stuff) VALUES (different_more_stuff); > > In my two open terminals insert A has completed and insert B is waiting > for insert A's transaction to be committed, before it can move on. I > commit insert A and check to see how many active row's I have for that > ID (there should be 1, the new row). > > commit; > select * from table1; > > I find that there is one active row. Everything is fine at this point. > Now, I commit insert B, that has just finished, because insert A has > been committed. I expect to see 1 active row, because the update > contained in the function has not been executed, and has therefore not > grabbed a snapshot of the table yet. I expect that the new row from > insert A will be updated as well. > > commit; > select * from table1; > > To my surprise, I see 2 active rows. What i'm assuming is happening > with the transaction must be flawed. Does the function handle a > transaction outside of the one the insert is using? Just trying to > figure out what exactly is going on and why. > Thanks in advance for the insight. If it would be easier to understand > by having me paste what is happening directly from the terminals, let me > know. > > Kris > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
transaction_isolation ----------------------- read committed Running Postgres 7.4 btw Kris Tsirkin Evgeny wrote: > > What transaction level are you using? > Evgeny. > Kris Kiger wrote: > >> Here is my problem. I have a function that is triggered on insert. >> For simplicity's sake, lets say the function looks like this: >> >> CREATE OR REPLACE FUNCTION dostuff_on_insert() RETURNS TRIGGER AS ' >> DECLARE lockrows RECORD; >> BEGIN >> select into lockrows * from table1 where pkey_id = NEW.pkey_id for >> update on table1; >> update table1 set active = false where NEW.pkey_id = pkey_id and >> active; >> NEW.active := true; >> END; >> 'language 'plpgsql'; >> >> I have two inserts, lets say insert A and insert B. A new explicit >> transaction block is started with the intent of executing insert A. >> >> begin; >> insert into table1 (stuff) VALUES (morestuff); >> >> >> At this time another terminal is opened up and insert B is executed >> in the same fasion: >> >> begin; >> insert into table1 (stuff) VALUES (different_more_stuff); >> >> In my two open terminals insert A has completed and insert B is >> waiting for insert A's transaction to be committed, before it can >> move on. I commit insert A and check to see how many active row's I >> have for that ID (there should be 1, the new row). >> >> commit; >> select * from table1; >> >> I find that there is one active row. Everything is fine at this >> point. Now, I commit insert B, that has just finished, because >> insert A has been committed. I expect to see 1 active row, because >> the update contained in the function has not been executed, and has >> therefore not grabbed a snapshot of the table yet. I expect that the >> new row from insert A will be updated as well. >> >> commit; >> select * from table1; >> >> To my surprise, I see 2 active rows. What i'm assuming is happening >> with the transaction must be flawed. Does the function handle a >> transaction outside of the one the insert is using? Just trying to >> figure out what exactly is going on and why. >> Thanks in advance for the insight. If it would be easier to >> understand by having me paste what is happening directly from the >> terminals, let me know. >> >> Kris >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 4: Don't 'kill -9' the postmaster > >
I guess first we should understand why the insert B waits at all,the insert A did not commit ,right ,then how did it found any pkey_id = NEW.pkey_id? That means you have already had those while starting your experiment. So ,insert B wait for those "old" rows not for your insert (i mean an INSERT) to commit.Once the A function commits the old rows are released but the INSERT is not yet done!it will take place only now when the trigger of A is done. This means that you have transaction in a wrong place - place it around the insert not inside the trigger and commit AFTER the insert . All this is an assumption only ,not realy sure if i am right. Evgeny Kris Kiger wrote: > transaction_isolation > ----------------------- > read committed > > Running Postgres 7.4 btw > > Kris > > Tsirkin Evgeny wrote: > >> >> What transaction level are you using? >> Evgeny. >> Kris Kiger wrote: >> >>> Here is my problem. I have a function that is triggered on insert. >>> For simplicity's sake, lets say the function looks like this: >>> >>> CREATE OR REPLACE FUNCTION dostuff_on_insert() RETURNS TRIGGER AS ' >>> DECLARE lockrows RECORD; >>> BEGIN >>> select into lockrows * from table1 where pkey_id = NEW.pkey_id for >>> update on table1; >>> update table1 set active = false where NEW.pkey_id = pkey_id and >>> active; >>> NEW.active := true; >>> END; >>> 'language 'plpgsql'; >>> >>> I have two inserts, lets say insert A and insert B. A new explicit >>> transaction block is started with the intent of executing insert A. >>> >>> begin; >>> insert into table1 (stuff) VALUES (morestuff); >>> >>> >>> At this time another terminal is opened up and insert B is executed >>> in the same fasion: >>> >>> begin; >>> insert into table1 (stuff) VALUES (different_more_stuff); >>> >>> In my two open terminals insert A has completed and insert B is >>> waiting for insert A's transaction to be committed, before it can >>> move on. I commit insert A and check to see how many active row's I >>> have for that ID (there should be 1, the new row). >>> >>> commit; >>> select * from table1; >>> >>> I find that there is one active row. Everything is fine at this >>> point. Now, I commit insert B, that has just finished, because >>> insert A has been committed. I expect to see 1 active row, because >>> the update contained in the function has not been executed, and has >>> therefore not grabbed a snapshot of the table yet. I expect that the >>> new row from insert A will be updated as well. >>> >>> commit; >>> select * from table1; >>> >>> To my surprise, I see 2 active rows. What i'm assuming is happening >>> with the transaction must be flawed. Does the function handle a >>> transaction outside of the one the insert is using? Just trying to >>> figure out what exactly is going on and why. >>> Thanks in advance for the insight. If it would be easier to >>> understand by having me paste what is happening directly from the >>> terminals, let me know. >>> >>> Kris >>> >>> ---------------------------(end of broadcast)--------------------------- >>> TIP 4: Don't 'kill -9' the postmaster >> >> >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Kris Kiger <kris@musicrebellion.com> writes: > Here is my problem. I have a function that is triggered on insert. For > simplicity's sake, lets say the function looks like this: > CREATE OR REPLACE FUNCTION dostuff_on_insert() RETURNS TRIGGER AS ' > DECLARE lockrows RECORD; > BEGIN > select into lockrows * from table1 where pkey_id = NEW.pkey_id for > update on table1; > update table1 set active = false where NEW.pkey_id = pkey_id and active; > NEW.active := true; > END; > 'language 'plpgsql'; This is awfully vague. What table is the trigger placed on? (If table1 itself, seems like there are more efficient ways to do this.) What events is the trigger fired for, and is it BEFORE or AFTER? regards, tom lane
Hmm.. I was trying simplify my function to get the point across with minimal confusion. If you don't think there is enough detail, let me know what is lacking and I will add the appropriate detail. The function is executed BEFORE insert on table1. Thanks again for the help all Kris Tom Lane wrote: >Kris Kiger <kris@musicrebellion.com> writes: > > >>Here is my problem. I have a function that is triggered on insert. For >>simplicity's sake, lets say the function looks like this: >> >> > > > >>CREATE OR REPLACE FUNCTION dostuff_on_insert() RETURNS TRIGGER AS ' >>DECLARE lockrows RECORD; >>BEGIN >> select into lockrows * from table1 where pkey_id = NEW.pkey_id for >>update on table1; >> update table1 set active = false where NEW.pkey_id = pkey_id and active; >> NEW.active := true; >>END; >>'language 'plpgsql'; >> >> > >This is awfully vague. What table is the trigger placed on? (If table1 >itself, seems like there are more efficient ways to do this.) What >events is the trigger fired for, and is it BEFORE or AFTER? > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 7: don't forget to increase your free space map settings > >
Kris Kiger <kris@musicrebellion.com> writes: > Hmm.. I was trying simplify my function to get the point across with > minimal confusion. If you don't think there is enough detail, let me > know what is lacking and I will add the appropriate detail. The > function is executed BEFORE insert on table1. Mmm. This might work as you expect in 8.0, but it surely won't in any prior release, because before 8.0 we didn't advance the transaction snapshot between statements of a function. Another issue is that your SELECT FOR UPDATE locks only one of the rows having the target pkey_id (I assume that column is misnamed and isn't actually a primary key?). If it happened to seize on a non-active row then it might not accomplish your goal of blocking until other updaters of the same row set commit. That would allow the UPDATE to start and set its snapshot, which would mean you lose because it wouldn't see the newly inserted row from the other transaction. Even more to the point, if there aren't yet any committed rows at all of the target pkey_id, there is nothing for the SELECT FOR UPDATE to block on at all. You could fix the first problem in various ways but I see no way around this one. Unless you can guarantee that there will always be a suitable row already in existence, I think you have to abandon the idea of using a SELECT FOR UPDATE for locking. One possibility is to create a unique partial index: CREATE UNIQUE INDEX foo ON table1(pkey_id) WHERE active; This will provide an enforcement that you don't have more than one active row at a time. Now you just simplify the trigger to update table1 set active = false where NEW.pkey_id = pkey_id and active; NEW.active := true; Race conditions will end up causing unique-key errors, which you can just retry. regards, tom lane
In your second paragraph, I think that you are saying that SELECT FOR UPDATE only locks one row, even though the select itself may return many. Am I mis-interpreting you? Also, what do you mean by seizing on a non-active row? Your assumption about pkey_id is right, I meant for that to mean partial key, bad naming on my part ;-). In my case, the third paragraph applies to this situation, because I can assume that there will always be an entry in the table that will be active with that pkey_id (lets call this partialKey_id from now on, to avoid further confusion). The alternative you offer is a good idea, I didn't realize that I had the option to create a unique index on a subset of data within the table. Unfortunately, it will not work in this situation. I don't have the option to report failure to the front-end application. I suppose, i'm looking for a method to only allow one invocation of this function, per partialKey_id, at a time. If you have any other alternatives or suggestions, I'm all ears, err eyes... Anyway, thank you ;-) Kris Tom Lane wrote: >Mmm. This might work as you expect in 8.0, but it surely won't in any >prior release, because before 8.0 we didn't advance the transaction >snapshot between statements of a function. > >Another issue is that your SELECT FOR UPDATE locks only one of the >rows having the target pkey_id (I assume that column is misnamed and >isn't actually a primary key?). If it happened to seize on a non-active >row then it might not accomplish your goal of blocking until other >updaters of the same row set commit. That would allow the UPDATE to >start and set its snapshot, which would mean you lose because it >wouldn't see the newly inserted row from the other transaction. > >Even more to the point, if there aren't yet any committed rows at all of >the target pkey_id, there is nothing for the SELECT FOR UPDATE to block >on at all. You could fix the first problem in various ways but I see no >way around this one. Unless you can guarantee that there will always be >a suitable row already in existence, I think you have to abandon the >idea of using a SELECT FOR UPDATE for locking. > >One possibility is to create a unique partial index: > >CREATE UNIQUE INDEX foo ON table1(pkey_id) WHERE active ; > >This will provide an enforcement that you don't have more than one >active row at a time. Now you just simplify the trigger to > update table1 set active = false where NEW.pkey_id = pkey_id and active; > NEW.active := true; >Race conditions will end up causing unique-key errors, which you can just >retry. > > regards, tom lane > >
Kris Kiger <kris@musicrebellion.com> writes: > In your second paragraph, I think that you are saying that SELECT FOR > UPDATE only locks one row, even though the select itself may return > many. Am I mis-interpreting you? No, I'm saying that plpgsql's SELECT INTO operation only reads one row. The fact that the SELECT might have found more rows if allowed to run to completion doesn't enter into it. If the first row read doesn't have active = true then it won't conflict against concurrent UPDATEs, because you are carefully not UPDATEing rows with active = false. It's the combination of those two things that creates the hazard. regards, tom lane
Interesting. That makes sense, though. So, is there a good way to lock a set of rows using SELECT FOR UPDATE in plpgsql? I assume using PERFORM would yield the same problem, because it immediately discards the results. Thanks! Kris Tom Lane wrote: >Kris Kiger <kris@musicrebellion.com> writes: > > >>In your second paragraph, I think that you are saying that SELECT FOR >>UPDATE only locks one row, even though the select itself may return >>many. Am I mis-interpreting you? >> >> > >No, I'm saying that plpgsql's SELECT INTO operation only reads one row. >The fact that the SELECT might have found more rows if allowed to run >to completion doesn't enter into it. If the first row read doesn't have >active = true then it won't conflict against concurrent UPDATEs, because >you are carefully not UPDATEing rows with active = false. It's the >combination of those two things that creates the hazard. > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > >
Kris Kiger <kris@musicrebellion.com> writes: > Interesting. That makes sense, though. So, is there a good way to lock > a set of rows using SELECT FOR UPDATE in plpgsql? I assume using > PERFORM would yield the same problem, because it immediately discards > the results. I think PERFORM would work. The fact that plpgsql ignores the results doesn't mean you don't have lock on the rows. regards, tom lane