Re: invalid byte sequence for encoding "UTF8": 0xf481 - how could this happen? - Mailing list pgsql-admin
From | Rural Hunter |
---|---|
Subject | Re: invalid byte sequence for encoding "UTF8": 0xf481 - how could this happen? |
Date | |
Msg-id | 4F8C310B.6030007@gmail.com Whole thread Raw |
In response to | Re: invalid byte sequence for encoding "UTF8": 0xf481 - how could this happen? ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
Responses |
Re: invalid byte sequence for encoding "UTF8": 0xf481 - how could this happen?
|
List | pgsql-admin |
Hi Albe, Yes, I'm using a Chinese tsearch2 plugin called nlpbamboo and the default default_text_search_config is 'chinesecfg'. I tested "select to_tsvector('english',content) from tmp_article" and this works. So it's the problem of nlpbamboo? The result of 'CAST (content AS bytea)' is shown below. btw, what's the meaning of "Please don't top post."? I don't understand but sorry if I caused any problem. \xe6a0b8e5bf83e68f90e7a4baefbc9ae69caae69da5e4b889e5b9b4e5b086e698afe694bfe5ba9ce99c80e6b182e9878ae694beefbc8ce8a18ce4b89ae9ab98e9809fe68890e995bfe79a84e697b6e69c9fefbc8ce585ac e58fb8e79baee5898de582a8e5a487e9a1b9e79baee4bc97e5a49aefbc8ce4b89ae7bba9e5ad98e59ca8e59091e4b88ae8b685e9a284e69c9fe79a84e58fafe883bdefbc8ce5bd93e5898de79a84e4bcb0e580bce6b0b4e5b 9b3e4b88de9ab98efbc8ce68c89e785a73230313220e5b9b43235efbd9e333020e5808de79a84502f45efbc8ce585ace58fb8e4bcb0e580bce58cbae997b4e59ca832322e3439efbd9e32362e393920e58583e58cbae997b4 efbc8ce8be83e79baee5898de882a1e4bbb7e7baa6e69c893425efbd9e323525e79a84e4b88ae58d87e7a9bae997b4efbc8ce7bbb4e68c81e2809ce68ea8e88d90e2809de79a84e68a95e8b584e8af84e7baa7e380820ae8b 083e7a094e58685e5aeb90ae8bf91e697a5efbc8ce58f82e58aa0e4ba86e585ace58fb8e7bb84e7bb87e79a84e68a95e8b584e88085e68ea5e5be85e6b4bbe58aa8efbc8ce4b88ee585ace58fb8e9ab98e7aea1e8bf9be8a1 8ce4ba86e4baa4e6b581e380820ae4b8bbe8a681e8a782e782b90ae2809ce58d81e4ba8cc2b7e4ba94e2809de69c9fe997b4e694bfe5ba9ce99c80e6b182e6988ee698bee5a29ee58aa0efbc8ce585ace58fb8e58aa0e9809 fe8b791e9a9ace59c88e59cb0e380820ae7bb8fe8bf87e58c97e4baace38081e4b88ae6b5b7e38081e69dade5b79ee38081e5ae81e6b3a2e7ad89e58588e8bf9be59f8ee5b882e794b5e5ad90e694bfe58aa1e5bbbae8aebe e79a84e7a4bae88c83e4bd9ce794a8efbc8ce59084e59cb0e694bfe5ba9ce68488e58f91e58585e58886e8aea4e8af86e588b0e695b0e5ad97e58c96e59f8ee5b882e7aea1e79086e7b3bbe7bb9fe5afb9e694bfe5ba9ce59 f8ee5b882e7aea1e79086e6b0b4e5b9b3e68f90e58d87e68980e8b5b7e588b0e79a84e9878de5a4a7e4bd9ce794a8efbc8ce8808ce99a8fe79d80e68890e58a9fe6a188e4be8be79a84e4b88de696ade7a7afe7b4afefbc8c e694bfe5ba9ce887aae8baabe5928ce585ace58fb8e5afb9e99c80e6b182e79a84e79086e8a7a3e4b99fe983bde69bb4e58aa0e6b7b1e585a5e38082e7bb8fe8bf873230313020e5b9b4e5928c3230313120e5b9b4e79a84e 694bfe5ba9ce68da2e5b18ae4b88ee8a784e58892e588b6e5ae9aefbc8c32303132efbd9e3230313420e5b9b4e698afe59cb0e696b9e694bfe5ba9ce2809ce58d81e4ba8cc2b7e4ba94e2809de5bbbae8aebee585b7e4bd93 e5ae9ee696bde5b9b4efbc8ce79baee5898de59bbde58685e7baa63236303020e4b8aae58ebfe7baa7e68896e4bba5e4b88ae79a84e5b882efbc88e58ebfefbc89e4b8adefbc8ce5b7b2e7bb8fe5bbbae8aebee4ba86e695b 0e5ad97e58c96e59f8ee5b882e7aea1e79086e7b3bbe7bb9fe79a84e4bb8533303020e69da5e5aeb6efbc8ce794b1e4ba8ee59cb0e696b9e694bfe5ba9ce59ca8e5bbbae8aebee696b9e99da2e79a84e68a95e585a5e585b7 e69c89e78886e58f91e5bc8fe79a84e789b9e782b9efbc8ce88083e89991e588b0e585ace58fb83230313220e5b9b4e4b880e5ada3e5baa6e4b89ae7bba9e79a84e5a4a7e5b985e5a29ee995bfefbc8ce79bb8e4bfa1e4bb8 ee4bb8ae5b9b4e5bc80e5a78be79a843320e5b9b4efbc8ce5b086e698afe695b4e4b8aae8a18ce4b89ae5bfabe9809fe58f91e5b195e79a84e697b6e69c9fe38082e585ace58fb8e698afe8a18ce4b89ae4b8ade79a84e9be 99e5a4b4e4bc81e4b89aefbc8ce5b882e59cbae58da0e69c89e78e87e59ca8363025e4bba5e4b88ae38082e8808ce4b894e585ace58fb8e59ca8e59084e4bba3e8a1a8e680a7e59f8ee5b882e79a84e9a1b9e79baee585b7e 69c89e6a0b7e69dbfe69588e5ba94efbc8ce58fafe4bba5e4b8bae585ace58fb8e5b8a6e69da5e69bb4e5a49ae79a84e694bfe5ba9ce5aea2e688b7efbc8ce5b9b6e4b894e5ae9ee78eb0e4ba86e99c80e6b182e79086e8a7 a32de99c80e6b182e5ae9ee78eb02de99c80e6b182e5bc95e5afbce79a84e889afe680a7e5beaae78eafe38082e59ba0e6ada4efbc8ce8aea4e4b8bae59ca8e69c80e8bf91e79a84e4b880e4b8a4e5b9b4efbc8ce585ace58 fb8e5b086e4bba5e9ab98e4ba8ee8a18ce4b89ae6b0b4e5b9b3e79a84e5a29ee9809fe8bf85e78c9be58f91e5b195efbc8ce4b99fe58db3e8bf9be585a5e8b791e9a9ace59c88e59cb0efbc8ce5b7a9e59bbae5b882e59cba e58da0e69c89e78e87e79a84e697b6e69c9fe38082e6ada4e5908eefbc8ce99a8fe79d80e694bfe5ba9ce5908ee7bbade68a95e585a5e58fafe883bde587bae78eb0e79a84e591a8e69c9fe680a7e6b3a2e58aa8efbc8ce58 5ace58fb8e9809ae8bf87e59084e9a1b9e4b89ae58aa1e7bb93e69e84e79a84e8b083e88a82e5ae9ee78eb0e5b9b3e7a8b3e8be83e5bfabe9809fe79a84e5a29ee995bfefbc8ce59ca8e8bf99e4b880e998b6e6aeb5efbc8c e9a284e8aea1e5a29ee9809fe887b3e5b091e4b99fe59ca8323025efbd9e333025e58cbae997b4e38082e5a29ee58aa0e7b3bbe7bb9fe99b86e68890e4b89ae58aa1e9878fefbc8ce689a9e5a4a7e694b6e585a5e8a784e6a 8a1e5b9b6e58aa0e5bcbae5aea2e688b7e7b298e680a7e7b297e795a5e4bcb0e7ae97efbc8ce79bb4e8be96e5b882e38081e79c81e4bc9ae7baa7e59f8ee5b882e38081e59cb0e7baa7e5b882efbc88e4b88de590abe79c81 e4bc9ae59f8ee5b882efbc89e38081e5b882e8be96e58cbae38081e58ebfe7baa7e5b882efbc88e58ebfefbc89e695b0e5ad97e58c96e59f8ee5b882e7aea1e79086e7b3bbe7bb9fe79a84e5bbbae8aebee8b4b9e794a8e7b aa6e4b8ba3530303020e4b887e58583e380813330303020e4b887e58583e380813235303020e4b887e58583e380813130303020e4b887e58583e5928c38303020e4b887e58583e38082e68c89e785a7e79baee5898de68891 e59bbde59f8ee5b882e695b0e9878fe8aea1e7ae97efbc8ce4b88de7ae97e5908ee69c9fe58d87e7baa7e7bbb4e68aa4e79a84e8b4b9e794a8efbc8ce695b4e4b8aae5b882e59cbae8a784e6a8a1e59ca832353020e4babfe 4bba5e4b88ae38082e9a284e8aea1e887b33230313520e5b9b4e5ba95efbc8ce695b0e5ad97e58c96e59f8ee5b882e7aea1e79086e7b3bbe7bb9fe5b086e887b3e5b091e8a686e79b96e585a8e983a8e59cb0e7baa7e4bba5 e4b88ae59f8ee5b882e38081333025e79a84e5b882e8be96e58cbae5928c313025e79a84e58ebfe7baa7e5b88228e58ebf29e38082e784b6e8808ce79bb8e5afb9e4ba8ee5a682e6ada4e5a4a7e79a84e5b882e59cbaefbc8 ce585ace58fb8e8bdafe4bbb6e4baa7e59381e79a84e4bbbde9a29de4b990e8a782e4bcb0e8aea1e4bb85e59ca8333025e79a84e6af94e4be8be38082e794b1e4ba8ee694bfe5ba9ce8aea2e58d95e69bb4e5a49ae580bee5 9091e4ba8ee4bba5e7a1ace4bbb6e98787e8b4ade4b8bae789b9e782b9e79a84e7b3bbe7bb9fe99b86e68890efbc8ce585ace58fb8e58f82e4b88ee7b3bbe7bb9fe99b86e68890e4b89ae58aa1e4b880e696b9e99da2e58fa fe4bba5e9809ae8bf87e68f90e4be9be695b4e4bd93e8a7a3e586b3e696b9e6a188e68f90e58d87e7ab9ee4ba89e4bc98e58abfefbc8ce5a29ee58aa0e4b8ade6a087e6a682e78e87efbc8ce4bb8ee8808ce689a9e5a4a7e6 94b6e585a5e8a784e6a8a1efbc8ce58fa6e4b880e696b9e99da2e4b99fe58fafe4bba5e5a29ee58aa0e5aea2e688b7e7b298e680a7efbc8ce7a8b3e5ae9ae69caae69da5e79a84e5a29ee995bfe38082e59ba0e6ada4efbc8 ce9a284e8aea1e585ace58fb8e69caae69da5e4bc9ae58aa0e5bcbae7b3bbe7bb9fe99b86e68890e4b89ae58aa1efbc8ce69bb4e58aa0e58585e58886e79a84e4baabe58f97e5b882e59cbae79a84e5a29ee995bfe380820a e695b0e68daee699aee69fa5e4b88ee4bfa1e681afe98787e99b86e58db3e5b086e78886e58f91efbc8ce69caae69da5e5b086e68890e9878de8a681e588a9e6b6a6e69da5e6ba90e4b98be4b880e38082e887aa323031302 0e5b9b4e5bc80e5a78befbc8ce585ace58fb8e5bc80e5a78be4b8bae694bfe5ba9ce79bb8e585b3e983a8e997a8e68f90e4be9be695b0e5ad97e58c96e59f8ee5b882e7aea1e79086e7b3bbe7bb9fe5bbbae8aebee5928ce8 bf90e890a5e4b8ade79a84e79bb8e585b3e695b0e68daee699aee69fa5e5928ce4bfa1e681afe98787e99b86e69c8de58aa1efbc8ce5b9b6e4b8bae6ada4e68a95e585a53430303020e4bd99e4b887e58583e7949fe4baa73 13020e58fb0e59fbae4ba8e4c6164796275673320e79a84e8bda6e8bdbde6bf80e58589e689abe68f8fe7b3bbe7bb9fe5928c3320e58fb0e4bebfe690bae5bc8fe6bf80e58589e689abe68f8fe7b3bbe7bb9fe38082323031 3120e5b9b4e5b7b2e5ae8ce688903220e58fb0e8bda6efbc8ce9a284e8aea13230313220e5b9b4e4b88ae58d8ae5b9b4e69c8934efbd9e3520e58fb0e8bda6e58fafe68a95e585a53320e7bbb4e5ae9ee699afe6b58be9878 fe79a84e5ba94e794a8e38082e5afb9e4ba8ee8bf99e4b880e696b0e79a84e5b882e59cbaefbc8ce6ada6e6b189e7ab8be5be97e7a9bae997b4e4bd9ce4b8bae58588e8a18ce88085efbc8ce5b882e59cbae58da0e69c89e7 8e87e8bebe363025e5b7a6e58fb3efbc8ce585ace58fb8e58899e4b88de588b0333025efbc8ce4bd86e585ace58fb8e794b1e4ba8ee58fafe68f90e4be9be59fbae4ba8ee695b0e5ad97e59f8ee5b882e7aea1e79086e7b3b be7bb9fe79a84e4b880e4bd93e58c96e8a7a3e586b3e696b9e6a188efbc8ce4b894e59ca8e6b58be9878fe7b2bee5baa6e4b88ae79bb8e6af94e5afb9e6898be585b7e69c89e7ab9ee4ba89e4bc98e58abfefbc8ce9a284e8 aea1e8b685e8b68ae5afb9e6898be68890e4b8bae9be99e5a4b4e4bc81e4b89ae5b086e698afe5be88e5bfabe4bc9ae58f91e7949fe79a84e4ba8be68385e38082e9a284e8aea13230313220e5b9b4e8afa5e4b89ae58aa1e 8bf9be585a5e78886e58f91e69c9fefbc8ce4b894e69caae69da533efbd9e3520e5b9b4e58fafe4bf9de68c81353025e5b7a6e58fb3e79a84e5b9b4e59d87e5a48de59088e5a29ee995bfe380820ae4babae58a9be68890e6 9cace4b88ae58d87e58e8be58a9be69c89e68980e7bc93e8a7a3efbc8ce5a4a7e9a29de9a1b9e79baee6918ae4bd8ee7a094e58f91e68890e69cace38082e794b1e4ba8ee8bf91e4b8a4e5b9b4e9809ae88380e78e87e9ab9 8e4bc81efbc8ce4bba5e58f8ae7a7bbe58aa8e4ba92e88194e7bd91e7ad89e696b0e585b4495420e7bb86e58886e8a18ce4b89ae79a84e585b4e8b5b7efbc8ce5afb9e4bca0e7bb9fe8bdafe4bbb6e4bc81e4b89ae980a0e6 8890e4babae58a9be68890e69cace4b88ae6b6a8e79a84e5b7a8e5a4a7e58e8be58a9befbc8ce585ace58fb8e4babae58a9be68890e69cace4b99fe5a4a7e5b985e4b88ae58d87efbc8ce4bcb0e7ae97e585b6e5a29ee5b98 5e887b3e5b091e59ca8333025e4bba5e4b88ae38082e8808ce99a8fe79d80e5ae8fe8a782e7bb8fe6b58ee5bda2e58abfe79a84e58f98e58c96efbc8ce4bba5e58f8ae585ace58fb8e6ada6e6b189e7a094e58f91e4b8ade5 bf83e79a84e5bbbae8aebee5ae8ce68890efbc8ce9a284e8aea13230313220e5b9b4e4babae58a9be68890e69cace4b88ae58d87e79a84e58e8be58a9be5b086e5be97e588b0e8be83e5a4a7e7bc93e8a7a3e38082e8808ce 794b1e4ba8ee585ace58fb8e8bdafe4bbb6e7b3bbe7bb9fe79a84e4baa7e59381e58c96e7a88be5baa6e8be83e9ab98efbc8ce7a094e58f91e4b88ee5ae9ee696bde79a84e880a6e59088e5baa6e8be83e69dbeefbc8ce99a 8fe79d80e59088e5908ce9a29de79a84e4b88ae58d87efbc8ce8bf99e4b880e4bc98e58abfe5b086e68488e58f91e6988ee698bee79a84e6918ae4bd8ee7a094e58f91e68890e69cace38082e5b0bde7aea1e7b3bbe7bb9fe 99b86e68890e9a1b9e79baee5b086e69c89e68980e5a29ee58aa0efbc8ce4bd86e7bbbce59088e4b88ae8bfb0e4b8a4e696b9e99da2e59ba0e7b4a0efbc8ce4bf9de5ae88e4bcb0e8aea1efbc8ce585ace58fb8e6af9be588 a9e78e87e6b0b4e5b9b3e887b3e5b091e58fafe4bba5e7bb93e69d9fe4b8a4e5b9b4e69da5e79a84e4b88be9998de8b68be58abfe8808ce7a8b3e5ae9ae59ca8363025e4bba5e4b88ae380820af48186bff483bfb2f48393b ff4828dbe203ff484998ff4848ebbf4839583f480aabcf481adaff4838686f4819997203ff4838d90f482b2b4f4848d8ff484b692f48196a1f480abbd0ae69cace69687e794b1e882a1e7a5a8e68a95e8b584e5b7a5e585b7 3ae9b9b0e79cbce4b8aae882a1e699bae883bde8b79fe8b8aae7b3bbe7bb9fe68f90e4be9befbc8ce6aca2e8bf8ee8bdace8bdbdefbc8ce8afb7e6b3a8e6988ee587bae5a484e380820ae79bb8e585b3e696b0e997bbefbc9 a0ae8b4a3e4bbbbe7bc96e8be91efbc9ae5bb96e6b3bde5878c 于 2012/4/16 21:34, Albe Laurenz 写道: > Please don't top post. > > Rural Hunter wrote: >>>> My db is in utf-8, I have a row in my table say tmp_article and I wanted >>>> to generate ts_vector from the article content: >>>> select to_tsvector(content) from tmp_article; >>>> But I got this error: >>>> ERROR: invalid byte sequence for encoding "UTF8": 0xf481 >>>> >>>> I am wondering how this could happen. I think if there was invalid UTF8 >>>> bytes in the content, it shouldn't have been able to inserted into the >>>> tmp_article table as I sometimes see similar errors when inserting >>>> records to tmp_article. Am I right? >>> You are right in theory. A lot depends on your PostgreSQL version, >>> because >>> the efforts to prevent invalid strings from entering the database have >>> led to changes over the versions. Older versions are more permissive. >>> >>> To test the theory that the contents of the table are bad, you can >>> test if the same happens if you >>> >>> SELECT convert_to(content, 'UTF8') FROM tmp_article; >> =>SELECT convert_to(content, 'UTF8') FROM tmp_article; >> This works. My pg is at latest 9.1.3 on ubuntu 10.04 server. We have >> millions of data in the db but this is the only one we met the problem. >> The bad data is inserted in recent days and we upgraded to 9.1.3 right >> after it was released. > Hmm, that is strange. > Could you find the row that causes the problem and post the > result of "CAST (content AS bytea)" for this row? > > Do you use any nonstandard text search parsers or dictionaries? > What is the text search configuration you use (parameter > default_text_search_config)? > > Yours, > Laurenz Albe
pgsql-admin by date: