Thread: LATIN9 - hex in varchar after convert
Postgresql 11.7 - client/server encoding LATIN9
Good morning,
I have made a couple of stupid mistakes.
I have a table with a varchar(5000) that contains general text. The table is typically maintained via a GUI, but on this occasion I received a spreadsheet with data and loaded it - via copy - from a csv extracted from that. The data looked fine in psql, but when looking at the data in the GUI, characters such as single quote marks (') appeared as a series of special characters. I assumed that the spreadsheet then had some different encoding (UTF8?) and that I then needed to 'translate' the characters.
I saw there was a convert_to/from function in postgresql, and thought I'd try that to see what it would do, but (the two mistakes) I didn't start a transaction and where I had intended to run the command on one row, I had a brain freeze and ran the update across the table.
The command was:
update gamespubquiz set question = convert_to(question,'LATIN9');
What I now appear to have in the question column is the hex equivalent of what was there originally.
Two questions:
1) How can I convert the hext back to 'text'?
2) How can I convert the (what appears to be) incorrect coding of the single quote (')?
Thanks in advance,
Steve T
On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote:
Postgresql 11.7 - client/server encoding LATIN9Good morning,I have made a couple of stupid mistakes.I have a table with a varchar(5000) that contains general text. The table is typically maintained via a GUI, but on this occasion I received a spreadsheet with data and loaded it - via copy - from a csv extracted from that. The data looked fine in psql, but when looking at the data in the GUI, characters such as single quote marks (') appeared as a series of special characters. I assumed that the spreadsheet then had some different encoding (UTF8?) and that I then needed to 'translate' the characters.I saw there was a convert_to/from function in postgresql, and thought I'd try that to see what it would do, but (the two mistakes) I didn't start a transaction and where I had intended to run the command on one row, I had a brain freeze and ran the update across the table.The command was:update gamespubquiz set question = convert_to(question,'LATIN9');What I now appear to have in the question column is the hex equivalent of what was there originally.Two questions:1) How can I convert the hext back to 'text'?2) How can I convert the (what appears to be) incorrect coding of the single quote (')?Thanks in advance,Steve T
...for example ....as taken from psql:
Original text: ‘With the lights out, it’s less dangerous. Here we are now, entertain us.’
After convert: \xe280985769746820746865206c6967687473206f75742c206974e2809973206c6573732064616e6765726f75732e204865726520776520617265206e6f772c20656e7465727461696e2075732ee28099
PS apologies for any inconvenience for posting with 'read receipt' request turned on earlier.
"Steve Tucknott (TuSol)" <steve@tusol.co.uk> writes: > I saw there was a convert_to/from function in postgresql, and thought > I'd try that to see what it would do, but (the two mistakes) I didn't > start a transaction and where I had intended to run the command on one > row, I had a brain freeze and ran the update across the table. > The command was: > update gamespubquiz set question = convert_to(question,'LATIN9'); Ooops. > Two questions: > 1) How can I convert the hext back to 'text'? > 2) How can I convert the (what appears to be) incorrect coding of the > single quote (')? Don't have an answer for (2) offhand, but you can undo (1). What bit you is that there's an implicit cast from text to bytea, but not vice versa. regression=# create table foo (f1 text); CREATE TABLE regression=# insert into foo values ('this is a test'); INSERT 0 1 regression=# update foo set f1 = convert_to(f1,'LATIN9'); UPDATE 1 regression=# table foo; f1 -------------------------------- \x7468697320697320612074657374 (1 row) You can't just apply convert_from: regression=# update foo set f1 = convert_from(f1,'LATIN9'); ERROR: function convert_from(text, unknown) does not exist LINE 1: update foo set f1 = convert_from(f1,'LATIN9'); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. but forcing it with a cast works: regression=# update foo set f1 = convert_from(f1::bytea,'LATIN9'); UPDATE 1 regression=# table foo; f1 ---------------- this is a test (1 row) As for (2), maybe replace() would help you. regards, tom lane
On Sat, 2020-04-25 at 10:35 -0400, Tom Lane wrote:
"Steve Tucknott (TuSol)" <steve@tusol.co.uk> writes:I saw there was a convert_to/from function in postgresql, and thoughtI'd try that to see what it would do, but (the two mistakes) I didn'tstart a transaction and where I had intended to run the command on onerow, I had a brain freeze and ran the update across the table.The command was:update gamespubquiz set question = convert_to(question,'LATIN9');Ooops.Two questions:1) How can I convert the hext back to 'text'?regression=# update foo set f1 = convert_from(f1::bytea,'LATIN9');UPDATE 1regression=# table foo;f1----------------this is a test(1 row)As for (2), maybe replace() would help you.regards, tom lane
Tom
Thanks. I tried the convert from and got the unknown function error and then tried again and applied the cast, but I applied it after the bracket ie
convert_from(question,'LATIN9')::bytea. I scare myself at times - but my excuse is that I haven't used a database in earnest for a fair few years. Poor excuse I know.
I'll clone the table with the problem before playing with it again to try to fix the quote encoding issue - and I'll make sure I'm in a transaction anyway so I can roll back the attempts.
Thanks again.
On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote: > I have a table with a varchar(5000) that contains general text. The table is typically > maintained via a GUI, but on this occasion I received a spreadsheet with data and > loaded it - via copy - from a csv extracted from that. The data looked fine in psql, > but when looking at the data in the GUI, characters such as single quote marks (') > appeared as a series of special characters. I assumed that the spreadsheet then had > some different encoding (UTF8?) and that I then needed to 'translate' the characters. Very likely, the characters were not really single quotes, but "curly quotes" (UNICODE 201C and 201E) characters. One of the following scenarios must have taken place: 1. The file was encoded in UTF-8, but when you copied the data in, the encoding you specified (or had by default) was a single-gyte encoding like LATIN9. The curly quotes are more than one byte in UTF-8, but each byte was interpreted as a LATIN9 character. The solution would be to specify ENCODING 'UTF8' with COPY. 2. The characters are actually fine in the database, and you loaded them correctly, and your database client encoding is UTF8, but your terminal is in LATIN9. The characters were displayed correctly, but your terminal interpreted each byte as a character. To determine which was the case, look what bytes are in the database: SELECT badcol, badcol::bytea FROM tab WHERE id = 12345; Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
aaa
On Sun, 2020-04-26 at 04:24 +0200, Laurenz Albe wrote:
Laurenz,On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote:I have a table with a varchar(5000) that contains general text. The table is typicallymaintained via a GUI, but on this occasion I received a spreadsheet with data andloaded it - via copy - from a csv extracted from that. The data looked fine in psql,but when looking at the data in the GUI, characters such as single quote marks (')appeared as a series of special characters. I assumed that the spreadsheet then hadsome different encoding (UTF8?) and that I then needed to 'translate' the characters.Very likely, the characters were not really single quotes, but "curly quotes"(UNICODE 201C and 201E) characters.One of the following scenarios must have taken place:1. The file was encoded in UTF-8, but when you copied the data in, the encodingyou specified (or had by default) was a single-gyte encoding like LATIN9.The curly quotes are more than one byte in UTF-8, but each byte was interpreted asa LATIN9 character.The solution would be to specify ENCODING 'UTF8' with COPY.2. The characters are actually fine in the database, and you loaded them correctly,and your database client encoding is UTF8, but your terminal is in LATIN9.The characters were displayed correctly, but your terminal interpreted eachbyte as a character.To determine which was the case, look what bytes are in the database:SELECT badcol, badcol::bytea FROM tab WHERE id = 12345;Yours,Laurenz Albe
Thank you for the reply. After correcting my 'convert' mistake (thanks again to Tom), I decided to drop the rows loaded from the spreadsheet and correct the source data and reload - only as Libreoffice has a simple 'find and replace' - so I could copy the 'naff' quote character into the find field and then replace with a standard quote. Looks fine so far!
Thank you all for your help,
Steve