Re: LATIN9 - hex in varchar after convert - Mailing list pgsql-novice

From Steve Tucknott (TuSol)
Subject Re: LATIN9 - hex in varchar after convert
Date
Msg-id 2c0310b2f49cf950c266334a1c3cfa2ed747395f.camel@tusol.co.uk
Whole thread Raw
In response to Re: LATIN9 - hex in varchar after convert  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-novice
aaa
On Sun, 2020-04-26 at 04:24 +0200, Laurenz Albe wrote:
On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote:
I have a table with a varchar(5000) that contains general text. The table is typically
maintained via a GUI, but on this occasion I received a spreadsheet with data and
loaded it - via copy - from a csv extracted from that. The data looked fine in psql,
but when looking at the data in the GUI, characters such as single quote marks (')
appeared as a series of special characters. I assumed that the spreadsheet then had
some different encoding (UTF8?) and that I then needed to 'translate' the characters.

Very likely, the characters were not really single quotes, but "curly quotes"
(UNICODE 201C and 201E) characters.

One of the following scenarios must have taken place:

1. The file was encoded in UTF-8, but when you copied the data in, the encoding
   you specified (or had by default) was a single-gyte encoding like LATIN9.

   The curly quotes are more than one byte in UTF-8, but each byte was interpreted as
   a LATIN9 character.

   The solution would be to specify ENCODING 'UTF8' with COPY.

2. The characters are actually fine in the database, and you loaded them correctly,
   and your database client encoding is UTF8, but your terminal is in LATIN9.

   The characters were displayed correctly, but your terminal interpreted each
   byte as a character.

To determine which was the case, look what bytes are in the database:

SELECT badcol, badcol::bytea FROM tab WHERE id = 12345;

Yours,
Laurenz Albe

Laurenz,
Thank you for the reply. After correcting my 'convert' mistake (thanks again to Tom), I decided to drop the rows loaded from the spreadsheet and correct the source data and reload - only as Libreoffice has a simple 'find and replace' - so I could copy the 'naff' quote character into the find field and then replace with a standard quote. Looks fine so far!

Thank you all for your help,
Steve

pgsql-novice by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: LATIN9 - hex in varchar after convert
Next
From: JORGE MALDONADO
Date:
Subject: Advice on a table structure