Re: LATIN9 - hex in varchar after convert - Mailing list pgsql-novice

From Laurenz Albe
Subject Re: LATIN9 - hex in varchar after convert
Date
Msg-id 85be3e3e23ec52df22c699af0c5eec022f1ceb51.camel@cybertec.at
In response to LATIN9 - hex in varchar after convert  ("Steve Tucknott (TuSol)" <steve@tusol.co.uk>)
Responses Re: LATIN9 - hex in varchar after convert  ("Steve Tucknott (TuSol)" <steve@tusol.co.uk>)
List pgsql-novice
On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote:
> I have a table with a varchar(5000) that contains general text. The table is typically
> maintained via a GUI, but on this occasion I received a spreadsheet with data and
> loaded it - via copy - from a csv extracted from that. The data looked fine in psql,
> but when looking at the data in the GUI, characters such as single quote marks (')
> appeared as a series of special characters. I assumed that the spreadsheet then had
> some different encoding (UTF8?) and that I then needed to 'translate' the characters.

Very likely, the characters were not plain single quotes but "curly quotes"
(Unicode U+2018 and U+2019).

One of the following scenarios must have taken place:

1. The file was encoded in UTF-8, but when you copied the data in, the encoding
   you specified (or had by default) was a single-byte encoding like LATIN9.

   The curly quotes are more than one byte in UTF-8, but each byte was interpreted as
   a LATIN9 character.

   The solution would be to specify ENCODING 'UTF8' with COPY.

2. The characters are actually fine in the database, and you loaded them correctly,
   and your database client encoding is UTF8, but your terminal is in LATIN9.

   The characters were sent to the client correctly, but your terminal
   interpreted each byte as a separate character.
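
Both scenarios come down to the same mechanism: multi-byte UTF-8 data read one
byte at a time. As a rough illustration (a Python sketch, not something taken
from your system), assume the character in question is U+2019, the right single
quotation mark; that choice is only an example.

```python
# Assumption: the problem character is U+2019 (right single quotation mark),
# a typical "curly" apostrophe produced by spreadsheets and word processors.
curly = "\u2019"

# In UTF-8 this single character occupies three bytes.
utf8_bytes = curly.encode("utf-8")
print(utf8_bytes.hex())  # e28099

# Reading the same bytes as LATIN9 (ISO 8859-15) yields three characters:
# one accented letter followed by two non-printable control characters.
misread = utf8_bytes.decode("iso8859_15")
print(len(misread), [f"U+{ord(c):04X}" for c in misread])
# 3 ['U+00E2', 'U+0080', 'U+0099']
```

How the two control characters are drawn depends on the program displaying
them, which may be why the result looks like a series of special characters.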

To determine which was the case, look at what bytes are in the database:

SELECT badcol, badcol::bytea FROM tab WHERE id = 12345;
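
If the hex output is hard to read, a small helper can turn it into code points.
This is only a sketch under assumptions: the database encoding is UTF8, the
bytea value is shown in the default hex format, and "describe" is a
hypothetical name. The two sample values are made up for illustration and are
not from your table.

```python
def describe(hex_from_bytea):
    """Decode hex digits (the bytea output without its leading backslash-x)
    as UTF-8 and return the code points of the resulting characters."""
    text = bytes.fromhex(hex_from_bytea).decode("utf-8")
    return [f"U+{ord(c):04X}" for c in text]

# A correctly stored curly apostrophe: one character (points to scenario 2).
print(describe("e28099"))        # ['U+2019']

# Each UTF-8 byte stored as its own character (points to scenario 1).
print(describe("c3a2c280c299"))  # ['U+00E2', 'U+0080', 'U+0099']
```

If the quote shows up as a single code point, the data are stored correctly
and the problem is on the display side; if it shows up as three, the data were
misinterpreted on the way in.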

Yours,
Laurenz Albe

-- 
Cybertec | https://www.cybertec-postgresql.com