Re: Question regarding UTF-8 data and "C" collation on definition of field of table - Mailing list pgsql-general

From Dionisis Kontominas
Subject Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Msg-id CAB4Evu30cbKTZrYFh=zf-+TizKtQe28hFcdJpMw5wrEex7++fQ@mail.gmail.com
In response to Re: Question regarding UTF-8 data and "C" collation on definition of field of table  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Question regarding UTF-8 data and "C" collation on definition of field of table  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello Tom,

   Thank you for your response. 

   I suppose this affects the result of ORDER BY clauses on the field, as well as the ordering of entries in any index on it. Is that right?

   Assuming the requirement is to store UTF-8 text from multiple languages in a single field, and the database default encoding is UTF8 (which I suppose is the right choice; please confirm), what values should the Collation and Ctype have for the database to behave correctly? I could not find anything specific in the documentation.

   What I did find interesting, though, is the statement below:

24.2.2.1. Standard Collations

"Additionally, the SQL standard collation name ucs_basic is available for encoding UTF8. It is equivalent to C and sorts by Unicode code point."
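
To illustrate why the documentation can call ucs_basic "equivalent to C" for UTF8, here is a small Python sketch (not PostgreSQL itself). It assumes that "C" compares the raw encoded bytes and that Python's default string comparison stands in for code-point ordering; the sample words are made up for illustration and do not come from this thread.

```python
# Illustrative sample data (hypothetical, not from the thread).
words = ["zebra", "Zulu", "apple", "Äpfel", "éclair", "eclair", "Ωmega"]

# Byte-by-byte ordering of the UTF-8 encoding, standing in for "C" collation.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Python compares str values by Unicode code point, standing in for ucs_basic.
by_code_point = sorted(words)

# UTF-8 is designed so that byte order preserves code point order.
same_order = by_bytes == by_code_point

print(by_bytes)
print(same_order)
```

In this sketch both orderings agree, but uppercase ASCII sorts before lowercase, and accented or non-Latin words land after all ASCII words ("Äpfel" after "zebra", "éclair" far from "eclair"), which is not what a language-aware collation would normally produce.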

Is ucs_basic the right collation to use when creating the database for this use case? If so, what would be a suitable corresponding Ctype?

Regards,
Dionisis

On Mon, 6 Feb 2023 at 00:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dionisis Kontominas <dkontominas@gmail.com> writes:
>   Let's say that the definition is for example as follows:
>     name character varying(8) COLLATE pg_catalog."C" NOT NULL
> and also assume that the database default encoding is UTF8 and also the
> Collate and Ctype are "C". I plan to store strings of various languages in
> this field.

> Are these the correct settings that I should have used on creation of
> the database?

Well, it won't crash or anything, but sorting will be according
to byte-by-byte values.  So the sort order of non-ASCII text is
likely to look odd.  How much do you care about that?

                        regards, tom lane
