Re: Open issues for collations - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Open issues for collations
Msg-id 80857DD4-EFBB-4E0E-A7A7-BE9529AF8634@gmail.com
In response to Open issues for collations  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mar 26, 2011, at 12:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ** Selecting a field from a record-returning function's output.
> Currently, we'll use the field's declared collation; except that
> if the field has default collation, we'll replace that with the common
> collation of the function's inputs, if any.  Is either part of that
> sane?  Do we need to make this work for functions invoked with other
> syntax than a plain function call, eg operator or cast syntax?

I am not an expert on this topic in any way. That having been said, the first part of that rule seems quite sane. The second part seems less clear, but probably also sane.

> ** What to do with domains whose declaration includes a COLLATE clause?
> Currently, we'll impute that collation to the result of a cast to the
> domain type --- even if the cast's input expression includes an
> explicit COLLATE clause.

I would have thought that an explicit COLLATE clause would trump any action at a distance.

> * In plpgsql, is it OK for declared local variables to inherit the
> function's input collation?  Should we provide a COLLATE option in
> variable declarations to let that be overridden?  If Oracle understands
> COLLATE, probably we should look at what they do in PL/SQL.

I don't know what Oracle does, but a COLLATE option in variable declarations seems like a very good idea.  Inheriting the input collation if not specified seems good too. I also suspect we might need something like COLLATE FROM $1, but maybe that's a 9.2 feature.

> * RI triggers should insert COLLATE clauses in generated queries to
> satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
> referenced column's collation.  Right now you may get either table's
> collation depending on which query type is involved.  I think an obvious
> failure may not be possible so long as equality means the same thing in
> all collations, but it's definitely possible that the planner might
> decide it can't use the referenced column's unique index, which would
> suck for performance.  (Note: this rule seems to prove that the
> committee assumes equality can mean different things in different
> collations, else they'd not have felt the need to specify.)

No idea what to do about this.

> * It'd sure be nice if we had some nontrivial test cases that work in
> encodings besides UTF8.  I'm still bothered that the committed patch
> failed to cover single-byte-encoding cases in upper/lower/initcap.

Or this.

> * Remove initdb's warning about useless locales?  Seems like pointless
> noise, or at least something that can be relegated to debug mode.

+1.

> * Is it worth adding a cares-about-collation flag to pg_proc?  Probably
> too late to be worrying about such refinements for 9.1.

Depends how much knock-on work it'll create.

...Robert
