Open issues for collations - Mailing list pgsql-hackers

Robert Haas <robertmhaas@gmail.com> writes:
> I think some discussion of which of the things on the open
> item lists need to be done before beta might be helpful, and we ought
> to add any items that are not there but are blockers.

Here's a quick enumeration of some things I think need discussion about
the collations patch:

* Are we happy yet with the collation assignment behavior (see
parse_collate.c)?  A couple of specific subtopics:

** Selecting a field from a record-returning function's output.
Currently, we'll use the field's declared collation; except that
if the field has default collation, we'll replace that with the common
collation of the function's inputs, if any.  Is either part of that
sane?  Do we need to make this work for functions invoked with other
syntax than a plain function call, eg operator or cast syntax?

** What to do with domains whose declaration includes a COLLATE clause?
Currently, we'll impute that collation to the result of a cast to the
domain type --- even if the cast's input expression includes an
explicit COLLATE clause.  It's not clear that that's per spec.  If it
is correct, should we behave similarly for functions that are declared
to return a domain type?  Should it matter if the cast-to-domain is
explicit or implicit?  Perhaps it'd be best if domain collations only
mattered for columns declared with that domain type.  Then we'd have
a general rule that collations only come into play in an expression
as a result of (a) the declared type of a column reference or (b)
an explicit COLLATE clause.


* In plpgsql, is it OK for declared local variables to inherit the
function's input collation?  Should we provide a COLLATE option in
variable declarations to let that be overridden?  If Oracle understands
COLLATE, probably we should look at what they do in PL/SQL.

* RI triggers should insert COLLATE clauses in generated queries to
satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
referenced column's collation.  Right now you may get either table's
collation depending on which query type is involved.  I think an obvious
failure may not be possible so long as equality means the same thing in
all collations, but it's definitely possible that the planner might
decide it can't use the referenced column's unique index, which would
suck for performance.  (Note: this rule seems to prove that the
committee assumes equality can mean different things in different
collations, else they'd not have felt the need to specify.)

* It'd sure be nice if we had some nontrivial test cases that work in
encodings besides UTF8.  I'm still bothered that the committed patch
failed to cover single-byte-encoding cases in upper/lower/initcap.

* Remove initdb's warning about useless locales?  Seems like pointless
noise, or at least something that can be relegated to debug mode.

* Is it worth adding a cares-about-collation flag to pg_proc?  Probably
too late to be worrying about such refinements for 9.1.

There are a bunch of other minor issues that I'm still working through,
but these are the ones that seem to merit discussion.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Darren Duncan
Date:
Subject: resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Next
From: "David E. Wheeler"
Date:
Subject: Re: WIP: Allow SQL-language functions to reference parameters by parameter name