Open issues for collations - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Open issues for collations |
Date | |
Msg-id | 29173.1301114203@sss.pgh.pa.us |
Responses | Re: Open issues for collations |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes:
> I think some discussion of which of the things on the open
> item lists need to be done before beta might be helpful, and we ought
> to add any items that are not there but are blockers.

Here's a quick enumeration of some things I think need discussion about
the collations patch:

* Are we happy yet with the collation assignment behavior (see
parse_collate.c)?  A couple of specific subtopics:

** Selecting a field from a record-returning function's output.
Currently, we'll use the field's declared collation; except that if the
field has default collation, we'll replace that with the common
collation of the function's inputs, if any.  Is either part of that
sane?  Do we need to make this work for functions invoked with other
syntax than a plain function call, eg operator or cast syntax?

** What to do with domains whose declaration includes a COLLATE clause?
Currently, we'll impute that collation to the result of a cast to the
domain type --- even if the cast's input expression includes an
explicit COLLATE clause.  It's not clear that that's per spec.  If it
is correct, should we behave similarly for functions that are declared
to return a domain type?  Should it matter if the cast-to-domain is
explicit or implicit?  Perhaps it'd be best if domain collations only
mattered for columns declared with that domain type.  Then we'd have a
general rule that collations only come into play in an expression as a
result of (a) the declared type of a column reference or (b) an
explicit COLLATE clause.

* In plpgsql, is it OK for declared local variables to inherit the
function's input collation?  Should we provide a COLLATE option in
variable declarations to let that be overridden?  If Oracle understands
COLLATE, probably we should look at what they do in PL/SQL.

* RI triggers should insert COLLATE clauses in generated queries to
satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
referenced column's collation.  Right now you may get either table's
collation depending on which query type is involved.  I think an
obvious failure may not be possible so long as equality means the same
thing in all collations, but it's definitely possible that the planner
might decide it can't use the referenced column's unique index, which
would suck for performance.  (Note: this rule seems to prove that the
committee assumes equality can mean different things in different
collations, else they'd not have felt the need to specify.)

* It'd sure be nice if we had some nontrivial test cases that work in
encodings besides UTF8.  I'm still bothered that the committed patch
failed to cover single-byte-encoding cases in upper/lower/initcap.

* Remove initdb's warning about useless locales?  Seems like pointless
noise, or at least something that can be relegated to debug mode.

* Is it worth adding a cares-about-collation flag to pg_proc?  Probably
too late to be worrying about such refinements for 9.1.

There are a bunch of other minor issues that I'm still working through,
but these are the ones that seem to merit discussion.

			regards, tom lane
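
As a rough illustration of the RI-trigger item above, here is a minimal sketch of why it matters which column's collation a referential-integrity comparison uses once equality can differ between collations. It does not use PostgreSQL: SQLite (through Python's standard sqlite3 module) stands in, because its built-in NOCASE and BINARY collations disagree about equality. The tables parent and child and the helper ri_match_count are hypothetical names invented for this sketch, and the join is only an analogy for the queries an RI trigger would generate.

```python
import sqlite3


def ri_match_count(collation: str) -> int:
    """Count child rows that find a parent row when the join comparison
    is pinned to the given SQLite collation by an explicit COLLATE clause."""
    # The collation name is spliced into the SQL text (it cannot be a bound
    # parameter), so only the two built-in names used here are accepted.
    if collation not in ("NOCASE", "BINARY"):
        raise ValueError("unsupported collation for this sketch")
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical tables: the referenced column is case-insensitive,
        # the referencing column uses byte-wise comparison.
        conn.execute("CREATE TABLE parent (k TEXT COLLATE NOCASE PRIMARY KEY)")
        conn.execute("CREATE TABLE child (k TEXT COLLATE BINARY)")
        conn.execute("INSERT INTO parent VALUES ('abc')")
        conn.execute("INSERT INTO child VALUES ('ABC')")
        query = (
            "SELECT count(*) FROM child c JOIN parent p "
            f"ON c.k = p.k COLLATE {collation}"
        )
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(ri_match_count("NOCASE"))  # comparison under the referenced column's collation
    print(ri_match_count("BINARY"))  # comparison under the referencing column's collation
```

With the comparison pinned to the referenced column's collation (NOCASE in this sketch) the child row finds its parent; pinned to the referencing column's collation (BINARY) it does not. An explicit COLLATE clause in the generated query removes that ambiguity, which is the point of the rule cited from the spec.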