Re: Open issues for collations - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Open issues for collations |
Date | |
Msg-id | 21742.1303137667@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Open issues for collations (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Open issues for collations
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > [ assorted comments on original issues ] I believe that all the collation issues I complained about on 26 March are now resolved, except for the question of getting some more test cases, and the question of adding a cares-about-collation flag to pg_proc. I've added the latter to the TODO list since everybody agreed it was too late to consider it for 9.1. However, I've come across a new issue that maybe requires discussion: what collation should be associated with a multi-row VALUES in FROM? For instance, in SELECT ... FROM (VALUES (1, 'foo'), (2, 'bar' COLLATE "C")) v(a,b), ... what collation should be imputed to references to v.b? The way the code currently works is that the first row of the VALUES list is inspected to determine what collation to report --- so in this example, you'd get default collation, ignoring the COLLATE clause in the second row. There are several problems with this: 1. I think it violates the SQL spec. SQL:2008 7.3 <table value constructor> says that the column types of a VALUES construct are resolved per the rules of section 9.3, and 9.3 is the one with the standard verbiage about resolving a common collation, so it's hard to see how "use the first row and ignore the rest" satisfies the spec. 2. It doesn't seem to satisfy the POLA --- in the above example, ignoring the explicit COLLATE clause is rather surprising. We could document that that's what it does, perhaps, but if you can attach COLLATE to any input of an operator or function and get the same results, it's hard to explain why the same isn't true of a VALUES column. Especially when VALUES resolves data types in a much more symmetrical fashion, eg this works: (VALUES (1, '1'), (2, '2'::real)) v(a,b) 3. It's not hard to imagine people thinking they can get row-by-row- varying collation behavior from something like SELECT ... FROM (VALUES (1, 'foo' COLLATE "en_US"), (2, 'bar' COLLATE "C")) v(a,b), ... So not throwing an error, but rather silently doing something other than what this SQL seems to say, seems pretty unfriendly to me. The reason I'm expending so much verbiage on this is that fixing it seems to require an addition to struct RangeTblEntry, ie, a catversion bump and forced initdb, so that we can store a list of the resolved column collations for an RTE_VALUES RTE. I don't really think there's much choice though. Comments? regards, tom lane
pgsql-hackers by date: