Home > mailing lists

Re: Open issues for collations - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Open issues for collations
Date	April 18, 2011 11:41:14
Msg-id	21742.1303137667@sss.pgh.pa.us Whole thread Raw
In response to	Re: Open issues for collations (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Open issues for collations
List	pgsql-hackers

Tree view

Robert Haas <robertmhaas@gmail.com> writes:
> [ assorted comments on original issues ]

I believe that all the collation issues I complained about on 26 March
are now resolved, except for the question of getting some more test
cases, and the question of adding a cares-about-collation flag to
pg_proc.  I've added the latter to the TODO list since everybody agreed
it was too late to consider it for 9.1.

However, I've come across a new issue that maybe requires discussion:
what collation should be associated with a multi-row VALUES in FROM?
For instance, in
SELECT ... FROM  (VALUES (1, 'foo'), (2, 'bar' COLLATE "C")) v(a,b),  ...

what collation should be imputed to references to v.b?

The way the code currently works is that the first row of the VALUES
list is inspected to determine what collation to report --- so in this
example, you'd get default collation, ignoring the COLLATE clause in the
second row.  There are several problems with this:

1. I think it violates the SQL spec.  SQL:2008 7.3 <table value
constructor> says that the column types of a VALUES construct are
resolved per the rules of section 9.3, and 9.3 is the one with the
standard verbiage about resolving a common collation, so it's hard
to see how "use the first row and ignore the rest" satisfies the spec.

2. It doesn't seem to satisfy the POLA --- in the above example,
ignoring the explicit COLLATE clause is rather surprising.  We could
document that that's what it does, perhaps, but if you can attach
COLLATE to any input of an operator or function and get the same
results, it's hard to explain why the same isn't true of a VALUES
column.  Especially when VALUES resolves data types in a much more
symmetrical fashion, eg this works:
  (VALUES (1, '1'), (2, '2'::real)) v(a,b)

3. It's not hard to imagine people thinking they can get row-by-row-
varying collation behavior from something like
SELECT ... FROM  (VALUES (1, 'foo' COLLATE "en_US"), (2, 'bar' COLLATE "C")) v(a,b),  ...

So not throwing an error, but rather silently doing something other than
what this SQL seems to say, seems pretty unfriendly to me.

The reason I'm expending so much verbiage on this is that fixing it
seems to require an addition to struct RangeTblEntry, ie, a catversion
bump and forced initdb, so that we can store a list of the resolved
column collations for an RTE_VALUES RTE.  I don't really think there's
much choice though.

Comments?
        regards, tom lane

pgsql-hackers by date:

From: Bernd Helmle
Date: 18 April 2011, 11:34:24
Subject: Re: JDBC connections to 9.1

From: Noah Misch
Date: 18 April 2011, 11:48:44
Subject: Re: Typed table DDL loose ends

Re: Open issues for collations - Mailing list pgsql-hackers

Previous

Next