Thread: Custom collations, collation modifiers (eg case-insensitive)

Custom collations, collation modifiers (eg case-insensitive)

From
Craig Ringer
Date:
One of the big appeals of the new COLLATE feature was, to me, the
possibility that we'd be able to support custom collations including
case-insensitive collations in future.

It's something I'd like to tackle one day, and in the mean time want to
pop on the TODO so it's not lost and forgotten. Everyone OK with that?


[TODO] User-defined collations or collation modifiers/filters

[TODO] Provide a built-in case-insensitive collation modifier, i.e.
COLLATE ... CASE INSENSITIVE, or current-collation case insensitive as
COLLATE CASE INSENSITIVE.


Case insensitive collation could be implemented as a collation filter,
where both inputs are transformed using the supplied function before
being passed to the system collation support. The most obvious being
"lower() both inputs".

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Custom collations, collation modifiers (eg case-insensitive)

From
Craig Ringer
Date:
On 12/30/2013 08:49 AM, Craig Ringer wrote:
> One of the big appeals of the new COLLATE feature was, to me, the
> possibility that we'd be able to support custom collations including
> case-insensitive collations in future.
> 
> It's something I'd like to tackle one day, and in the mean time want to
> pop on the TODO so it's not lost and forgotten. Everyone OK with that?
> 
> 
> [TODO] User-defined collations or collation modifiers/filters
> 
> [TODO] Provide a built-in case-insensitive collation modifier, i.e.
> COLLATE ... CASE INSENSITIVE, or current-collation case insensitive as
> COLLATE CASE INSENSITIVE.

The SQL 20xx draft I'm working from does not appear to specify
subclauses to COLLATE, and vendors that use COLLATE for case insensitive
comparisions appear to do so with extended collation names. So any kind
of "CASE INSENSITIVE" modifier would be an extension.

It has CREATE CHARACTER SET (11.41) and CREATE COLLATION (11.43). CREATE
COLLATION offers control over space padding, but not case.

It might be better to just offer variants of all supported collations
that ignore (a) case, (b) case and accents, or (c) just accents. The
only problem with that is that you can't say "give me the case
insensitive variant of whatever the current locale's default collation
is", you have to specify the collation.

(I intensely dislike the idea of ignoring accents, but it's something
some people appear to need/want, and is supported by other vendors).

Anyway, I just wanted to raise this as a future TODO. Back to trying to
work out issues in updatable s.b. views.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Custom collations, collation modifiers (eg case-insensitive)

From
Alvaro Herrera
Date:
Craig Ringer wrote:

> (I intensely dislike the idea of ignoring accents, but it's something
> some people appear to need/want, and is supported by other vendors).

FWIW at least in spanish, users always want to search for entries
ignoring accents.  Nowadays I think most turn to unaccent(), but before
that was written people resorted to calling transform() on both the
stored data and the user-entered query, so that all pertinent matches
would be found.  People would even create indexes on the unaccented data
to speed this up.  It's an important feature, not a corner case by any
means.

Not sure about other languages.  For instance perhaps in French they
would be interested in ignoring some accents but not others.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Custom collations, collation modifiers (eg case-insensitive)

From
Pavel Stehule
Date:
<p dir="ltr">This feature is interesting for Czech language too. Lot of applications allows accent free searching due
possibleissues in original data or some client limits - missing Czech keyboard and similar<div class="gmail_quote">Dne
21.1.201417:17 "Alvaro Herrera" <<a href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>>
napsal(a):<brtype="attribution" /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">Craig Ringer wrote:<br /><br /> > (I intensely dislike the idea of ignoring accents, but
it'ssomething<br /> > some people appear to need/want, and is supported by other vendors).<br /><br /> FWIW at least
inspanish, users always want to search for entries<br /> ignoring accents.  Nowadays I think most turn to unaccent(),
butbefore<br /> that was written people resorted to calling transform() on both the<br /> stored data and the
user-enteredquery, so that all pertinent matches<br /> would be found.  People would even create indexes on the
unaccenteddata<br /> to speed this up.  It's an important feature, not a corner case by any<br /> means.<br /><br />
Notsure about other languages.  For instance perhaps in French they<br /> would be interested in ignoring some accents
butnot others.<br /><br /> --<br /> Álvaro Herrera                <a href="http://www.2ndQuadrant.com/"
target="_blank">http://www.2ndQuadrant.com/</a><br/> PostgreSQL Development, 24x7 Support, Training & Services<br
/><br/><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></blockquote></div>