Thread: Custom collations, collation modifiers (eg case-insensitive)
One of the big appeals of the new COLLATE feature was, to me, the possibility that we'd be able to support custom collations including case-insensitive collations in future. It's something I'd like to tackle one day, and in the mean time want to pop on the TODO so it's not lost and forgotten. Everyone OK with that? [TODO] User-defined collations or collation modifiers/filters [TODO] Provide a built-in case-insensitive collation modifier, i.e. COLLATE ... CASE INSENSITIVE, or current-collation case insensitive as COLLATE CASE INSENSITIVE. Case insensitive collation could be implemented as a collation filter, where both inputs are transformed using the supplied function before being passed to the system collation support. The most obvious being "lower() both inputs". -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12/30/2013 08:49 AM, Craig Ringer wrote: > One of the big appeals of the new COLLATE feature was, to me, the > possibility that we'd be able to support custom collations including > case-insensitive collations in future. > > It's something I'd like to tackle one day, and in the mean time want to > pop on the TODO so it's not lost and forgotten. Everyone OK with that? > > > [TODO] User-defined collations or collation modifiers/filters > > [TODO] Provide a built-in case-insensitive collation modifier, i.e. > COLLATE ... CASE INSENSITIVE, or current-collation case insensitive as > COLLATE CASE INSENSITIVE. The SQL 20xx draft I'm working from does not appear to specify subclauses to COLLATE, and vendors that use COLLATE for case insensitive comparisions appear to do so with extended collation names. So any kind of "CASE INSENSITIVE" modifier would be an extension. It has CREATE CHARACTER SET (11.41) and CREATE COLLATION (11.43). CREATE COLLATION offers control over space padding, but not case. It might be better to just offer variants of all supported collations that ignore (a) case, (b) case and accents, or (c) just accents. The only problem with that is that you can't say "give me the case insensitive variant of whatever the current locale's default collation is", you have to specify the collation. (I intensely dislike the idea of ignoring accents, but it's something some people appear to need/want, and is supported by other vendors). Anyway, I just wanted to raise this as a future TODO. Back to trying to work out issues in updatable s.b. views. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer wrote: > (I intensely dislike the idea of ignoring accents, but it's something > some people appear to need/want, and is supported by other vendors). FWIW at least in spanish, users always want to search for entries ignoring accents. Nowadays I think most turn to unaccent(), but before that was written people resorted to calling transform() on both the stored data and the user-entered query, so that all pertinent matches would be found. People would even create indexes on the unaccented data to speed this up. It's an important feature, not a corner case by any means. Not sure about other languages. For instance perhaps in French they would be interested in ignoring some accents but not others. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
<p dir="ltr">This feature is interesting for Czech language too. Lot of applications allows accent free searching due possibleissues in original data or some client limits - missing Czech keyboard and similar<div class="gmail_quote">Dne 21.1.201417:17 "Alvaro Herrera" <<a href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>> napsal(a):<brtype="attribution" /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Craig Ringer wrote:<br /><br /> > (I intensely dislike the idea of ignoring accents, but it'ssomething<br /> > some people appear to need/want, and is supported by other vendors).<br /><br /> FWIW at least inspanish, users always want to search for entries<br /> ignoring accents. Nowadays I think most turn to unaccent(), butbefore<br /> that was written people resorted to calling transform() on both the<br /> stored data and the user-enteredquery, so that all pertinent matches<br /> would be found. People would even create indexes on the unaccenteddata<br /> to speed this up. It's an important feature, not a corner case by any<br /> means.<br /><br /> Notsure about other languages. For instance perhaps in French they<br /> would be interested in ignoring some accents butnot others.<br /><br /> --<br /> Álvaro Herrera <a href="http://www.2ndQuadrant.com/" target="_blank">http://www.2ndQuadrant.com/</a><br/> PostgreSQL Development, 24x7 Support, Training & Services<br /><br/><br /> --<br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></blockquote></div>