Re: ICU integration - Mailing list pgsql-hackers

From Doug Doole
Subject Re: ICU integration
Date
Msg-id CAP6UvaMTJYCxSBqhOnMwTS-vu=u7wvut-3k6TQ4eddtnSd4a1Q@mail.gmail.com
In response to Re: ICU integration  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
> This isn't a problem for Postgres, or at least wouldn't be right now,
> because we don't have case insensitive collations.

I was wondering if Postgres might be that way. It does avoid the RI (referential integrity) constraint problem, but there are still troubles with range-based predicates. (My previous project wanted case/accent insensitive collations, so we got to deal with it all.)
 
> So, we use a strcmp()/memcmp() tie-breaker when strcoll() indicates equality, while also making the general notion of text equality actually mean binary equality.

We used a similar tie-breaker in places. (For example, index keys needed to be identical, not just equal. We also broke ties in sort to make its behaviour more deterministic.)
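
As a rough illustration of the tie-breaker described above, here is a minimal sketch in C. It is not taken from the PostgreSQL source or from the other project mentioned; the function name collate_with_tiebreak is made up for this example, and it assumes NUL-terminated strings compared under whatever locale is currently set.

```c
#include <string.h>

/*
 * Hypothetical helper: compare two NUL-terminated strings with the
 * current locale's collation (strcoll), and fall back to a byte-wise
 * comparison (strcmp) only when the collation reports a tie.
 *
 * Returns <0, 0 or >0. With the tie-breaker in place, 0 is returned
 * only when the two strings are byte-for-byte identical.
 */
int
collate_with_tiebreak(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result == 0)
        result = strcmp(a, b);  /* tie-breaker: "equal" now means "identical" */

    return result;
}
```

The consequence is the one noted in the quoted text: a collation that would treat two different byte strings as equal can never report equality through this function, which is what makes case insensitive collations hard to support on top of it.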

> I would like to get case insensitive collations some day, and was
> really hoping that ICU would help. That being said, the need for a
> strcmp() tie-breaker makes that hard. Oh well.

Prior to adding ICU, my previous project also assumed that equal meant identical. Breaking this assumption turned out to be a lot easier than I expected, but that code base had religiously used its own string comparison function for user data - strcmp()/memcmp() was never called for user data. (I don't know if the same can be said for Postgres.) We found that very few places needed to be aware of values that were equal but not identical. (Index and sort were the big two.)

Hopefully Postgres will be the same.
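
To make the "equal but not identical" distinction concrete, here is a small sketch in C. It assumes POSIX strcasecmp() as a stand-in for a case insensitive collation (a real ICU collator would be used instead), and the names ci_equal, identical and sort_cmp are hypothetical, chosen only for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>    /* strcasecmp() is POSIX, not ISO C */

/* "Equal": the stand-in case insensitive collation sees no difference. */
int
ci_equal(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

/* "Identical": the two strings consist of exactly the same bytes. */
int
identical(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/*
 * qsort() comparator over an array of `const char *`.
 * Primary order is the case insensitive comparison; values that are
 * equal under it are ordered byte-wise so the sort result does not
 * depend on the input order.
 */
int
sort_cmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *) pa;
    const char *b = *(const char *const *) pb;
    int         r = strcasecmp(a, b);

    return (r != 0) ? r : strcmp(a, b);
}
```

Here a predicate would use ci_equal(), so "Doug" and "doug" match, while code like sort (and, by the same reasoning, index key handling) uses the stricter ordering so that equal-but-different values are still placed deterministically.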

--
Doug Doole
