Re: Camel case identifiers and folding - Mailing list pgsql-general

From Morris de Oryx
Subject Re: Camel case identifiers and folding
Date
Msg-id CAKqnccgk9V79jyGOV=0O4PEbx8r97OWMdRTsJTQXrwGe1haLVQ@mail.gmail.com
Whole thread Raw
In response to Re: Camel case identifiers and folding  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: Camel case identifiers and folding  (Morris de Oryx <morrisdeoryx@gmail.com>)
Re: Camel case identifiers and folding  (Rob Sargent <robjsargent@gmail.com>)
List pgsql-general
Sounds like I may have touched a nerve with some. If so, no offense intended!

There are cases where case-sensitivity is required or desirable, it would be silly to argue otherwise. Where you have such cases, then case-sensitive queries are great. Some RDBMS systems default to case-sensitive searches, others default to case-blind searches. I'll take that to mean that either choice has merit, and it's not foolish to prefer either. I need case-blind searches vanishingly close to 100% of the time, but other people have different conditions and may find that they almost never need case-blind searches. I find this extreme hard to imagine, but find it easy to imagine people who need case-sensitive searches quite often.

What I've been thinking about most are user-driven searches on user-oriented data. Users, at least any user I've ever had, don't want case-sensitive searches. They also don't care about diacritical characters, at least in English. I worked for a French company for many years. Diacritical searches were not always preferred, but sometimes were. It depends on your user community and their requirements and norms. Sometimes it comes down to an individual user. Options are good! I was really just trying to warn someone coming from a base-blind default about Postgres behavior because, well, it hurts if you aren't expecting it. That doesn't make Postgres wrong (it's not that kind of a choice), but it is important to know about.

I'm new to Postgres (only about a year in), and it's great. But I'm used to a case-blind search as a default. And, honestly, I can *never* remember a case when a user asked for a case-sensitive search. Ever. In 30+ years. Maybe it's just me. Just kidding, it's not just me. If you're presenting users with a search interface, you can find out by asking them. Or you can AB test a search UI where there is the option of case-sensitive/blind searching, but you randomly flip which is the default. For users, case-sensitive searches are assumed. That's what Google does. Seriously, Google === Search. It's not a hard test to run. If you find that with a case-blind search, 30% of user tick the box to make it case-sensitive, then you've got users that often do care about case-sensitive search.

And since it seems to be unclear at a few places in the discussion above: It absolutely makes sense to store data in its original form and to allow for case-sensitive searches, when required. It would be very weird to store

     Call me back Ishmael, I've gotta go...

and get back anything else, be that

     call me back ishmael, i've gotta go...

or

     CALL ME BACK ISHMAEL, I'VE GOTTA GO...

As far as I understand it in Postgres, you can:

* Use something like UPPER or LOWER in Every. Single. Search.

* Fold text to one case consistently in an index to make searches case-blind.

* Use citext.

* Teach and convince every user to enter and search for data case-sensitively correctly Every. Single. Time.

On that last point, good luck. Here's an example, I'm keen on birds. Do you write it:

Black-shouldered Kite
Black-Shouldered Kite


On Sun, Mar 17, 2019 at 8:46 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2019-03-16 14:00:34 -0600, Rob Sargent wrote:
>         What sort of content is in your field of type text?  Certainly, in
>         English
>         prose, “rob” is different than “Rob”
>
>
>     I disagree. While the grammar for written English has rules when to
>     write "rob" and when to write "Rob", that distinction usually carries no
>     semantic difference. Consider:
[...]
> I don’t think it’s solely about the semantics.  One might be contractually
> obligated to always spell a name in some exact way including it capitalization.
> For instance if referring to "Rob Sargent” as a quote or accreditation, then
> it’s not okay to let a typo “rob Sargent” go through.

1) Such contracts might exist, but they are only binding to the signing
parties, they don't affect what is commonly understood as "the English
language". Everybody else will see it as an obvious typo and won't
assume that this refers to some "rob Sargent" who is a different person
than "Rob Sargent".

2) I don't think the OP was talking about spell-checking. And in any
case spell-checking is more complicated than simply comparing strings
byte by byte.

        hp


--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlyNbrwACgkQ8g5IURL+
KF2mKhAAq8b9RLFBihsO+dQf3pxjLt3bWoD9mhgMEy8GvxADuxAF4dLo1SqfX2S0
aIZFEyQgMoNKWOaiylj7TYPiBRilfVSCxggisPsirUKLKpXr4Bw9oIGiPBiE+21m
ajlaONOZNiaM9D+BFthFkPM0TcjR2FHTaXOch0HbFnPnDWMgEPwY9yyDeN8ZPCOn
7G002EB3wxHcnhoFm8jGO2E8SL9l0NLU6+CVlCPAenGI0H6gbFatHi1LKTmNe+PH
d32Il7Pf4GE0o/k92X0Yil3oqMSU/vV08LSrMVGxUDVAz+cL0/W40V/3BthELi95
l7kGJRJfgic7LU7IK84yszxR1ZJt1aBLyFIUpEoEsPl6XuSQsTOm/sFt/1C0L8Io
iLYNTnLh2CveKt/bldITAAPlHZ1rF92BUsAJRA2Wci5yv5V7orykDPgE8HzhB5U9
m1cHfjInqMi1ogGgSLw0sPH6ZDlT76p1H8RjjQQhD9eAFNYt29xFOUARyyWHVEut
ZyWMIeHzWpLRr/h+5y6uJ+LhH4NVMONc3unE6fy1MgIZtIcmeFrbx8+7XGKmv/kD
RY12aNOrCF85HdiIRd/ImXOsCWWiq7u7itZQrsqhQVRXNMnV/7M4Hlpe1rBeQ9V+
k2FXuFM/eEtyEO+KKYwct+1iYerTTKY6hyCQwBINVewe/Fohd+A=
=8B/x
-----END PGP SIGNATURE-----

pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: s3 was: Where to store Blobs?
Next
From: Morris de Oryx
Date:
Subject: Re: Camel case identifiers and folding