Thread: Fault with initcap
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.
Anyway to get around this problem?
Regards,
David
On 10/12/21 09:31, Shaozhong SHI wrote: > I tried initcap and found a major problem with it. What Postgres version? In version 12 and 14 I get: > > Initcap of notemachine is NoteMachine. select initcap('notemachine'); initcap ------------- Notemachine > > Initcap of Sainsbury's Bank is Sainsbury'S bank. select initcap('Sainsbury''s Bank'); initcap ------------------ Sainsbury'S Bank Which follows the definition here: https://www.postgresql.org/docs/14/functions-string.html initcap ( text ) → text Converts the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. > > This is not expected. What is the encoding, collate, ctype for the database? Can be found in psql using: \l db_name > > Anyway to get around this problem? > > Regards, > > David -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/12/21 09:31, Shaozhong SHI wrote:
> I tried initcap and found a major problem with it.
What Postgres version?
In version 12 and 14 I get:
>
> Initcap of notemachine is NoteMachine.
select initcap('notemachine');
initcap
-------------
Notemachine
>
> Initcap of Sainsbury's Bank is Sainsbury'S bank.
select initcap('Sainsbury''s Bank');
initcap
------------------
Sainsbury'S Bank
Which follows the definition here:
https://www.postgresql.org/docs/14/functions-string.html
initcap ( text ) → text
Converts the first letter of each word to upper case and the rest to
lower case. Words are sequences of alphanumeric characters separated by
non-alphanumeric characters.
Hi, Adrian Klaver,
It looks like that you replicated the error.
There must be a way to do the following.
a column contains a list of words. Only the first letter of each word should be capitalised. INITCAP can not do that. How to create a function just to capitalised each word (substring) in a list of words/strings. This will be very useful and create great impact.
Regards,
David
Am Tue, Oct 12, 2021 at 09:50:16PM +0100 schrieb Shaozhong SHI: > There must be a way to do the following. > > [...] Only the first letter of each word should be capitalised. Indeed, there is. It is called "human brain in cultural context". "AI" is close nowadays, but, hopefully, not quite there yet. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On 10/12/21 13:50, Shaozhong SHI wrote: > > > On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com > > Which follows the definition here: > > https://www.postgresql.org/docs/14/functions-string.html > <https://www.postgresql.org/docs/14/functions-string.html> > > initcap ( text ) → text > > Converts the first letter of each word to upper case and the rest to > lower case. Words are sequences of alphanumeric characters separated by > non-alphanumeric characters. > > Hi, Adrian Klaver, > > > It looks like that you replicated the error. There is no error, initcap is doing what it is documented to. notemachine is not two words anymore then 'online', 'bluebell', 'network' are. > > There must be a way to do the following. Maybe, but as Karsten says it would involve an AI. One that understands the mutt language that is English. > > a column contains a list of words. Only the first letter of each word > should be capitalised. INITCAP can not do that. How to create a > function just to capitalised each word (substring) in a list of > words/strings. This will be very useful and create great impact. From here: https://www.grammarly.com/blog/14-of-the-longest-words-in-english/ uncopyrightable where would you split that into words?: Some 'words' I see: un unc copy copyright right table able > > Regards, > > David -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 12 Oct 2021 at 23:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/12/21 13:50, Shaozhong SHI wrote:
>
>
> On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com
>
> Which follows the definition here:
>
> https://www.postgresql.org/docs/14/functions-string.html
> <https://www.postgresql.org/docs/14/functions-string.html>
>
> initcap ( text ) → text
>
> Converts the first letter of each word to upper case and the rest to
> lower case. Words are sequences of alphanumeric characters separated by
> non-alphanumeric characters.
>
> Hi, Adrian Klaver,
>
>
> It looks like that you replicated the error.
There is no error, initcap is doing what it is documented to.
notemachine is not two words anymore then 'online', 'bluebell',
'network' are.
>
> There must be a way to do the following.
Maybe, but as Karsten says it would involve an AI. One that understands
the mutt language that is English.
>
> a column contains a list of words. Only the first letter of each word
> should be capitalised. INITCAP can not do that. How to create a
> function just to capitalised each word (substring) in a list of
> words/strings. This will be very useful and create great impact.
From here:
https://www.grammarly.com/blog/14-of-the-longest-words-in-english/
uncopyrightable
where would you split that into words?:
Some 'words' I see:
un
unc
copy
copyright
right
table
able
>
> Regards,
>
> David
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi, Adrian Klaver,
In Python, there is a capwords. Do we have an equivalent in Postgres?
Regards, David
On 10/12/21 16:03, Shaozhong SHI wrote: > > > > Hi, Adrian Klaver, > > In Python, there is a capwords. Do we have an equivalent in Postgres? https://docs.python.org/3/library/string.html?highlight=capwords#string.capwords string.capwords(s, sep=None) Split the argument into words using str.split(), capitalize each word using str.capitalize(), and join the capitalized words using str.join(). If the optional second argument sep is absent or None, runs of whitespace characters are replaced by a single space and leading and trailing whitespace are removed, otherwise sep is used to split and join the words. That is not going to do what you are proposing either as it still comes down to deciding where to split the 'words': import string string.capwords('notemachine') 'Notemachine' There are similar functions to split strings here: https://www.postgresql.org/docs/14/functions-string.html Though again they depend on some delimiter or regexp to make the split. There is no function that just 'knows' that 'notemachine' is two words and should become 'NoteMachine'. > Regards, David -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, Oct 12, 2021 at 8:28 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
There is no function that just 'knows' that 'notemachine' is two words
and should become 'NoteMachine'.
Any chance that 'notemachine' is stored with a zero-width space (Unicode U+200B)? This is common for compound words where the programmer wants to be able to "know" that they are distinct words that are smashed together.
select initcap('notemachine'), initcap('note' || U&'\200B' || 'machine');
initcap |initcap |
-----------+------------+
Notemachine|NoteMachine|
-----------+------------+
Notemachine|NoteMachine|
You could check the length() of the string. If it is 12, there is an invisible character in there. This would explain the "unexpected" behavior that no one seems to be able to replicate.
Best,
--Lee
Lee Hachadoorian
Assistant Professor of Instruction, Geography and Urban Studies
Assistant Director, Professional Science Master's in GIS
Temple University
Assistant Professor of Instruction, Geography and Urban Studies
Assistant Director, Professional Science Master's in GIS
Temple University