Home > mailing lists

Thread: Fault with initcap

Fault with initcap

From

Shaozhong SHI

Date:

12 October 2021, 16:31:02

I tried initcap and found a major problem with it.

Initcap of notemachine is NoteMachine.

Initcap of Sainsbury's Bank is Sainsbury'S bank.

This is not expected.

Anyway to get around this problem?

Regards,

David

Re: Fault with initcap

From

Adrian Klaver

Date:

12 October 2021, 19:34:16

On 10/12/21 09:31, Shaozhong SHI wrote:
> I tried initcap and found a major problem with it.

What Postgres version?

In version 12 and 14 I get:

> 
> Initcap of notemachine is NoteMachine.

select initcap('notemachine');
    initcap
-------------
  Notemachine

> 
> Initcap of Sainsbury's Bank is Sainsbury'S bank.

select initcap('Sainsbury''s Bank');
      initcap
------------------
  Sainsbury'S Bank

Which follows the definition here:

https://www.postgresql.org/docs/14/functions-string.html

initcap ( text ) → text

Converts the first letter of each word to upper case and the rest to 
lower case. Words are sequences of alphanumeric characters separated by 
non-alphanumeric characters.

> 
> This is not expected.

What is the encoding, collate, ctype for the database?

Can be found in psql using:

\l db_name

> 
> Anyway to get around this problem?
> 
> Regards,
> 
> David

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Fault with initcap

From

Shaozhong SHI

Date:

12 October 2021, 20:50:16

On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 10/12/21 09:31, Shaozhong SHI wrote:
> I tried initcap and found a major problem with it.

What Postgres version?

In version 12 and 14 I get:

>
> Initcap of notemachine is NoteMachine.

select initcap('notemachine');
initcap
-------------
Notemachine

>
> Initcap of Sainsbury's Bank is Sainsbury'S bank.

select initcap('Sainsbury''s Bank');
initcap
------------------
Sainsbury'S Bank

Which follows the definition here:

https://www.postgresql.org/docs/14/functions-string.html

initcap ( text ) → text

Converts the first letter of each word to upper case and the rest to
lower case. Words are sequences of alphanumeric characters separated by
non-alphanumeric characters.

Hi, Adrian Klaver,

It looks like that you replicated the error.

There must be a way to do the following.

a column contains a list of words. Only the first letter of each word should be capitalised. INITCAP can not do that. How to create a function just to capitalised each word (substring) in a list of words/strings. This will be very useful and create great impact.

Regards,

David

Re: Fault with initcap

From

Karsten Hilbert

Date:

12 October 2021, 21:09:40

Am Tue, Oct 12, 2021 at 09:50:16PM +0100 schrieb Shaozhong SHI:

> There must be a way to do the following.
>
> [...] Only the first letter of each word should be capitalised.

Indeed, there is. It is called "human brain in cultural
context". "AI" is close nowadays, but, hopefully, not quite
there yet.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B

Re: Fault with initcap

From

Adrian Klaver

Date:

12 October 2021, 22:02:55

On 10/12/21 13:50, Shaozhong SHI wrote:
> 
> 
> On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com 

> 
>     Which follows the definition here:
> 
>     https://www.postgresql.org/docs/14/functions-string.html
>     <https://www.postgresql.org/docs/14/functions-string.html>
> 
>     initcap ( text ) → text
> 
>     Converts the first letter of each word to upper case and the rest to
>     lower case. Words are sequences of alphanumeric characters separated by
>     non-alphanumeric characters.
> 
>     Hi, Adrian Klaver,
> 
> 
> It looks like that you replicated the error.

There is no error, initcap is doing what it is documented to.

notemachine is not two words anymore then 'online', 'bluebell', 
'network' are.


> 
> There must be a way to do the following.

Maybe, but as Karsten says it would involve an AI. One that understands 
the mutt language that is English.

> 
> a column contains a list of words.  Only the first letter of each word 
> should be capitalised.  INITCAP can not do that.  How to create a 
> function just to capitalised each word (substring) in a list of 
> words/strings.  This will be very useful and create great impact.

 From here:

https://www.grammarly.com/blog/14-of-the-longest-words-in-english/

uncopyrightable

where would you split that into words?:

Some 'words' I see:

un
unc
copy
copyright
right
table
able

> 
> Regards,
> 
> David


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Fault with initcap

From

Shaozhong SHI

Date:

12 October 2021, 23:03:07

On Tue, 12 Oct 2021 at 23:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 10/12/21 13:50, Shaozhong SHI wrote:
>
>
> On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com

>
> Which follows the definition here:
>
> https://www.postgresql.org/docs/14/functions-string.html
> <https://www.postgresql.org/docs/14/functions-string.html>
>
> initcap ( text ) → text
>
> Converts the first letter of each word to upper case and the rest to
> lower case. Words are sequences of alphanumeric characters separated by
> non-alphanumeric characters.
>
> Hi, Adrian Klaver,
>
>
> It looks like that you replicated the error.

There is no error, initcap is doing what it is documented to.

notemachine is not two words anymore then 'online', 'bluebell',
'network' are.

>
> There must be a way to do the following.

Maybe, but as Karsten says it would involve an AI. One that understands
the mutt language that is English.

>
> a column contains a list of words. Only the first letter of each word
> should be capitalised. INITCAP can not do that. How to create a
> function just to capitalised each word (substring) in a list of
> words/strings. This will be very useful and create great impact.

From here:

https://www.grammarly.com/blog/14-of-the-longest-words-in-english/

uncopyrightable

where would you split that into words?:

Some 'words' I see:

un
unc
copy
copyright
right
table
able

>
> Regards,
>
> David

--
Adrian Klaver
adrian.klaver@aklaver.com

Hi, Adrian Klaver,

In Python, there is a capwords. Do we have an equivalent in Postgres?

Regards, David

Re: Fault with initcap

From

Adrian Klaver

Date:

13 October 2021, 00:27:51

On 10/12/21 16:03, Shaozhong SHI wrote:
> 
> 

> 
> Hi, Adrian Klaver,
> 
> In Python, there is  a capwords.  Do we have an equivalent in Postgres?

https://docs.python.org/3/library/string.html?highlight=capwords#string.capwords


string.capwords(s, sep=None)

     Split the argument into words using str.split(), capitalize each 
word using str.capitalize(), and join the capitalized words using 
str.join(). If the optional second argument sep is absent or None, runs 
of whitespace characters are replaced by a single space and leading and 
trailing whitespace are removed, otherwise sep is used to split and join 
the words.

That is not going to do what you are proposing either as it still comes 
down to deciding where to split the 'words':

import string

string.capwords('notemachine') 
 


'Notemachine'


There are similar functions to split strings here:

https://www.postgresql.org/docs/14/functions-string.html

Though again they depend on some delimiter or regexp to make the split.

There is no function that just 'knows' that 'notemachine' is two words 
and should become 'NoteMachine'.

> Regards, David


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Fault with initcap

From

Lee Hachadoorian

Date:

13 October 2021, 03:55:08

On Tue, Oct 12, 2021 at 8:28 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

There is no function that just 'knows' that 'notemachine' is two words
and should become 'NoteMachine'.

Any chance that 'notemachine' is stored with a zero-width space (Unicode U+200B)? This is common for compound words where the programmer wants to be able to "know" that they are distinct words that are smashed together.

select initcap('notemachine'), initcap('note' || U&'\200B' || 'machine');

initcap |initcap |
-----------+------------+
Notemachine|NoteMachine|

You could check the length() of the string. If it is 12, there is an invisible character in there. This would explain the "unexpected" behavior that no one seems to be able to replicate.

Best,

--Lee

Lee Hachadoorian
Assistant Professor of Instruction, Geography and Urban Studies
Assistant Director, Professional Science Master's in GIS
Temple University