Thread: Capitalization of localized month and day names (to_char() with 'TMmonth', 'TMday', etc.)

Hello and thank you very much for the best open source database engine!

There's just this tiny but seemingly obvious issue that I can't believe I
haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in an
English locale (en_US.UTF-8 at least). Names of months and weekdays are proper
nouns and as such *always* capitalized in English, so that seems wrong to me.

If you want to build an internationalized application, you want 'TMmonth' to
return a month name that can be used in the middle of a sentence, capitalized
or not depending on locale, and 'TMMonth' to return an always capitalized
month name that can be used at the start of a sentence.

This was discussed back in 2008:

https://www.postgresql.org/message-id/flat/
47C34A98.7050102%40timbira.com#9593d90487976d28e2b612cff576545d

There is talk about how PostgreSQL has to do what Oracle does, but does it
really have to replicate bugs at this level of detail? Localized date and
number formats are only for presentation, and not meant to be machine-
readable.

I imagine that the reason it works the way it does is that the unlocalized
formats exist, and there wouldn't be any difference between 'month' and
'Month' if 'month' also capitalized month names due to English language rules,
and as long as you're not building an internationalized application you can
always use 'Month' to get it right.

(This is the situation in PostgreSQL 13, at least. I haven't tried PostgreSQL
14, but there are no mentions of to_char() or localization in the release
notes, nothing in the documentation of to_char() suggesting any change, and I
also haven't found any more recent discussions.)

Thoughts?

--
Magnus Holmgren
MILLNET AB, Teknikringen 6, 583 30 Linköping




--
Vid e-postkontakt med Millnet är det normalt att åtminstone vissa
personuppgifter sparas om dig. Du kan läsa mer om vilka uppgifter som
sparas och hur vi hanterar dem på https://www.millnet.se/integritetspolicy/
<https://www.millnet.se/integritetspolicy/>.



Re: Capitalization of localized month and day names (to_char() with 'TMmonth', 'TMday', etc.)

From
Juan José Santamaría Flecha
Date:

On Wed, Oct 6, 2021 at 11:09 AM Magnus Holmgren <magnus.holmgren@millnet.se> wrote:

There's just this tiny but seemingly obvious issue that I can't believe I
haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in an
English locale (en_US.UTF-8 at least). Names of months and weekdays are proper
nouns and as such *always* capitalized in English, so that seems wrong to me.

IMHO, the patterns of TO_CHAR() do as promised in the documentation [1]:

MONTH full upper case month name (blank-padded to 9 chars)
Month full capitalized month name (blank-padded to 9 chars)
month full lower case month name (blank-padded to 9 chars)

What you are proposing looks more like a new feature than a bug.


Regards,

Juan José Santamaría Flecha
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
> On Wed, Oct 6, 2021 at 11:09 AM Magnus Holmgren <magnus.holmgren@millnet.se>
> wrote:
>> There's just this tiny but seemingly obvious issue that I can't believe I
>> haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in
>> an
>> English locale (en_US.UTF-8 at least). Names of months and weekdays are
>> proper
>> nouns and as such *always* capitalized in English, so that seems wrong to
>> me.

> IMHO, the patterns of TO_CHAR() do as promised in the documentation [1]:
> MONTH full upper case month name (blank-padded to 9 chars)
> Month full capitalized month name (blank-padded to 9 chars)
> month full lower case month name (blank-padded to 9 chars)

> What you are proposing looks more like a new feature than a bug.

Yeah, this is operating as designed and documented.  The idea that
there should be a way to get "month name as it'd be spelled mid-sentence"
is an interesting one, but I really doubt that anyone would thank us for
changing TMmonth to act that way.  (Perhaps a new format code or modifier
would be easier to swallow?)

I also wonder exactly how the code would figure out what to do ---
language-specific conventions for this are not information available
from the libc locale APIs, AFAIR.

            regards, tom lane



onsdag 6 oktober 2021 kl. 16:01:49 CEST skrev du:
> =?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?=
<juanjo.santamaria@gmail.com> writes:
> > On Wed, Oct 6, 2021 at 11:09 AM Magnus Holmgren
> > <magnus.holmgren@millnet.se>>
> > wrote:
> >> There's just this tiny but seemingly obvious issue that I can't believe I
> >> haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in
> >> an
> >> English locale (en_US.UTF-8 at least). Names of months and weekdays are
> >> proper
> >> nouns and as such *always* capitalized in English, so that seems wrong to
> >> me.
> >
> > IMHO, the patterns of TO_CHAR() do as promised in the documentation [1]:
> > MONTH full upper case month name (blank-padded to 9 chars)
> > Month full capitalized month name (blank-padded to 9 chars)
> > month full lower case month name (blank-padded to 9 chars)
> >
> > What you are proposing looks more like a new feature than a bug.
>
> Yeah, this is operating as designed and documented.  The idea that
> there should be a way to get "month name as it'd be spelled mid-sentence"
> is an interesting one, but I really doubt that anyone would thank us for
> changing TMmonth to act that way.  (Perhaps a new format code or modifier
> would be easier to swallow?)

Yes, I see that it's working as designed and documented, but I contend that
the design is flawed for the reason I gave. I mean, you can't deny that names
of months and weekdays are always capitalized in English and certain other
languages, whereas in another set of languages they are not, can you? Perhaps
this is a conscious design choice with some reason behind it, but if so,
neither the PostgreSQL nor the Oracle documentation (https://docs.oracle.com/
cd/B12037_01/server.101/b10759/sql_elements004.htm#i34510) reveal it. What is
the use case for linguistically incorrectly lowercased localized month and day
names? What would such a change break?

I still suspect that whoever designed this didn't consider locale switching.
(Interestingly, "month", "mon", "day", and "dy" are locale-specific by
themselves; there is no "TM" prefix needed.

> I also wonder exactly how the code would figure out what to do ---
> language-specific conventions for this are not information available
> from the libc locale APIs, AFAIR.

I checked the code, and it looks like cache_locale_time() in src/backend/
utils/adt/pg_locale.c uses strftime(3) to produce the correctly capitalized
day and month names and abbreviations (format codes %A, %B, %a, and %b). All
that would be needed is not to force them to lowercase in DCH_to_char() in
src/backend/utils/adt/formatting.c.

What could a new, separate format code that doesn't do this look like?

--
Magnus Holmgren, developer
MILLNET AB



--
Vid e-postkontakt med Millnet är det normalt att åtminstone vissa
personuppgifter sparas om dig. Du kan läsa mer om vilka uppgifter som
sparas och hur vi hanterar dem på https://www.millnet.se/integritetspolicy/
<https://www.millnet.se/integritetspolicy/>.