Thread: to_char and i18n

to_char and i18n

From
Manuel Sugawara
Date:
Now that Oracle supports i18n dependant behavior in its to_char
formatting functions (at least for its 10g release) I was wondering if
a patch to support this in PostgreSQL will get accepted. I was hoping
to work on this now that I have some spare time.

Regards,
Manuel.



Re: to_char and i18n

From
"Qingqing Zhou"
Date:
"Manuel Sugawara" <masm@fciencias.unam.mx> wrote
> Now that Oracle supports i18n dependant behavior in its to_char
> formatting functions (at least for its 10g release) I was wondering if
> a patch to support this in PostgreSQL will get accepted. I was hoping
> to work on this now that I have some spare time.
>

Can you give a small introduction of i18n and what's your plan in 
PostgreSQL?

Regards,
Qingqing 




Re: to_char and i18n

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> Now that Oracle supports i18n dependant behavior in its to_char
> formatting functions (at least for its 10g release) I was wondering if
> a patch to support this in PostgreSQL will get accepted.

I thought to_char already had i18n behavior.  What exactly are you
thinking of changing?
        regards, tom lane


Re: to_char and i18n

From
Manuel Sugawara
Date:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:

> Can you give a small introduction of i18n and what's your plan in
> PostgreSQL?

i18n == Internationalization (maybe I should say l10n ==
localization). This means that to_char functions might lead to
different results depending on the i18n settings. For instance,
nowadays, select to_char(now(), 'dd-mon-yy') returns 21-dec-05
regardless of the i18n settings. This should lead 21-dic-05 in the
es_MX localization. This also applies to the concurrency symbol,
thousand separator, etc.

(Some time ago I proposed an--incomplete--patch and it was rejectd by
Karel arguing that to_char functions should behave *exactly* the same
way that they do in Oracle.)

Regards,
Manuel.




Re: to_char and i18n

From
Manuel Sugawara
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I thought to_char already had i18n behavior.  What exactly are you
> thinking of changing?

The modifiers that are suitable to localize. Month and day names comes
to mind and maybe others, I'm not sure what the state of the code is,
but I can say that, at least, the 'month' and 'day' modifiers does not
behave in a localized way.

Regards,
Manuel.



Re: to_char and i18n

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> (Some time ago I proposed an--incomplete--patch and it was rejectd by
> Karel arguing that to_char functions should behave *exactly* the same
> way that they do in Oracle.)

That is the accepted plan for to_char ... of course, if Oracle changes
to_char every so often, it'll get more interesting to decide what to do.
        regards, tom lane


Re: to_char and i18n

From
"Qingqing Zhou"
Date:
"Manuel Sugawara" <masm@fciencias.unam.mx> wrote
>
> i18n == Internationalization (maybe I should say l10n ==
> localization).

Good hint, I got it :-) Just like a crossword puzzle. 18 means there are 18 
characters between 'i' and 'n' ...

Regards,
Qingqing 




Re: to_char and i18n

From
Bruce Momjian
Date:
Qingqing Zhou wrote:
> 
> "Manuel Sugawara" <masm@fciencias.unam.mx> wrote
> >
> > i18n == Internationalization (maybe I should say l10n ==
> > localization).
> 
> Good hint, I got it :-) Just like a crossword puzzle. 18 means there are 18 
> characters between 'i' and 'n' ...

Huh?  I don't understand.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: to_char and i18n

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I thought to_char already had i18n behavior.  What exactly are you
>> thinking of changing?

> The modifiers that are suitable to localize. Month and day names comes
> to mind and maybe others, I'm not sure what the state of the code is,
> but I can say that, at least, the 'month' and 'day' modifiers does not
> behave in a localized way.

Can we spell the names differently but keep to the same field widths?
I can see where it might cause problems to change the widths --- other
than that, no objection.
        regards, tom lane


Re: to_char and i18n

From
Manuel Sugawara
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Can we spell the names differently but keep to the same field widths?
> I can see where it might cause problems to change the widths --- other
> than that, no objection.

Quite impossible. But if someone is relaying in the current behavior
of to_char she might set lc_time accordingly as this parameter is not
used in the code, AFAIK

Regards,
Manuel.


Re: to_char and i18n

From
Gavin Sherry
Date:
On Wed, 21 Dec 2005, Tom Lane wrote:

> Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > (Some time ago I proposed an--incomplete--patch and it was rejectd by
> > Karel arguing that to_char functions should behave *exactly* the same
> > way that they do in Oracle.)
>
> That is the accepted plan for to_char ... of course, if Oracle changes
> to_char every so often, it'll get more interesting to decide what to do.

There's some functionality in 10g which PostgreSQL does not have:

TZD - returns the short timezone string with daylight saving information,
eg: PDT

TZM - timezone offset minutes part

TZH - timezone offset hours part

TZR -  timezone region (US/Pacific, for example)

RR/RRRR - accept 'rounded' years, eg 99-1-1 = 1999-1-1 (kind of pointless)

FF - specify how many digits to the right of the decimal place to display,
when looking at factions of seconds. Eg: HH:MM:SS.FF3 would produce
15:56:22.123

X - the local radix character. Eg: HH:MM:SSXFF would produce 15:56:22.123

E - Era name (like, Japanese Imperial) (kind of pointless)
EE - Full era name

DS - Locale formatted short date. For example, DD/MM/YYYY for the Brits,
MM/DD/YYYY for the Yanks

DL - Locale formatted long date. Eg: fmDay, dd. Month yyyy in Germany

SCC - Like 'CC', but will carry a - (minus) for BC dates (I'm not sure if
this implies that Oracle wants BC dates to be marked 'BC'. I don't have
an Oracle system around at the moment to check though :-()

TS - Locale formatted short time.

YEAR - Year in words

SYEAR - Year in words, prefixed by minus sign for BC dates

SYYYY - YYYY, prefixed by minus sign for BC dates

Gavin


Re: to_char and i18n

From
Manuel Sugawara
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:

> There's some functionality in 10g which PostgreSQL does not have:

Good to know. I'm not an Oracle expert, actually I knew this reading
an article in a past issue of the Oracle's magazine about i18n;
essentially they were talking about how easy was for an Oracle
database to get i18n as each parameter in the to_char functions will
behave accordingly to the i18n settings.

Regards,
Manuel.




Re: to_char and i18n

From
Karel Zak
Date:
On Wed, 2005-12-21 at 23:50 -0500, Tom Lane wrote:
> Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> I thought to_char already had i18n behavior.  What exactly are you
> >> thinking of changing?
> 
> > The modifiers that are suitable to localize. Month and day names comes
> > to mind and maybe others, I'm not sure what the state of the code is,
> > but I can say that, at least, the 'month' and 'day' modifiers does not
> > behave in a localized way.

The names for months and days are hardcoded to to_char code and it's in
English only.

> Can we spell the names differently but keep to the same field widths?

That's important point. How resolve this problem Oracle? Maybe we can
say (in docs) that with non-English locales it works with days/months
names as in FM (fill) mode.

# select length( to_char(now(), 'Day') ) as Normal,        length( to_char(now(), 'FMDay') ) as FM;normal | fm
--------+----     9 |  8

It means 'FM' uses variable size of Day/Month field -- without FM is the
size fixed to 9 chars.

I think that for backward compatibility the locale sensitive to_char()
should be implemented as separate call "to_char(datetime, format,
locale)" or we should add new modifiers to the current to_char,
something like "to_char(datetime, "LCMonth") or both.

I don't have any time to work on to_char(), I can help to review patches
only.
Karel

-- 
Karel Zak <zakkr@zf.jcu.cz>



Re: to_char and i18n

From
Euler Taveira de Oliveira
Date:
--- Karel Zak <zakkr@zf.jcu.cz> escreveu:

I have a patch like this. But this was for 7.4.x. I have to take a look
at it.

> That's important point. How resolve this problem Oracle? Maybe we can
> say (in docs) that with non-English locales it works with days/months
> names as in FM (fill) mode.
> 
Yeah. We could make the new mode (TM?) ignores the FX mode and write a
note in docs.

> I think that for backward compatibility the locale sensitive
> to_char()
> should be implemented as separate call "to_char(datetime, format,
> locale)" or we should add new modifiers to the current to_char,
> something like "to_char(datetime, "LCMonth") or both.
> 
I vote for another modifier (TM?). That's more flexible than another
function overload because to_char() implements modifiers yet.

> I don't have any time to work on to_char(), I can help to review
> patches
> only.
> 
OK. I'll send a revised patch ASAP.


Euler Taveira de Oliveira
euler[at]yahoo_com_br




    
_______________________________________________________ 
Yahoo! doce lar. Faça do Yahoo! sua homepage. 
http://br.yahoo.com/homepageset.html 



Re: to_char and i18n

From
Euler Taveira de Oliveira
Date:
--- Euler Taveira de Oliveira <eulerto@yahoo.com.br> escreveu:

> I have a patch like this. But this was for 7.4.x. I have to take a
> look
> at it.
>
The patch is attached. It implements day and month i18n. I fixed a few
misspelling comments. Docs is attached too.

template1=# select to_char(now(), 'Day, DD Month YYYY');
           to_char
------------------------------
 Sunday   , 25 December  2005
(1 registro)

template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
          to_char
---------------------------
 Domingo, 25 Dezembro 2005
(1 registro)

template1=#


Comments?

Euler Taveira de Oliveira
euler[at]yahoo_com_br








_______________________________________________________
Yahoo! doce lar. Faça do Yahoo! sua homepage.
http://br.yahoo.com/homepageset.html

Attachment

Re: [PATCHES] to_char and i18n

From
Karel Zak
Date:
On Sun, 2005-12-25 at 17:56 -0300, Euler Taveira de Oliveira wrote:
> --- Euler Taveira de Oliveira <eulerto@yahoo.com.br> escreveu:
>
> > I have a patch like this. But this was for 7.4.x. I have to take a
> > look
> > at it.
> >
> The patch is attached. It implements day and month i18n. I fixed a few
> misspelling comments. Docs is attached too.
>
> template1=# select to_char(now(), 'Day, DD Month YYYY');
>            to_char
> ------------------------------
>  Sunday   , 25 December  2005
> (1 registro)
>
> template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
>           to_char
> ---------------------------
>  Domingo, 25 Dezembro 2005
> (1 registro)
>
> template1=#
>
>
> Comments?

I think it looks like a good patch. There's small problem that the
current to_char() output is possible use as argument for to_timestamp()
or to_date() function. It means you should implement vice-versa
conversion from string with TMMonth/TMDay to timestamp.

   to_timestamp('Domingo, 25 Dezembro 2005', 'TMDay, DD TMMonth YYYY')

Or.. at least describe in the docs that this way is unsupported
for 'TM' prefix.

    Karel

--
Karel Zak <zakkr@zf.jcu.cz>


Re: to_char and i18n

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Euler Taveira de Oliveira wrote:
> --- Euler Taveira de Oliveira <eulerto@yahoo.com.br> escreveu:
> 
> > I have a patch like this. But this was for 7.4.x. I have to take a
> > look
> > at it.
> >
> The patch is attached. It implements day and month i18n. I fixed a few
> misspelling comments. Docs is attached too.
> 
> template1=# select to_char(now(), 'Day, DD Month YYYY');
>            to_char            
> ------------------------------
>  Sunday   , 25 December  2005
> (1 registro)
> 
> template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
>           to_char          
> ---------------------------
>  Domingo, 25 Dezembro 2005
> (1 registro)
> 
> template1=# 
> 
> 
> Comments?
> 
> Euler Taveira de Oliveira
> euler[at]yahoo_com_br
> 
> 
>     
> 
> 
> 
>     
>         
> _______________________________________________________ 
> Yahoo! doce lar. Fa?a do Yahoo! sua homepage. 
> http://br.yahoo.com/homepageset.html 

Content-Description: 1242239392-i18n-date.diff

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: to_char and i18n

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


Euler Taveira de Oliveira wrote:
> --- Euler Taveira de Oliveira <eulerto@yahoo.com.br> escreveu:
>
> > I have a patch like this. But this was for 7.4.x. I have to take a
> > look
> > at it.
> >
> The patch is attached. It implements day and month i18n. I fixed a few
> misspelling comments. Docs is attached too.
>
> template1=# select to_char(now(), 'Day, DD Month YYYY');
>            to_char
> ------------------------------
>  Sunday   , 25 December  2005
> (1 registro)
>
> template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
>           to_char
> ---------------------------
>  Domingo, 25 Dezembro 2005
> (1 registro)
>
> template1=#
>
>
> Comments?
>
> Euler Taveira de Oliveira
> euler[at]yahoo_com_br
>
>
>
>
>
>
>
>
> _______________________________________________________
> Yahoo! doce lar. Fa?a do Yahoo! sua homepage.
> http://br.yahoo.com/homepageset.html

Content-Description: 1242239392-i18n-date.diff

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PATCHES] to_char and i18n

From
Bruce Momjian
Date:
Added to TODO:

    * Allow to_date() and to_timestamp() accept localized month names

Comment added to the C code to show where it has to happen.

---------------------------------------------------------------------------

Karel Zak wrote:
> On Sun, 2005-12-25 at 17:56 -0300, Euler Taveira de Oliveira wrote:
> > --- Euler Taveira de Oliveira <eulerto@yahoo.com.br> escreveu:
> >
> > > I have a patch like this. But this was for 7.4.x. I have to take a
> > > look
> > > at it.
> > >
> > The patch is attached. It implements day and month i18n. I fixed a few
> > misspelling comments. Docs is attached too.
> >
> > template1=# select to_char(now(), 'Day, DD Month YYYY');
> >            to_char
> > ------------------------------
> >  Sunday   , 25 December  2005
> > (1 registro)
> >
> > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
> >           to_char
> > ---------------------------
> >  Domingo, 25 Dezembro 2005
> > (1 registro)
> >
> > template1=#
> >
> >
> > Comments?
>
> I think it looks like a good patch. There's small problem that the
> current to_char() output is possible use as argument for to_timestamp()
> or to_date() function. It means you should implement vice-versa
> conversion from string with TMMonth/TMDay to timestamp.
>
>    to_timestamp('Domingo, 25 Dezembro 2005', 'TMDay, DD TMMonth YYYY')
>
> Or.. at least describe in the docs that this way is unsupported
> for 'TM' prefix.
>
>     Karel
>
> --
> Karel Zak <zakkr@zf.jcu.cz>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

Re: to_char and i18n

From
Bruce Momjian
Date:
Added to TODO:
* Add missing parameter handling in to_char()     http://archives.postgresql.org/pgsql-hackers/2005-12/msg00948.php

I added a URL in TODO because it is a single message of detail I need to
reference.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Wed, 21 Dec 2005, Tom Lane wrote:
> 
> > Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > > (Some time ago I proposed an--incomplete--patch and it was rejectd by
> > > Karel arguing that to_char functions should behave *exactly* the same
> > > way that they do in Oracle.)
> >
> > That is the accepted plan for to_char ... of course, if Oracle changes
> > to_char every so often, it'll get more interesting to decide what to do.
> 
> There's some functionality in 10g which PostgreSQL does not have:
> 
> TZD - returns the short timezone string with daylight saving information,
> eg: PDT
> 
> TZM - timezone offset minutes part
> 
> TZH - timezone offset hours part
> 
> TZR -  timezone region (US/Pacific, for example)
> 
> RR/RRRR - accept 'rounded' years, eg 99-1-1 = 1999-1-1 (kind of pointless)
> 
> FF - specify how many digits to the right of the decimal place to display,
> when looking at factions of seconds. Eg: HH:MM:SS.FF3 would produce
> 15:56:22.123
> 
> X - the local radix character. Eg: HH:MM:SSXFF would produce 15:56:22.123
> 
> E - Era name (like, Japanese Imperial) (kind of pointless)
> EE - Full era name
> 
> DS - Locale formatted short date. For example, DD/MM/YYYY for the Brits,
> MM/DD/YYYY for the Yanks
> 
> DL - Locale formatted long date. Eg: fmDay, dd. Month yyyy in Germany
> 
> SCC - Like 'CC', but will carry a - (minus) for BC dates (I'm not sure if
> this implies that Oracle wants BC dates to be marked 'BC'. I don't have
> an Oracle system around at the moment to check though :-()
> 
> TS - Locale formatted short time.
> 
> YEAR - Year in words
> 
> SYEAR - Year in words, prefixed by minus sign for BC dates
> 
> SYYYY - YYYY, prefixed by minus sign for BC dates
> 
> Gavin
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq
> 

--  Bruce Momjian   http://candle.pha.pa.us SRA OSS, Inc.   http://www.sraoss.com
 + If your life is a hard drive, Christ can be your backup. +


Re: to_char and i18n

From
Gavin Sherry
Date:
> Gavin Sherry wrote:
> > On Wed, 21 Dec 2005, Tom Lane wrote:
> >
> > > Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > > > (Some time ago I proposed an--incomplete--patch and it was rejectd by
> > > > Karel arguing that to_char functions should behave *exactly* the same
> > > > way that they do in Oracle.)
> > >
> > > That is the accepted plan for to_char ... of course, if Oracle changes
> > > to_char every so often, it'll get more interesting to decide what to do.
> >
> > There's some functionality in 10g which PostgreSQL does not have:
> >
> > TZD - returns the short timezone string with daylight saving information,
> > eg: PDT

This is the same as TZ and it is easy to implement.

> >
> > TZM - timezone offset minutes part

Trivial

> >
> > TZH - timezone offset hours part

Trivial

> >
> > TZR -  timezone region (US/Pacific, for example)

We don't currently have an offset -> region name lookup table but it
should be easy enough to implement...


> >
> > RR/RRRR - accept 'rounded' years, eg 99-1-1 = 1999-1-1 (kind of pointless)
> >
> > FF - specify how many digits to the right of the decimal place to display,
> > when looking at factions of seconds. Eg: HH:MM:SS.FF3 would produce
> > 15:56:22.123

Trivial

> >
> > X - the local radix character. Eg: HH:MM:SSXFF would produce 15:56:22.123
> >

I don't know how to get this character... is it included in the locale
data some where (and where, specifically)

> > E - Era name (like, Japanese Imperial) (kind of pointless)
> > EE - Full era name

No idea where to get this data.

> >
> > DS - Locale formatted short date. For example, DD/MM/YYYY for the Brits,
> > MM/DD/YYYY for the Yanks

Is this desirable? It may lead to confusion with datestyle.

> >
> > DL - Locale formatted long date. Eg: fmDay, dd. Month yyyy in Germany

Should be straight forward - if the underlying library will honour locale.

> >
> > SCC - Like 'CC', but will carry a - (minus) for BC dates (I'm not sure if
> > this implies that Oracle wants BC dates to be marked 'BC'. I don't have
> > an Oracle system around at the moment to check though :-()

Thoughts?

> >
> > TS - Locale formatted short time.

Should be straight forward - if the underlying library will honour locale.

> >
> > YEAR - Year in words

Hmmm. This would be hard to do if we want to support local language
settings.

> >
> > SYEAR - Year in words, prefixed by minus sign for BC dates

As above.

> >
> > SYYYY - YYYY, prefixed by minus sign for BC dates

Should be straight forward.

Any comments on the above?

Gavin


Re: to_char and i18n

From
Christopher Kings-Lynne
Date:
>>> E - Era name (like, Japanese Imperial) (kind of pointless)
>>> EE - Full era name

Some stuff here:

http://java.sun.com/javase/6/docs/guide/intl/calendar.doc.html