Thread: using strxfrm for having multi locale/please vote for adding this function in contribution

using strxfrm for having multi locale/please vote for adding this function in contribution

From
Mahmoud Taghizadeh
Date:
There was a discussion on the PostgreSQL mailing list about using the strxfrm function to add multi-locale support.
Most developers agree that this method is a correct but not perfect solution for having multiple locales in PostgreSQL, and some people suggested adding such functions to contrib.

The main problem with such functions is the overhead of setlocale (which can be avoided when you run PostgreSQL on an OS with a newer glibc that provides the strxfrm_l function).
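Below is a minimal sketch of the setlocale() approach in C. It assumes a POSIX system. The name sort_key_setlocale is hypothetical, and this is not the attached nls_sort code. Each call switches the process-wide LC_COLLATE, builds the strxfrm() key and switches back. That switching is where the overhead comes from.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the attached nls_sort code): return a malloc'd
 * strxfrm() key for `src` under the collation of `locale_name`, or NULL if
 * the locale is unavailable or memory runs out. The setlocale() switching
 * on every call is the overhead discussed above; because LC_COLLATE is
 * process-wide, this is also not thread-safe. */
char *sort_key_setlocale(const char *src, const char *locale_name)
{
    const char *cur = setlocale(LC_COLLATE, NULL);
    char *saved;
    char *key = NULL;
    size_t need;

    if (cur == NULL)
        return NULL;
    saved = strdup(cur);               /* setlocale's buffer may be overwritten */
    if (saved == NULL)
        return NULL;

    if (setlocale(LC_COLLATE, locale_name) != NULL) {
        need = strxfrm(NULL, src, 0);  /* key length, excluding the NUL */
        key = malloc(need + 1);
        if (key != NULL)
            strxfrm(key, src, need + 1);
    }

    setlocale(LC_COLLATE, saved);      /* restore the previous collation */
    free(saved);
    return key;
}
```

The C standard guarantees that comparing two strxfrm() keys with strcmp() gives the same ordering as strcoll() gives for the original strings. Which locale names work depends on what is installed on the server.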
 
I summarized the discussion and attached one implementation of such functions. I tried to convince Tom Lane to add this function to contrib, but I failed.

Maybe you are not interested in this subject, but I kindly ask you to give your opinion on the list.
Please state clearly whether you agree or disagree with adding this function to contrib.


Thank you in advance.
 


With Regards,
--taghi


Attachment

Re: using strxfrm for having multi locale/please vote for

From
Bruce Momjian
Date:
This has been saved for the 8.1 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches2

---------------------------------------------------------------------------

Mahmoud Taghizadeh wrote:
> There was a discussion on the PostgreSQL mailing list about using the strxfrm function to add multi-locale support.
> Most developers agree that this method is a correct but not perfect solution for having multiple locales in PostgreSQL, and some people suggested adding such functions to contrib.
>
> The main problem with such functions is the overhead of setlocale (which can be avoided when you run PostgreSQL on an OS with a newer glibc that provides the strxfrm_l function).
>
> I summarized the discussion and attached one implementation of such functions. I tried to convince Tom Lane to add this function to contrib, but I failed.
>
> Maybe you are not interested in this subject, but I kindly ask you to give your opinion on the list.
> Please state clearly whether you agree or disagree with adding this function to contrib.
>
>
> Thank you in advance.
>
>
>
>
> With Regards,
> --taghi
>

Content-Description: nls_sort.tgz

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: using strxfrm for having multi locale/please vote for

From
Bruce Momjian
Date:
I think we have concluded that the use of the ICU library is the way we
are going to accomplish multi-locale support in the future.


Re: using strxfrm for having multi locale/please vote for

From
Greg Stark
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> I think we have concluded that the use of the ICU library is the way we
> are going to accomplish multi-locale support in the future.

You did? It really seemed like there was one crowd pushing ICU and hardly
anyone else interested in piling a huge library dependency on Postgres. It
seemed like ICU was only really necessary if you wanted some esoteric
functionality that wasn't entirely explained.

I'm having no trouble handling multi-locale already using the strxfrm
implementation that was posted and refined by several people on the mailing
list.

Yes it's true that on some OSes it wouldn't be tolerably efficient but on
glibc it's more than tolerable. If better solutions (strxfrm_l) become
available at some point in the future then it would be about as efficient as
it could be on platforms where those features are available.
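Below is a sketch of the strxfrm_l() route Greg mentions, for comparison. It assumes a C library that implements the POSIX.1-2008 locale_t functions newlocale(), strxfrm_l() and strcoll_l(); current glibc does. The names sort_key_l and open_collation are hypothetical. The locale object is opened once and reused, so building a key never touches the process-wide locale.

```c
#define _POSIX_C_SOURCE 200809L   /* newlocale(), strxfrm_l(), strcoll_l() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a strxfrm_l() key for `src` using a locale_t
 * object the caller opened once. No setlocale() call is made, so keys for
 * several locales can be built side by side. Returns a malloc'd key, or
 * NULL on allocation failure. */
char *sort_key_l(const char *src, locale_t loc)
{
    size_t need = strxfrm_l(NULL, src, 0, loc);   /* key length, excluding the NUL */
    char *key = malloc(need + 1);

    if (key != NULL)
        strxfrm_l(key, src, need + 1, loc);
    return key;
}

/* Hypothetical helper: open a collation-only locale object. Categories
 * outside LC_COLLATE_MASK come from the POSIX locale. Returns (locale_t)0
 * if the locale is not installed; release the result with freelocale(). */
locale_t open_collation(const char *locale_name)
{
    return newlocale(LC_COLLATE_MASK, locale_name, (locale_t)0);
}
```

A caller would keep one locale_t per locale name for as long as it is needed. Comparing two keys with strcmp() should then agree in sign with strcoll_l() on the original strings.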

--
greg

Re: using strxfrm for having multi locale/please vote for

From
Bruce Momjian
Date:
Greg Stark wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > I think we have concluded that the use of the ICU library is the way we
> > are going to accomplish multi-locale support in the future.
>
> You did? It really seemed like there was one crowd pushing ICU and hardly
> anyone else interested in piling a huge library dependency on Postgres. It
> seemed like ICU was only really necessary if you wanted some esoteric
> functionality that wasn't entirely explained.

I thought we were willing to require the library for multi-locale
builds.

> I'm having no trouble handling multi-locale already using the strxfrm
> implementation that was posted and refined by several people on the mailing
> list.
>
> Yes it's true that on some OSes it wouldn't be tolerably efficient but on
> glibc it's more than tolerable. If better solutions (strxfrm_l) become
> available at some point in the future then it would be about as efficient as
> it could be on platforms where those features are available.

There are some things I think ICU can fix for us like indexing non-C
localed columns.


Re: using strxfrm for having multi locale/please vote for

From
Alvaro Herrera
Date:
On Mon, Jun 06, 2005 at 10:11:15PM -0400, Bruce Momjian wrote:
> Greg Stark wrote:

> > Yes it's true that on some OSes it wouldn't be tolerably efficient but on
> > glibc it's more than tolerable. If better solutions (strxfrm_l) become
> > available at some point in the future then it would be about as efficient as
> > it could be on platforms where those features are available.
>
> There are some things I think ICU can fix for us like indexing non-C
> localed columns.

Huh, we already do that, don't we?

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."

Re: using strxfrm for having multi locale/please vote for

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> On Mon, Jun 06, 2005 at 10:11:15PM -0400, Bruce Momjian wrote:
> > Greg Stark wrote:
>
> > > Yes it's true that on some OSes it wouldn't be tolerably efficient but on
> > > glibc it's more than tolerable. If better solutions (strxfrm_l) become
> > > available at some point in the future then it would be about as efficient as
> > > it could be on platforms where those features are available.
> >
> > There are some things I think ICU can fix for us like indexing non-C
> > localed columns.
>
> Huh, we already do that, don't we?

Sorry, I meant LIKE index usage for non-C columns.  We can do that now
with a special LIKE indexing method, but this would allow normal indexes
to work.
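Under plain byte-wise ordering, the "next greatest value" is easy to compute. That is roughly what an index scan for LIKE 'abc%' relies on when the index compares byte-wise, as the special pattern operator class (presumably text_pattern_ops) does: the pattern becomes the range ['abc', 'abd'). The sketch below is minimal. prefix_upper_bound and in_prefix_range are hypothetical helpers, not PostgreSQL's own code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: for a LIKE prefix such as "abc" (from 'abc%'),
 * return a malloc'd string `hi` so that, under byte-wise strcmp() order,
 * exactly the strings starting with the prefix satisfy prefix <= s < hi.
 * It bumps the last byte that is not 0xFF and drops everything after it.
 * Returns NULL if no byte can be bumped (no upper bound) or on
 * allocation failure. */
char *prefix_upper_bound(const char *prefix)
{
    size_t len = strlen(prefix);
    char *hi = malloc(len + 1);

    if (hi == NULL)
        return NULL;
    memcpy(hi, prefix, len + 1);
    while (len > 0) {
        unsigned char *last = (unsigned char *)&hi[len - 1];
        if (*last != 0xFF) {
            (*last)++;
            hi[len] = '\0';
            return hi;
        }
        len--;                  /* 0xFF cannot be bumped: drop it and carry */
    }
    free(hi);
    return NULL;
}

/* True if s lies in [prefix, hi) under byte-wise order; hi == NULL means
 * there is no upper bound. strcmp() compares bytes as unsigned char. */
int in_prefix_range(const char *s, const char *prefix, const char *hi)
{
    return strcmp(s, prefix) >= 0 && (hi == NULL || strcmp(s, hi) < 0);
}
```

The range is exact only because strcmp() compares byte by byte, which is the same way the LIKE prefix is matched. Under a locale's collation, the strings that sort between 'abc' and 'abd' need not be the ones beginning with "abc".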


Re: using strxfrm for having multi locale/please vote for

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Alvaro Herrera wrote:
>>> There are some things I think ICU can fix for us like indexing non-C
>>> localed columns.
>>
>> Huh, we already do that, don't we?

> Sorry, I meant LIKE index usage for non-C columns.  We can do that now
> with a special LIKE indexing method, but this would allow normal indexes
> to work.

Sounds like pie in the sky to me.  Exactly how do you think that ICU
will magically mask the fundamental semantic inconsistency?

            regards, tom lane

Re: using strxfrm for having multi locale/please vote for

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Alvaro Herrera wrote:
> >>> There are some things I think ICU can fix for us like indexing non-C
> >>> localed columns.
> >>
> >> Huh, we already do that, don't we?
>
> > Sorry, I meant LIKE index usage for non-C columns.  We can do that now
> > with a special LIKE indexing method, but this would allow normal indexes
> > to work.
>
> Sounds like pie in the sky to me.  Exactly how do you think that ICU
> will magically mask the fundamental semantic inconsistency?

I am hoping ICU will allow us to see the next greatest value for that
character.


Re: using strxfrm for having multi locale/please vote for

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> > Sounds like pie in the sky to me.  Exactly how do you think that
> > ICU will magically mask the fundamental semantic inconsistency?
>
> I am hoping ICU will allow us to see the next greatest value for that
> character.

As Tom says, it's a semantic inconsistency, not a lack of features.
Collation (sorting of strings) takes the entire string into account,
pattern matching compares character by character.  For example, some
collations compare strings from back to front, whereas a pattern
matching expression could never make sense of that.  The SQL standard
actually does not draw that distinction, but, well, it's broken.
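The back-to-front case can be made concrete with a deliberately extreme toy collation that compares whole strings from the last byte backwards. No real locale is this simple, and back_to_front_cmp and has_prefix are hypothetical helpers used only for illustration.

```c
#include <string.h>

/* Toy collation for illustration only: compare from the last byte
 * backwards. Returns <0, 0 or >0 like strcmp(). */
int back_to_front_cmp(const char *a, const char *b)
{
    size_t i = strlen(a), j = strlen(b);

    while (i > 0 && j > 0) {
        unsigned char ca = (unsigned char)a[--i];
        unsigned char cb = (unsigned char)b[--j];
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* one string is a suffix of the other: the shorter one sorts first */
    return (i > 0) - (j > 0);
}

/* Pattern side of LIKE 'prefix%': a character-by-character test that
 * does not depend on any collation. */
int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

In this toy order "abca" < "zzza" < "abcz", so a string that does not match LIKE 'abc%' sorts between two strings that do. No single range scan over an index sorted this way returns exactly the matches, however the upper bound is chosen.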

Using separate operator classes for separate semantic interpretations of
data seems to be exactly the right solution.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/