Re: [HACKERS] indexable and locale - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] indexable and locale
Date
Msg-id 199911300152.UAA20942@candle.pha.pa.us
In response to Re: [HACKERS] indexable and locale  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Here is Tom's comment on the patch.

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> Attached is a patch to the old problem discussed feverishly before 6.5.
> 
> > ... I think your patches break
> > non-ASCII multi-byte character set data and should be surrounded by
> > #ifdef LOCALE rather than replacing the current code surrounded by
> > #ifndef LOCALE.
> 
> I am worried about this patch too.  Under MULTIBYTE could it
> generate invalid characters?  Also, do all non-ASCII locales sort
> codes 0-126 in the same order as ASCII?  I don't think they do,
> but I'm not an expert.
> 
> The approach I was considering for fixing the problem was to use a
> loop that would repeatedly try to generate a string greater than the
> prefix string.  The basic loop step would increment the rightmost
> byte as Goran has done (or, if it's already up to the limit, chop
> it off and increment the next character position).  Then test to
> see whether the '<' operator actually believes the result is
> greater than the given prefix, and repeat if not.  This avoids making
> any strong assumptions about the sort order of different character
> codes.  However, there are two significant issues that would have
> to be surmounted to make it work reliably:
> 
> 1. In MULTIBYTE mode incrementing the rightmost byte might yield
> an illegal multibyte character.  Some way to prevent or detect this
> would be needed, lest it confuse the comparison operator.  I think
> we have some multibyte routines that could be used to check for
> a valid result, but I haven't looked into it.
> 
> 2. I think there are some locales out there that have context-sensitive
> sorting rules, i.e., a given character string may sort differently
> than you'd expect from considering the characters in isolation.
> For example, in German isn't "ss" treated specially?
> If "pqrss" does not sort between "pqrs" and "pqrt" then the entire
> premise of *both* sides of the LIKE optimization falls apart,
> because you can't be sure what will happen when comparing a prefix
> string like "pqrs" against longer strings from the database.
> I do not know if this is really a problem, nor what we could do
> to avoid it if it is.
> 
>             regards, tom lane
> 
> ************
> 
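For concreteness, the loop Tom describes could be sketched roughly as
below.  This is only an illustration under stated assumptions, not code
from the patch or from the backend: the names make_greater_string and
str_cmp_fn are hypothetical, the comparator stands in for whatever the
'<' operator would use (strcoll() under LOCALE, for instance), and the
sketch does nothing about issue 1, so in MULTIBYTE mode it can still
produce an illegal character.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator type: returns <0, 0, >0 in the manner of strcmp(). */
typedef int (*str_cmp_fn)(const char *a, const char *b);

/*
 * Try to build a string that the comparator believes is greater than
 * "prefix".  Returns a malloc'd string the caller must free, or NULL if
 * no such string was found (or malloc failed).
 *
 * Assumption: single-byte encoding.  No multibyte validity check is done.
 */
char *
make_greater_string(const char *prefix, str_cmp_fn cmp)
{
    size_t len;
    char  *work;

    assert(prefix != NULL && cmp != NULL);

    len = strlen(prefix);
    work = malloc(len + 1);
    if (work == NULL)
        return NULL;
    memcpy(work, prefix, len + 1);

    while (len > 0)
    {
        unsigned char *last = (unsigned char *) &work[len - 1];

        /* Increment the rightmost byte, then ask the comparator. */
        while (*last < 255)
        {
            (*last)++;
            if (cmp(prefix, work) < 0)
                return work;    /* comparator agrees: work > prefix */
        }

        /* Byte is at its limit: chop it off and retry one position left. */
        work[--len] = '\0';
    }

    free(work);
    return NULL;                /* nothing greater could be generated */
}
```

Passing strcmp gives plain byte order; passing strcoll after a
setlocale() call would test against the locale's collation instead.
Because the result is accepted only when the comparator itself says it
is greater, the sketch assumes nothing about how individual character
codes sort.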

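Issue 2 can at least be probed empirically.  The helper below is a
hypothetical sketch (sorts_between and str_cmp_fn are names made up for
illustration) that asks a comparator whether a longer string really
falls inside the range the LIKE optimization would scan, that is,
whether lo <= mid < hi holds.

```c
#include <string.h>

/* Hypothetical comparator type: returns <0, 0, >0 in the manner of strcmp(). */
typedef int (*str_cmp_fn)(const char *a, const char *b);

/*
 * Return 1 if "mid" sorts within [lo, hi) according to cmp, else 0.
 * For the case in question: lo = "pqrs", mid = "pqrss", hi = "pqrt".
 */
int
sorts_between(const char *lo, const char *mid, const char *hi,
              str_cmp_fn cmp)
{
    return cmp(lo, mid) <= 0 && cmp(mid, hi) < 0;
}
```

With strcmp as the comparator the "pqrss" case passes, as expected for
plain byte order.  Calling it with strcoll after
setlocale(LC_COLLATE, "") would show what a given locale does; the
answer depends on the locale data installed on the system, so no result
is claimed here.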

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

