Re: An idea on faster CHAR field indexing - Mailing list pgsql-hackers

From Randall Parker
Subject Re: An idea on faster CHAR field indexing
Date
Msg-id 01425928136378@mail.nls.net
In response to An idea on faster CHAR field indexing  ("Randall Parker" <randall@nls.net>)
Responses Re: An idea on faster CHAR field indexing  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: An idea on faster CHAR field indexing  (Giles Lean <giles@nemeton.com.au>)
List pgsql-hackers
Giles,

On Thu, 22 Jun 2000 11:12:54 +1000, Giles Lean wrote:

>Yes.  Some locales want strings to be ordered first by ignoring any
>accents on characters, then using a tie-break on equal strings by doing
>a comparison that includes the accents.
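
For concreteness, the two-level ordering described in the quote might look roughly like the sketch below. It assumes that "ignoring accents" can be approximated by stripping Unicode combining marks, and that the tie-break is a plain code point comparison; real locale rules may differ on both counts. The names strip_accents and two_level_key are made up for illustration.

```python
import unicodedata

def strip_accents(s):
    """Drop combining marks so that accented letters compare as their base letter.
    Assumption: this approximates the "ignore accents" step of a locale."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def two_level_key(s):
    """Primary key: the string without accents.
    Secondary key (tie-break): the string with accents, compared by code point."""
    return (strip_accents(s), unicodedata.normalize("NFC", s))

words = ["côte", "cote", "coté", "cot", "cotz"]
print(sorted(words, key=two_level_key))
```

Here the three spellings of "cote" stay together between "cot" and "cotz", and the accents only decide the order within that group.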

I guess I don't see how this is really any different. Why order first by the character and second by the accent? For instance, if you know the relative order of the various forms of "o" then just give them all successive numbers and do a single pass sort. You just have to make sure that all the numbers in that set of numbers are greater than the number you assign to "m" and less than the number you assign to "p".
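
A minimal sketch of the single-pass idea, assuming a made-up collation sequence in which the accented forms of "o" simply follow the plain "o". The relative order of the accented forms here is an assumption for illustration, not any real locale's rule, and SEQUENCE, WEIGHT and sort_key are hypothetical names.

```python
# Hypothetical collation sequence: every form of "o" gets a successive
# number that falls after "m" and before "p".
SEQUENCE = "abcdefghijklmnoòóôõöpqrstuvwxyz"
WEIGHT = {ch: i for i, ch in enumerate(SEQUENCE)}

def sort_key(s):
    """Map each character to its weight so that one comparison pass suffices."""
    return [WEIGHT[ch] for ch in s]

words = ["pot", "mop", "ôde", "ode", "öde", "map"]
print(sorted(words, key=sort_key))
# prints ['map', 'mop', 'ode', 'ôde', 'öde', 'pot']
```

Each character is looked up once and the resulting number lists are compared position by position, so no second tie-break pass is involved.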

>To take another of your points out of order: this is an obstacle that
>Unicode doesn't resolve.  Unicode gives you a character set capable of
>representing characters from many different locales, but collation
>order will remain locale specific.

With Unicode you have to have a collation order that cuts across what used to be separate character sets in separate code pages.

>... but due to the increased memory/disk space, this is likely not an
>optimisation.  Measurements needed, I'd suggest.

But why is there increased memory and disk space? Aren't the fields that go into an index already stored twice today? Or does the index just contain a series of references to records, and that is it?






