Thread: BUG #1972: index error with space character

BUG #1972: index error with space character

From
"Eduardo Soares"
Date:
The following bug has been logged online:

Bug reference:      1972
Logged by:          Eduardo Soares
Email address:      edurbs@gmail.com
PostgreSQL version: 8.0.3
Operating system:   Linux Fedora 4
Description:        index error with space character
Details:

In the example below, "AZTES Z" should come after "AZTESA". It happens with
any encoding type. The DB does not see the space character. The index should see
the space and put "AZTES Z" together with "AZTES". In the list below,
"AZTESA" should be first.

Thanks for the help.

table=# insert into edu values ('AZTES Z');
INSERT 133634 1
table=# insert into edu values ('AZTESA');
INSERT 133635 1
table=# SELECT * FROM EDU ORDER BY NOME DESC;
     nome
---------------
 AZTES Z
 AZTESA
 AZTES
 ÃNTES
 ANTES
(8 registros)

Re: BUG #1972: index error with space character

From
Richard Huxton
Date:
Eduardo Soares wrote:
> Operating system:   Linux Fedora 4
> Description:        index error with space character
> Details:
>
> In the above example, "AZTES Z" should come after "AZTESA". It happens with
> any encoding type. The DB does not see the space character. The index should see
> the space and put "AZTES Z" together with "AZTES". In the above list,
> "AZTESA" should be first.

Sorting order is determined by your locale, and is different from your
encoding. For example, en_GB ignores spaces but C doesn't:

$ LC_COLLATE=en_GB.UTF-8 sort unsorted.txt
aa a
aaaa
aaab
aa b

$ LC_COLLATE=C sort unsorted.txt
aa a
aa b
aaaa
aaab

See "man locale" for details on how to find out what locales are set up
on your machine. See the documentation for details on how to set the locale
on a database cluster.

HTH
--
   Richard Huxton
   Archonet Ltd
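
The comparison above can be reproduced with a short shell sketch. It assumes a
POSIX shell and the standard sort utility; the file name unsorted.txt comes
from the message, while the sample lines and the variable name c_sorted are
illustrative. LC_ALL is set instead of LC_COLLATE because LC_ALL takes
precedence if it happens to be set in your environment. The commented-out
commands depend on what is installed (the en_GB.UTF-8 locale, a reachable
PostgreSQL server), so treat them as suggestions rather than guaranteed steps.

```shell
# Build a small sample file; the lines mirror the example in the message.
printf '%s\n' 'aaab' 'aa b' 'aaaa' 'aa a' > unsorted.txt

# The C locale compares raw bytes, so the space (0x20) sorts before any letter.
c_sorted=$(LC_ALL=C sort unsorted.txt)
printf '%s\n' "$c_sorted"

# Environment-dependent steps, left commented out on purpose:
# locale -a                                  # list the locales installed on this machine
# LC_ALL=en_GB.UTF-8 sort unsorted.txt       # needs the en_GB.UTF-8 locale to be installed
# psql -c 'SHOW lc_collate;'                 # collation of the cluster you are connected to
# initdb --locale=C -D /path/to/new/cluster  # hypothetical path; initialises a NEW cluster
```

In the C locale the two lines containing a space come first, which is the
ordering the original report was asking for.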

Fwd: BUG #1972: index error with space character

From
Eduardo RBS
Date:
---------- Forwarded message ----------
From: Eduardo RBS <edurbs@gmail.com>
Date: 18/10/2005 13:31
Subject: Re: [BUGS] BUG #1972: index error with space character
To: Richard Huxton <dev@archonet.com>


Hello,
Thank you very much for your attention.
I need a locale that does not ignore the space character and also sorts
accents like á or ã.
I ran some tests using the locale settings; see below.

Using C is almost right because it sorts correctly and sees the spaces,
but it does not sort the Portuguese accents. Note the last two
lines:
$ LC_COLLATE=C sort b.txt
aa a
aa z
aaaa
aaaz
aaz
eado
eza
édina
émaster

and with "my" locale pt_BR it sorts the accents correctly but
ignores the space character:
$ LC_COLLATE=pt_BR.utf8 sort b.txt
aa a
aaaa
aaab
aaaz
aaz
aa z
eado
édina
émaster
eza

What I need is a merge of C and pt_BR; that is, a locale that sees
the spaces like C but sorts accents like pt_BR.
I tried several other locales, and only C sees the space character.

Thanks for the attention.

--
[]'s
Eduardo RBS
http://linuxstok.sourceforge.net
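
A possible explanation for the C-locale result above, sketched in shell: the C
locale compares raw bytes, and the bytes for é are larger than the byte for
"z" (0x7A) both in UTF-8 (0xC3 0xA9) and in ISO-8859-1 (0xE9). The sketch
assumes UTF-8 input; the words are taken from the listing above and the
variable name bytes_sorted is illustrative.

```shell
# 'é' is written as its UTF-8 bytes (octal 303 251) so the script stays pure ASCII.
bytes_sorted=$(printf 'eza\n\303\251dina\neado\n' | LC_ALL=C sort)
printf '%s\n' "$bytes_sorted"

# Hex dump of the two UTF-8 bytes of 'é' (c3 a9) next to the single byte of 'z' (7a).
printf '\303\251' | od -An -tx1
printf 'z' | od -An -tx1
```

Because 0xC3 is greater than 0x7A, every word starting with é lands after
"eza" under byte-wise comparison, regardless of how a Portuguese dictionary
would order it.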


2005/10/18, Richard Huxton <dev@archonet.com>:
> Eduardo Soares wrote:
> > Operating system:   Linux Fedora 4
> > Description:        index error with space character
> > Details:
> >
> > In the above example, "AZTES Z" should come after "AZTESA". It happens with
> > any encoding type. The DB does not see the space character. The index should see
> > the space and put "AZTES Z" together with "AZTES". In the above list,
> > "AZTESA" should be first.
>
> Sorting order is determined by your locale, and is different from your
> encoding. For example, en_GB ignores spaces but C doesn't:
>
> $ LC_COLLATE=en_GB.UTF-8 sort unsorted.txt
> aa a
> aaaa
> aaab
> aa b
>
> $ LC_COLLATE=C sort unsorted.txt
> aa a
> aa b
> aaaa
> aaab
>
> See "man locale" for details on how to find out what locales are set up
> on your machine. See the documentation for details on how to set the locale
> on a database cluster.
>
> HTH
> --
>    Richard Huxton
>    Archonet Ltd
>



Re: Fwd: BUG #1972: index error with space character

From
Tom Lane
Date:
Eduardo RBS <edurbs@gmail.com> writes:
> I need a locale that does not ignore the space character and also sorts
> accents like á or ã.

I'm afraid you'll have to learn how to build your own locale definition.
AFAIK, the "C" locale is the *only* common locale in which spaces aren't
second-class citizens.

I know that it is possible to write your own locale definition, but
this is not the place to ask about how.  Try a glibc support forum.

            regards, tom lane