Thread: Issue on Varchar Ordering

Issue on Varchar Ordering

From
"Rodrigo Hjort"
Date:
Pg-Hackers,<br /><br />I'm having the following issue:<br /><br />create table aluno (id serial, nome varchar(60));<br
/><br/>rodrigo=# select * from aluno order by nome offset 35 limit 5;<br /> id |               nome<br
/>----+-----------------------------------<br /> 36 | GABRIELA HELEDA DE SOUZA<br /> 37 | GABRIELA JACOBY NOS<br /> 38
|GABRIEL ALCIDES KLIM PERONDI<br /> 39 | GABRIELA LETICIA BATISTA NUNES<br /> 40 | GABRIEL ALEXANDRE DA SILVA MANICA<br
/>(5registros)<br /><br />The problem is that records with id 38 and 40 might come before the others. It seems that the
spacingisn't considered on ordering. <br />The encoding (LATIN1) is correct according to Brazilian Portuguese, and the
settingsare listed below.<br /><br />rodrigo=# select name, setting from pg_settings where name ~
'encoding|^lc_|version';<br/>      name       | setting <br />-----------------+---------<br /> client_encoding |
LATIN1<br/> lc_collate      | pt_BR<br /> lc_ctype        | pt_BR<br /> lc_messages     | pt_BR<br /> lc_monetary     |
pt_BR<br/> lc_numeric      | pt_BR<br /> lc_time         | pt_BR <br /> server_encoding | LATIN1<br /> server_version 
|8.1.3<br />(9 registros)<br clear="all" /><br />The server is a Debian 3.1 GNU/Linux with locales pt_BR.<br /><br
/>Whatcould be wrong? Thanks in advance.<br /><br />-- <br />Regards,<br /><br />Rodrigo Hjort<br />GTI - Projeto
PostgreSQL<br/>CELEPAR - Cia de Informática do Paraná<br /><a href="http://www.pr.gov.br">http://www.pr.gov.br</a><br
/><br/> 

Re: Issue on Varchar Ordering

From
"Jonah H. Harris"
Date:
I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:

edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZA

edb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgresid |               nome
----+-----------------------------------36 | GABRIELA HELEDA DE SOUZA37 | GABRIELA JACOBY NOS38 | GABRIEL ALCIDES KLIM
PERONDI39| GABRIELA LETICIA BATISTA NUNES40 | GABRIEL ALEXANDRE DA SILVA MANICA 
(5 rows)


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


Re: Issue on Varchar Ordering

From
"Rodrigo Hjort"
Date:
Yeah, I think there's a problem on Linux locales.

Using the C locale, it works as expected:

rodrigo@asgard:~$ export LC_ALL=C && cat test.txt | sort
GABRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES

But when using Brazilian Portuguese, it gives:

rodrigo@asgard:~$ export LC_ALL=pt_BR && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

rodrigo@asgard:~$ export LC_ALL=pt_BR.UTF-8 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

rodrigo@asgard:~$ export LC_ALL=pt_BR.ISO-8859-1 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

The fact is that I can't raise PostgreSQL with C encoding, as special characters are used on most tables.

Did anyone ever had the same problem on Linux?


2006/3/29, Jonah H. Harris <jonah.harris@gmail.com>:
I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:

edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZA

edb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgres
id |               nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 rows)

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


--
Regards,

Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br

Re: Issue on Varchar Ordering

From
Tom Lane
Date:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> The fact is that I can't raise PostgreSQL with C encoding, as special
> characters are used on most tables.

You're confusing locale and encoding, which are two different (though
interrelated) things.

I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.
        regards, tom lane


Re: Issue on Varchar Ordering

From
"Rodrigo Hjort"
Date:
Thanks all. I'm gonna try other locales and encodings.


2006/3/29, Tom Lane <tgl@sss.pgh.pa.us>:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> The fact is that I can't raise PostgreSQL with C encoding, as special
> characters are used on most tables.

You're confusing locale and encoding, which are two different (though
interrelated) things.

I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.

                        regards, tom lane


--
Regards,

Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br