Thread: Issue on Varchar Ordering
Pg-Hackers,<br /><br />I'm having the following issue:<br /><br />create table aluno (id serial, nome varchar(60));<br /><br/>rodrigo=# select * from aluno order by nome offset 35 limit 5;<br /> id | nome<br />----+-----------------------------------<br /> 36 | GABRIELA HELEDA DE SOUZA<br /> 37 | GABRIELA JACOBY NOS<br /> 38 |GABRIEL ALCIDES KLIM PERONDI<br /> 39 | GABRIELA LETICIA BATISTA NUNES<br /> 40 | GABRIEL ALEXANDRE DA SILVA MANICA<br />(5registros)<br /><br />The problem is that records with id 38 and 40 might come before the others. It seems that the spacingisn't considered on ordering. <br />The encoding (LATIN1) is correct according to Brazilian Portuguese, and the settingsare listed below.<br /><br />rodrigo=# select name, setting from pg_settings where name ~ 'encoding|^lc_|version';<br/> name | setting <br />-----------------+---------<br /> client_encoding | LATIN1<br/> lc_collate | pt_BR<br /> lc_ctype | pt_BR<br /> lc_messages | pt_BR<br /> lc_monetary | pt_BR<br/> lc_numeric | pt_BR<br /> lc_time | pt_BR <br /> server_encoding | LATIN1<br /> server_version |8.1.3<br />(9 registros)<br clear="all" /><br />The server is a Debian 3.1 GNU/Linux with locales pt_BR.<br /><br />Whatcould be wrong? Thanks in advance.<br /><br />-- <br />Regards,<br /><br />Rodrigo Hjort<br />GTI - Projeto PostgreSQL<br/>CELEPAR - Cia de Informática do Paraná<br /><a href="http://www.pr.gov.br">http://www.pr.gov.br</a><br /><br/>
I don't think PostgreSQL's sorting it wrong... here's the output of Linux's sort utility: edb82@linux:~> cat test.txt GABRIEL ALEXANDRE DA SILVA MANICA GABRIELA LETICIA BATISTA NUNES GABRIEL ALCIDES KLIM PERONDI GABRIELA JACOBY NOS GABRIELA HELEDA DE SOUZA edb82@linux:~> cat test.txt | sort GABRIELA HELEDA DE SOUZA GABRIELA JACOBY NOS GABRIEL ALCIDES KLIM PERONDI GABRIELA LETICIA BATISTA NUNES GABRIEL ALEXANDRE DA SILVA MANICA edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgresid | nome ----+-----------------------------------36 | GABRIELA HELEDA DE SOUZA37 | GABRIELA JACOBY NOS38 | GABRIEL ALCIDES KLIM PERONDI39| GABRIELA LETICIA BATISTA NUNES40 | GABRIEL ALEXANDRE DA SILVA MANICA (5 rows) -- Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation 732.331.1324
Yeah, I think there's a problem on Linux locales.
Using the C locale, it works as expected:
rodrigo@asgard:~$ export LC_ALL=C && cat test.txt | sort
GABRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES
But when using Brazilian Portuguese, it gives:
rodrigo@asgard:~$ export LC_ALL=pt_BR && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
rodrigo@asgard:~$ export LC_ALL=pt_BR.UTF-8 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
rodrigo@asgard:~$ export LC_ALL=pt_BR.ISO-8859-1 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
The fact is that I can't raise PostgreSQL with C encoding, as special characters are used on most tables.
Did anyone ever had the same problem on Linux?
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br
Using the C locale, it works as expected:
rodrigo@asgard:~$ export LC_ALL=C && cat test.txt | sort
GABRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES
But when using Brazilian Portuguese, it gives:
rodrigo@asgard:~$ export LC_ALL=pt_BR && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
rodrigo@asgard:~$ export LC_ALL=pt_BR.UTF-8 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
rodrigo@asgard:~$ export LC_ALL=pt_BR.ISO-8859-1 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
The fact is that I can't raise PostgreSQL with C encoding, as special characters are used on most tables.
Did anyone ever had the same problem on Linux?
2006/3/29, Jonah H. Harris <jonah.harris@gmail.com>:
I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:
edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZA
edb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgres
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 rows)
--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes: > The fact is that I can't raise PostgreSQL with C encoding, as special > characters are used on most tables. You're confusing locale and encoding, which are two different (though interrelated) things. I suspect the right choice for you will be to use C locale with UTF8 encoding, or possibly one of the LATINn encodings. regards, tom lane
Thanks all. I'm gonna try other locales and encodings.
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br
2006/3/29, Tom Lane <tgl@sss.pgh.pa.us>:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> The fact is that I can't raise PostgreSQL with C encoding, as special
> characters are used on most tables.
You're confusing locale and encoding, which are two different (though
interrelated) things.
I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.
regards, tom lane
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br