Re: problem with query - Mailing list pgsql-general

From Giuseppe Broccolo
Subject Re: problem with query
Date
Msg-id 52332613.2020904@2ndquadrant.it
Whole thread Raw
In response to problem with query  (Roberto Scattini <roberto.scattini@gmail.com>)
Responses Re: problem with query  (Roberto Scattini <roberto.scattini@gmail.com>)
List pgsql-general
Il 12/09/2013 22:34, Roberto Scattini ha scritto:
hi, today we discovered that this query doesn't return the expected values:

SELECT DISTINCT
 p.id, p.apellido AS "Apellido", p.nombre AS "Nombre", pf.nombre AS "Funcion", to_char(da.f_ingreso_pg, 'dd/mm/yyyy') AS "Fecha Ingreso PG", e.estado AS "Estado", to_char(pe.fecha, 'dd/mm/yyyy') AS "Fecha Estado"
 FROM personal.personas AS p
  LEFT JOIN personal.personas_da_pg AS da ON p.id=da.id_persona
  LEFT JOIN personal.personas_estado AS pe ON da.estado_id=pe.id
  LEFT JOIN personal.estados AS e ON pe.id_estado=e.id
  LEFT JOIN procu_departamento AS d ON d.id=da.id_departamento
  LEFT JOIN procu_departamento AS dto ON left(d.c_organigrama, 4)||'000'=dto.c_organigrama
  LEFT JOIN procu_departamento AS dir ON left(d.c_organigrama, 3)||'0000'=dir.c_organigrama
  LEFT JOIN procu_departamento AS dg ON left(d.c_organigrama, 2)||'00000'=dg.c_organigrama
  LEFT JOIN personal.funciones AS pf ON pf.id=da.funcion_id
  LEFT JOIN personal.profesiones AS pp ON pp.id=p.id_profesion
 WHERE p.apellido ilike '%nuñez%'
 ORDER BY "Apellido"
The problem is the encoding: SQL_ASCII encodes only the first 128 characters, so 'ñ' and 'Ñ' cannot be encoded in ASCII. If you insert text 'ñ' or 'Ñ' in a table inside a database (e.g. 'ascii_test') with SQL_ASCII encoding, they will be seen as two distinct characters:

ascii_test=# SELECT length(E'ñ');
 length
--------
      2
(1 row)
ascii_test=# SELECT length(E'Ñ');
 length
--------
      2
(1 row)
ascii_test=# SELECT 'ñ'::bytea;
 bytea 
--------
 \xc3b1
(1 row)
ascii_test=# SELECT 'Ñ'::bytea;
 bytea 
--------
 \xc391
(1 row)

Hexadecimals 'c3', 'b1' and '91' does not decode anything in ASCII, even if terminal show pairs as 'ñ' or 'Ñ'. So ILIKE operator cannot apply case insensitive pattern matching to data which does not encode any string (in the SQL_ASCII encoding) and works as a normal LIKE.

Even if the client which insert data has 8-bit encoding (UTF8, where 'ñ' and 'Ñ' are decoded), SQL_ASCII database cannot convert strings anyway, and problem remains.

The best way is to work with a UTF8 encoded database. Is there a particular reason to work with SQL_ASCII encoding?

Giuseppe.
-- 
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe.broccolo@2ndQuadrant.it | www.2ndQuadrant.it

pgsql-general by date:

Previous
From: Chris Travers
Date:
Subject: Re: Best way to populate nested composite type from JSON`
Next
From: Roberto Scattini
Date:
Subject: Re: problem with query