Thread: case insensitive regex clause with some latin1 characters fails
Hi,

I'm not sure if this is a bug or if I'm doing something wrong. I have a database encoded with ISO-8859-1, aka LATIN1. When I do something like:

SELECT 'Ä' ~* 'ä';

it returns false. If I do:

SELECT 'A' ~* 'a';

I get true. According to the specification, both should return true. Does anyone know what the problem might be?

/Ragnar
"Ragnar Österlund" <ragoster@gmail.com> writes: > I'm not sure if this is a bug or if I'm doing something wrong. I have > a database encoded with ISO-8859-1, aka LATIN1. When I do something > like: > SELECT '�' ~* '�'; > it returns false. Check the database's locale setting (LC_CTYPE). It has to be one that expects LATIN1 encoding. The current regex code is generally not able to deal with locale-specific behaviors in UTF8 encoding, but it should work for single-byte encodings as long as you've got the locale setting right. regards, tom lane
My environment is set up as follows:

show lc_ctype;
  lc_ctype
-------------
 fr_CA.UTF-8
(1 row)

fis=> SELECT 'Ä' ~* 'ä';
 ?column?
----------
 f
(1 row)

fis=> SELECT 'Ä' ilike 'ä';
 ?column?
----------
 f
(1 row)

I got the same result: false.
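For anyone wondering why the encoding matters here, a rough illustration of the single-byte versus UTF8 point made earlier in the thread. This is a toy model in Python, not PostgreSQL's regex code. The helper names latin1_fold_byte and fold_bytes are made up for this sketch, and the assumption is simply that case folding looks at one byte at a time. In LATIN1, 'Ä' and 'ä' are one byte each (0xC4 and 0xE4), so a per-byte fold can map one to the other. In UTF8 each is a two-byte sequence, and no single byte is the letter.

```python
# Toy model only: this is NOT PostgreSQL's regex implementation.
# Assumption: case folding is applied one byte at a time.

def latin1_fold_byte(b: int) -> int:
    """Lower-case one byte, interpreting it as an ISO-8859-1 character."""
    if 0x41 <= b <= 0x5A:                  # ASCII A-Z
        return b + 0x20
    if 0xC0 <= b <= 0xDE and b != 0xD7:    # 0xC0-0xDE letters; 0xD7 is the multiplication sign
        return b + 0x20
    return b

def fold_bytes(data: bytes) -> bytes:
    """Apply the per-byte fold to a whole byte string."""
    return bytes(latin1_fold_byte(b) for b in data)

upper = "\u00c4"  # capital A with diaeresis
lower = "\u00e4"  # small a with diaeresis

latin1_upper = upper.encode("latin-1")  # one byte: 0xC4
latin1_lower = lower.encode("latin-1")  # one byte: 0xE4
utf8_upper = upper.encode("utf-8")      # two bytes: 0xC3 0x84
utf8_lower = lower.encode("utf-8")      # two bytes: 0xC3 0xA4

# Compare the folded byte strings under each encoding.
latin1_match = fold_bytes(latin1_upper) == fold_bytes(latin1_lower)
utf8_match = fold_bytes(utf8_upper) == fold_bytes(utf8_lower)

print("LATIN1 bytes match after folding:", latin1_match)
print("UTF-8 bytes match after folding: ", utf8_match)
```

Under this model the LATIN1 comparison succeeds and the UTF-8 one does not, which lines up with the false results above for a fr_CA.UTF-8 locale. How the server actually classifies characters depends on its version and locale settings, so treat this only as intuition.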