Thread: BUG #12458: Comparison with CHAR is inconsistent between string types
BUG #12458: Comparison with CHAR is inconsistent between string types
From
tneumann@users.sourceforge.net
Date:
The following bug has been logged on the website: Bug reference: 12458 Logged by: Thomas Neumann Email address: tneumann@users.sourceforge.net PostgreSQL version: 9.4.0 Operating system: Linux 3.16 Description: The following two queries return different results: select 'A'::char(1)='A '::text; select 'A'::char(1)='A '::varchar(10); I would expect both queries to return "true", and this is also suggested by the documentation (section 8.3.), but the text comparison returns false.
On Thu, Jan 8, 2015 at 6:39 AM, <tneumann@users.sourceforge.net> wrote: > The following two queries return different results: > > select 'A'::char(1)='A '::text; > select 'A'::char(1)='A '::varchar(10); > > I would expect both queries to return "true", and this is also suggested by > the documentation (section 8.3.), but the text comparison returns false. > Quick inspection of pg_operator suggests there is no '=' operator that takes varchar as either of the operands. Not sure why that is so. It appears the comparison proceeds using =(char, char) operator with the varchar operand appropriately coerced. Hence the result. Thanks, Amit
Amit Langote <amitlangote09@gmail.com> writes: > On Thu, Jan 8, 2015 at 6:39 AM, <tneumann@users.sourceforge.net> wrote: >> The following two queries return different results: >> >> select 'A'::char(1)='A '::text; >> select 'A'::char(1)='A '::varchar(10); >> >> I would expect both queries to return "true", and this is also suggested by >> the documentation (section 8.3.), but the text comparison returns false. I see nothing in 8.3 addressing the question of which type has precedence for cross-type comparisons. > Quick inspection of pg_operator suggests there is no '=' operator that > takes varchar as either of the operands. Not sure why that is so. It > appears the comparison proceeds using =(char, char) operator with the > varchar operand appropriately coerced. Hence the result. The available operators are char=char and text=text (varchar has no operators of its own, it just uses text's operators). For an input of the form char=text, the text=text operator wins on the grounds of text being a preferred type; see http://www.postgresql.org/docs/9.4/static/typeconv-oper.html rule 3d. So we coerce char to text and compare using text's rules, in which trailing spaces are significant. But if the input is char=varchar, only the char=char operator has any exact matches, so it wins at rule 3c. Then we'll coerce varchar to char and compare using char's rules, in which trailing spaces are not significant. So it's behaving as expected and documented. Whether this behavior is desirable is a different question of course. I read the SQL spec as requiring us to use PAD SPACE semantics for char=varchar comparisons, so that case is forced. char=text is outside the bounds of the spec (since text isn't in the spec) so we don't have to do it the same way; and we choose not to because text's semantics are generally more natural. regards, tom lane
>>> select 'A'::char(1)='A '::text; >>> select 'A'::char(1)='A '::varchar(10); >>> >>> I would expect both queries to return "true", and this is also suggested by >>> the documentation (section 8.3.), but the text comparison returns false. > > I see nothing in 8.3 addressing the question of which type has precedence > for cross-type comparisons. you are right, I did not read 8.3 careful enough. It says that "trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character", but nothing about mixed types. The SQL standard 8,2 <comparison predicate>, Rule 3, b) says that "" If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>. "" which would seem to suggest that spaces have to be ignore when comparing CHAR with anything else (as the other string type is free to choose any pad character, but CHAR will always pad with spaces as needed). But of course you can deviate from that if you wish. I just found it surprising that text behaved different from varchar, which can lead to very subtle bugs. Thomas