Thread: Bug in ILIKE?
Good: regression=# select 'T' ilike 't';?column? ----------t (1 row) Not so good: regression=# select 'T' ilike E'\\t';?column? ----------f (1 row) ISTM backslash is only supposed to turn off the pattern-language specialness of characters, not render them case sensitive. The reason this happens is that the backslash case in MatchText() checks for exact equality. I think it should be checking for TCHAR() equality, same as when it is just checking two ordinary characters. Comments? Is this a backpatchable bug fix, or should we only change the behavior for 8.4 and beyond? regards, tom lane
Tom Lane wrote: > Good: > > regression=# select 'T' ilike 't'; > ?column? > ---------- > t > (1 row) > > Not so good: > > regression=# select 'T' ilike E'\\t'; > ?column? > ---------- > f > (1 row) > > ISTM backslash is only supposed to turn off the pattern-language > specialness of characters, not render them case sensitive. The reason > this happens is that the backslash case in MatchText() checks for exact > equality. I think it should be checking for TCHAR() equality, same as > when it is just checking two ordinary characters. Comments? Is this a > backpatchable bug fix, or should we only change the behavior for 8.4 and > beyond? > > Hmm. A quick test shows this working the other way in 8.2, so I assume this is a byproduct of the work we did to improve the efficiency of LIKE/ILIKE in 8.3 ;-( The docs actually don't state what are the semantics of escape followed by something that is not escape or a metachar. Does the spec say anything about that? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > The docs actually don't state what are the semantics of escape followed > by something that is not escape or a metachar. Does the spec say > anything about that? The spec says it's an error, per the SQL92 excerpt I quoted in the previous thread. (SQL99 says about the same with more notation; I didn't bother looking in the later specs.) I find that position too restrictive, mainly because of this consideration: suppose some future version of the spec invents additional metacharacters. To be concrete, suppose ? means something special in SQL2010. Now how do you make a pattern that works in both older and newer servers? \? means literal ? to the newer server, but if it throws an error on the older, you're stuck. So I'm for the definition that escape-anything means exactly anything, without any special treatment that it would otherwise have. And in the case of ILIKE it seems like "no special treatment" should mean "case insensitive match". regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> The docs actually don't state what are the semantics of escape followed >> by something that is not escape or a metachar. Does the spec say >> anything about that? >> > > The spec says it's an error, per the SQL92 excerpt I quoted in the > previous thread. (SQL99 says about the same with more notation; > I didn't bother looking in the later specs.) > > I find that position too restrictive, mainly because of this > consideration: suppose some future version of the spec invents > additional metacharacters. To be concrete, suppose ? means > something special in SQL2010. Now how do you make a pattern that > works in both older and newer servers? \? means literal ? to the > newer server, but if it throws an error on the older, you're stuck. > > So I'm for the definition that escape-anything means exactly anything, > without any special treatment that it would otherwise have. And in > the case of ILIKE it seems like "no special treatment" should mean > "case insensitive match". > Well, it looks to me to be pretty easily fixable. Given the above it seems to me marginal to call it a bug, though. I don't have very strong feelings, but I am generally opposed to changing the visible behaviour in stable releases unless something is clearly a bug. cheers andrew > regards, tom lane > >
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> So I'm for the definition that escape-anything means exactly anything, >> without any special treatment that it would otherwise have. And in >> the case of ILIKE it seems like "no special treatment" should mean >> "case insensitive match". > Well, it looks to me to be pretty easily fixable. Given the above it > seems to me marginal to call it a bug, though. I don't have very strong > feelings, but I am generally opposed to changing the visible behaviour > in stable releases unless something is clearly a bug. Well, there are two possible interpretations: (1) it's a bug, or (2) it's an intentional, about-to-be-documented feature that backslashing a letter makes it case-sensitive in ILIKE. I do not care for interpretation #2 ... especially in view of your observation (confirmed here) that pre-8.3 releases didn't do this. I think it's just a bug in 8.3. regards, tom lane
On Sep 25, 2008, at 20:45, Tom Lane wrote: > Well, there are two possible interpretations: (1) it's a bug, or (2) > it's an intentional, about-to-be-documented feature that > backslashing a > letter makes it case-sensitive in ILIKE. > > I do not care for interpretation #2 ... especially in view of your > observation (confirmed here) that pre-8.3 releases didn't do this. > I think it's just a bug in 8.3. +1 I think it would be very difficult to come up with a justification for backslashes making a comparison case-sensitive. It's just…weird. Best, David
Tom Lane wrote: > I think it's just a bug in 8.3. > > > Well, here's a patch that I think fixes it. If you're happy I'll apply it to HEAD and 8.3. cheers andrew ? .deps Index: like_match.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/like_match.c,v retrieving revision 1.21 diff -c -r1.21 like_match.c *** like_match.c 1 Mar 2008 03:26:34 -0000 1.21 --- like_match.c 26 Sep 2008 22:38:26 -0000 *************** *** 96,105 **** { if (*p == '\\') { ! /* Next byte must match literally, whatever it is */ NextByte(p, plen); ! if ((plen <= 0) || *p != *t) return LIKE_FALSE; } else if (*p == '%') { --- 96,108 ---- { if (*p == '\\') { ! /* Next char must match literally, whatever it is */ NextByte(p, plen); ! if ((plen <= 0) || TCHAR(*p) != TCHAR(*t)) return LIKE_FALSE; + NextChar(t,tlen); + NextChar(p,plen); + continue; } else if (*p == '%') {
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> I think it's just a bug in 8.3. > Well, here's a patch that I think fixes it. If you're happy I'll apply > it to HEAD and 8.3. That patch isn't gonna apply to HEAD ;-). Also the introduction of NextChar is simply broken, as it will skip additional bytes of a multibyte char without having compared 'em. All you need AFAICS is to put TCHAR()s into the *p != *t comparison, so that it matches the case for ordinary characters. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Tom Lane wrote: >> >>> I think it's just a bug in 8.3. >>> > > >> Well, here's a patch that I think fixes it. If you're happy I'll apply >> it to HEAD and 8.3. >> > > That patch isn't gonna apply to HEAD ;-). Also the introduction of > NextChar is simply broken, as it will skip additional bytes of a > multibyte char without having compared 'em. All you need AFAICS is > to put TCHAR()s into the *p != *t comparison, so that it matches the > case for ordinary characters. > > > I'll have another look. What happens in that case though if you have escape+x where x is a multibyte char? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I'll have another look. What happens in that case though if you have > escape+x where x is a multibyte char? TCHAR()'s just a no-op in multibyte encodings --- we handle ILIKE using a preliminary downcasing pass in that case. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I'll have another look. What happens in that case though if you have >> escape+x where x is a multibyte char? >> > > TCHAR()'s just a no-op in multibyte encodings --- we handle ILIKE > using a preliminary downcasing pass in that case. > > > Ah. Of course. Will fix. cheers andrew