Thread: BUG #4999: select 'a' < 'A' is true, but should be false . . .
The following bug has been logged online: Bug reference: 4999 Logged by: Brian Ceccarelli Email address: ceccareb@talussoftware.com PostgreSQL version: 8.4.0 to 8.2.13 Operating system: Linux and XP Description: select 'a' < 'A' is true, but should be false . . . Details: since the < and > comparison operators seem to be case insensitive: select 'a' < 'Z'; -- true select 'a' < 'z'; -- true select 'A' < 'Z'; -- true select 'A' < 'z'; -- true select 'z' < 'A'; -- false select 'z' < 'a'; -- false select 'Z' < 'A'; -- false select 'Z' < 'a'; -- false Any case A is < any case Z implies case-insensitive compare. Which would imply that 'a' = 'A', but 'a' < 'A' is true. - - - The operator equals is case sensitive. The operator < and > are case-insensitive. This is not consistent. Most of the time, operator < and > are case-insenstive, until you compare the upper and lower cases of the same letter.
"Brian Ceccarelli" <ceccareb@talussoftware.com> writes: > since the < and > comparison operators seem to be case insensitive: They are not, unless you have managed to find a case-insensitive locale somewhere. In any case we would not think that 'a' = 'A'. regards, tom lane
On tor, 2009-08-20 at 20:24 +0000, Brian Ceccarelli wrote: > since the < and > comparison operators seem to be case insensitive: > > select 'a' < 'Z'; -- true > select 'a' < 'z'; -- true > select 'A' < 'Z'; -- true > select 'A' < 'z'; -- true > > select 'z' < 'A'; -- false > select 'z' < 'a'; -- false > select 'Z' < 'A'; -- false > select 'Z' < 'a'; -- false > > Any case A is < any case Z implies case-insensitive compare. Which would > imply that 'a' = 'A', but 'a' < 'A' is true. No, they are not "case insensitive". The way this works is with a multipass comparison algorithm: First, the letters are compared independent of case, then the case is compared. There is also an additional pass for comparing accents, but I forget at the moment which pass that is. Search for Unicode collation algorithm, if you are interested.
Yes. I understand. It has all to do with Unicode collating sequence. I need to somehow remove this bug from the list, since it is not a bug. -----Original Message----- From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Peter Eisentraut Sent: Monday, August 24, 2009 4:54 AM To: Brian Ceccarelli Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #4999: select 'a' < 'A' is true, but should be false . . . On tor, 2009-08-20 at 20:24 +0000, Brian Ceccarelli wrote: > since the < and > comparison operators seem to be case insensitive: > > select 'a' < 'Z'; -- true > select 'a' < 'z'; -- true > select 'A' < 'Z'; -- true > select 'A' < 'z'; -- true > > select 'z' < 'A'; -- false > select 'z' < 'a'; -- false > select 'Z' < 'A'; -- false > select 'Z' < 'a'; -- false > > Any case A is < any case Z implies case-insensitive compare. Which would > imply that 'a' = 'A', but 'a' < 'A' is true. No, they are not "case insensitive". The way this works is with a multipass comparison algorithm: First, the letters are compared independent of case, then the case is compared. There is also an additional pass for comparing accents, but I forget at the moment which pass that is. Search for Unicode collation algorithm, if you are interested. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, Aug 24, 2009 at 8:24 AM, Brian Ceccarelli<ceccareb@talusmusic.com> wrote: > Yes. =A0 I understand. =A0 It has all to do with Unicode collating sequen= ce. =A0I > need to somehow remove this bug from the list, since it is not a bug. Heh. As far as I understand, there is no list... one thing that I have been worrying about is the lack of any apparent mechanism to make sure that every bug report gets a response, even if the response is just "this is not a bug" or "you didn't provide enough detail" or "go file this bug against the correct product". A few recent ones that have not (yet): #5003 #4986 #4943 #4942 #4940 #4909 #4904 ...Robert
Robert Haas escribió: > On Mon, Aug 24, 2009 at 8:24 AM, Brian > Ceccarelli<ceccareb@talusmusic.com> wrote: > > Yes. I understand. It has all to do with Unicode collating sequence. I > > need to somehow remove this bug from the list, since it is not a bug. > > Heh. As far as I understand, there is no list... one thing that I > have been worrying about is the lack of any apparent mechanism to make > sure that every bug report gets a response, even if the response is > just "this is not a bug" or "you didn't provide enough detail" or "go > file this bug against the correct product". Bruce used to say proudly that we didn't need a bugtracker because *he* was the bugtracker. Back in those days he had a lot more time to keep on top of his PG mailbox. Today it is seeming like we need something better. Something simple though, because the previous discussions involving bugzilla led nowhere (or actually -- they led us right back where we started). Personally I still think debbugs would suit us perfectly, but 1. I don't have time to handle it, 2. nobody else believes this, 3. the debbugs developers are not very interested in helping us use it. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Aug 24, 2009 at 8:03 PM, Alvaro Herrera<alvherre@commandprompt.com> wrote: > Personally I still think debbugs would suit us perfectly, but 1. I don't > have time to handle it, 2. nobody else believes this, 3. the debbugs > developers are not very interested in helping us use it. I've been shouting about debbugs forever too. It's completely email based so we could just treat it as a mailing list without having to go visit a web interface to stay up to date. We could add CVS/whatever hooks so whenever a commit message says it closes a bug it gets closed automatically. Effectively it would require no operational changes for developers who would just participate on a mailing list and commit changes like usual. It also has a web interface so you can go see the history to see the pending bugs to work on and their history of course. But that's not how we're accustomed to working and I fear anything like bugzilla would require dedicated bug-wranglers like Bruce to keep the connection between the email discussion and the bug tracker going because the developers would ignore the web site and the bug reporters would be unaware of any mailing list discussion. -- greg http://mit.edu/~gsstark/resume.pdf
On Mon, Aug 24, 2009 at 3:10 PM, Greg Stark<gsstark@mit.edu> wrote: > On Mon, Aug 24, 2009 at 8:03 PM, Alvaro > Herrera<alvherre@commandprompt.com> wrote: >> Personally I still think debbugs would suit us perfectly, but 1. I don't >> have time to handle it, 2. nobody else believes this, 3. the debbugs >> developers are not very interested in helping us use it. > > I've been shouting about debbugs forever too. > > It's completely email based so we could just treat it as a mailing > list without having to go visit a web interface to stay up to date. We > could add CVS/whatever hooks so whenever a commit message says it > closes a bug it gets closed automatically. =A0Effectively it would > require no operational changes for developers who would just > participate on a mailing list and commit changes like usual. > > It also has a web interface so you can go see the history to see the > pending bugs to work on and their history of course. But that's not > how we're accustomed to working and I fear anything like bugzilla > would require dedicated bug-wranglers like Bruce to keep the > connection between the email discussion and the bug tracker going > because the developers would ignore the web site and the bug reporters > would be unaware of any mailing list discussion. Well, I think we're dropping a lot of the bugs early in the process, before they even get a response. Of course, a lot of the reason for that is because many of the "bugs" are actually usage questions, user error, completely lacking in relevant detail, problems with products other than Postgres, and/or feature requests dressed up as bugs to make us feel guilty about them. It seems to me based on my short tenure reading this mailing list that when someone provides a reproducible test case of Postgres verifiably DTWT it usually attracts plenty of attention and gets dealt with relatively quickly, usually with a friendly "thanks for the report!". There's nothing technological that prevents someone from quickly finding a list of the bugs that haven't gotten a response at all, as evidenced by the fact that I just did it upthread in less than 5 minutes going back to bug ~4900. The details of how to reproduce this are left as an exercise to the student (hint: threaded mail reader). Finding the bugs that were discussed but not resolved is harder, but probably a lot less worthwhile. A lot of those are exchanges of the form: Report: your product has a huge problem that would affect nearly every user Response: uh, we doubt it, because we'd've noticed that. can you provide a test case? <still waiting> I think there's definitely room for some better bug wrangling, but given the number of garbage bugs, the effort/reward ratio is likely to be pretty high. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > ... It seems to me based on my short > tenure reading this mailing list that when someone provides a > reproducible test case of Postgres verifiably DTWT it usually attracts > plenty of attention and gets dealt with relatively quickly, usually > with a friendly "thanks for the report!". > ... > I think there's definitely room for some better bug wrangling, but > given the number of garbage bugs, the effort/reward ratio is likely to > be pretty high. Yeah. I think that bugzilla, and probably also debbugs (though I've got no experience with the latter), are designed for workflows where a bug remains live for some considerable period of time and/or is handed off to different people during its lifespan. That doesn't really describe our handling of bugs. We try to close bugs immediately if at all possible, and if not they end up as items on the TODO list. As Robert says, we could be better about making sure that we provide some kind of response to every bug submission; but I think what's lacking there is the will and the cycles to do so, not so much a tool. regards, tom lane
On mån, 2009-08-24 at 20:10 +0100, Greg Stark wrote: > It's completely email based so we could just treat it as a mailing > list without having to go visit a web interface to stay up to date. We > could add CVS/whatever hooks so whenever a commit message says it > closes a bug it gets closed automatically. Effectively it would > require no operational changes for developers who would just > participate on a mailing list and commit changes like usual. Well, all you'd really need is that if you close a bug, you indicate that say via an email header X-PG-Bugs-Close: 12345 and then spice up the archives display to show that somehow. But the chances of getting people to use that consistently is pretty small. Some kind of simple request tracker where you just forward all bugs that you think need saving would also help. You wouldn't need to copy all the follow-up there; it would only serve as a pointer into the archives. In fact, all you need is CREATE TABLE bugs ( nr int ); and an interface to display that and add to that.
On Mon, Aug 24, 2009 at 10:16 PM, Peter Eisentraut<peter_e@gmx.net> wrote: > > Well, all you'd really need is that if you close a bug, you indicate > that say via an email header > > X-PG-Bugs-Close: 12345 > > and then spice up the archives display to show that somehow. =A0But the > chances of getting people to use that consistently is pretty small. If all it takes is adding one line to an email or cvs commit you were going to write anyways, like in debbugs, I think it would be easy to get buy-in. It doesn't change the workflow at all. Where I think processes like bugzilla and the like fall down is that it requires people to then go visit a web site -- essentially it switches their whole process focus from their mail to the web site and splitting between the two sucks. > Some kind of simple request tracker where you just forward all bugs that > you think need saving would also help. =A0You wouldn't need to copy all > the follow-up there; it would only serve as a pointer into the archives. I can't tell if you're basically describing rewriting debbugs or describing something sucky like using Bruce's mailbox as a bug tracker. We don't just want a pile of references to hundreds of old mail threads that we have to go through later to figure out what's still relevant. It turns out you do need a _few_ bells and whistles. You want to be able to merge tickets, set priorities -- if only to set all the "wishlist" bugs to another category until someone adds them to the todo or decides they're not good todos. We don't want to spend our time wrangling bugs instead of fixing them which is what bugzilla is designed for, but we do want to be able to track the actual bugs. If you look at debian packages most of them have only a half dozen or fewer bugs -- all the postgres 8.4 packages combined currently has two normal and one wishlist bug for example. That's because their workflow is a lot like ours -- when a bug comes in it comes in by email and people are used to responding to emails immediately. When the thread resolves you put a line in your email saying to close the bug or refile it or whatever you need to do. I think it's a conceit to think we always fix bugs immediately. Of course the critical ones get fixed, but we've had plenty of bugs reported that are known "would be nice" bugs that ought to be fixed when we get a round tuit. Many of them have been dropped over time because we just forget about them. Sometimes people raise them again and we end up fixing them sometime, sometimes Tom pulls stuff out of his archives because he gets a bee in his bonnet, sometimes we just hope nobody notices. It would be useful to have a list of them if only so when people report them we can say it's a known issue. --=20 greg http://mit.edu/~gsstark/resume.pdf
On mån, 2009-08-24 at 23:07 +0100, Greg Stark wrote: > I think it's a conceit to think we always fix bugs immediately. I completely agree with that one. The claim that we don't need a bug tracker because most bugs get fixed immediately is bogus because a) it's not true, and b) it doesn't help people *track* bugs.
On Mon, Aug 24, 2009 at 11:38 PM, Peter Eisentraut<peter_e@gmx.net> wrote: > On m=E5n, 2009-08-24 at 23:07 +0100, Greg Stark wrote: >> I think it's a conceit to think we always fix bugs immediately. > > I completely agree with that one. =A0The claim that we don't need a bug > tracker because most bugs get fixed immediately is bogus because a) it's > not true, and b) it doesn't help people *track* bugs. Well to be fair I think when people say that they're thinking of projects like Mozilla which have thousands and thousands of bugs in their bugzilla which keep getting moved from one release target to the next until someone is tasked with closing them all because the code's been rewritten five times and probably doesn't still have the problem. That's mostly a problem with GUI apps where users submit tons of bugs which will never be reproducible and may be caused by all kinds of desktop ui interactions -- even things like viruses. I do think some bug trackers make the problem worse by being designed around that workflow, and I want to avoid seeing us being pushed down that road. But we do have pending bugs. Sometimes Tom posts a patch but doesn't commit it because he doesn't really like the fix, sometimes we aren't sure how to solve it or there are just more important things going on. The recent "avoid redundant detoasting" had been a pending problem without a good solution for a long time. Tom didn't forget about that one but there are probably a dozen or so like it and I certainly don't remember them all. And my list probably isn't the same as Tom's list. I would expect us to be like debian where we have a handful of bugs which come in each week. We exchange a few emails and we close the bug. But every now and then one comes in that we neither want to close nor fix right away. When it comes time for a point release or a major release we check that list to see if any of them are things we really want to commit before releasing. --=20 greg http://mit.edu/~gsstark/resume.pdf
Alvaro Herrera <alvherre@commandprompt.com> writes: > Personally I still think debbugs would suit us perfectly, but 1. I don't > have time to handle it, 2. nobody else believes this, 3. the debbugs > developers are not very interested in helping us use it. What is it that we'd need their help for? regards, tom lane
On mån, 2009-08-24 at 19:56 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Personally I still think debbugs would suit us perfectly, but 1. I don't > > have time to handle it, 2. nobody else believes this, 3. the debbugs > > developers are not very interested in helping us use it. > > What is it that we'd need their help for? They don't really make stable code releases. It's a continuous flow up coding and deploying small upgrades.
Actually Greg . . . After a long and careful study of this, this is a Postgres bug. Postgres is boogering up the sort. The operators < >, and the "order by" in the select statement are both case-sensitive and case-insensitive. It is inconsistent. It does not matter what my collating sequence is. This is the behavior I am seeing: If the words are the same words but letters have different case, then the operator is case-sensitive. If the words are not the same words, then the operator is case-insensitive until the operator reaches the character position in both strings where the letters become different. It is a combination of both case-sensitive and case-insenstive. Completely bogus. It does not matter what computer I am using. I get the same bogus behavior on all my operating systems and Postgres versions. Brian Ceccarelli Talus Software 4605 Woodmill Run Apex, NC 27539 -----Original Message----- From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark Sent: Thursday, August 20, 2009 5:50 PM To: Brian Ceccarelli Subject: Re: BUG #4999: select 'a' < 'A' is true, but should be false . . . On Thu, Aug 20, 2009 at 9:24 PM, Brian Ceccarelli<ceccareb@talussoftware.com> wrote: > The operator equals is case sensitive. > The operator < and > are case-insensitive. > This is not consistent. > > Most of the time, operator < and > are case-insenstive, until you compare > the upper and lower cases of the same letter. This is all down to your collation locale. en_US and most other non-C locales sort case insensitively. You'll find the Unix command "sort" and other utilities behave similarly. You can set your lc_collate to C if you want strict ascii binary byte order. In 8.3 and previous this was a cluster-wide setting but in 8.4 you can set it per-database. If you can find any cases that are actually inconsistent (the above examples are not -- you need to find three values where a < b < c but a >= c) then this is a serious problem and you should not use that locale with postgres or it will result in corrupt indexes. -- greg http://mit.edu/~gsstark/resume.pdf
"Brian Ceccarelli" <ceccareb@talusmusic.com> writes: > After a long and careful study of this, this is a Postgres bug. > Postgres is boogering up the sort. The operators < >, and the "order by" > in the select statement are both case-sensitive and case-insensitive. It > is inconsistent. "Consistency" is not a hallmark of the sort ordering in most non-C locales :-(. If this behavior troubles you, use C locale. regards, tom lane
On Tue, Aug 25, 2009 at 8:38 PM, Brian Ceccarelli<ceccareb@talusmusic.com> wrote: > > If the words are the same words but letters have different case, then the > operator is case-sensitive. > If the words are not the same words, then the operator is case-insensitive > until the operator reaches the character position in both strings where t= he > letters become different. > this is completely non-sense. can you present a test case that proves what you're claiming? --=20 Atentamente, Jaime Casanova Soporte y capacitaci=C3=B3n de PostgreSQL Asesor=C3=ADa y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Brian Ceccarelli wrote: > If the words are the same words but letters have different case, then the > operator is case-sensitive. > If the words are not the same words, then the operator is case-insensitive > until the operator reaches the character position in both strings where the > letters become different. This is *exactly* how the collation is supposed to work (read: how it is defined by the C library). It is not a trivial single-pass comparison. It is definitely not simply a case-sensitive or case-insensitive comparison. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.