Thread: BUG #4999: select 'a' < 'A' is true, but should be false . . .

BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
"Brian Ceccarelli"
Date:
The following bug has been logged online:

Bug reference:      4999
Logged by:          Brian Ceccarelli
Email address:      ceccareb@talussoftware.com
PostgreSQL version: 8.4.0 to 8.2.13
Operating system:   Linux and XP
Description:        select 'a' < 'A' is true, but should be false . . .
Details:

since the < and > comparison operators seem to be case insensitive:

select 'a' < 'Z';    -- true
select 'a' < 'z';    -- true
select 'A' < 'Z';    -- true
select 'A' < 'z';    -- true

select 'z' < 'A';    -- false
select 'z' < 'a';    -- false
select 'Z' < 'A';    -- false
select 'Z' < 'a';    -- false

Any case A is < any case Z implies case-insensitive compare.    Which would
imply that 'a' = 'A', but 'a' < 'A' is true.

- - -

The operator equals is case sensitive.
The operator < and > are case-insensitive.
This is not consistent.

Most of the time, operator < and > are case-insenstive, until you compare
the upper and lower cases of the same letter.

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Tom Lane
Date:
"Brian Ceccarelli" <ceccareb@talussoftware.com> writes:
> since the < and > comparison operators seem to be case insensitive:

They are not, unless you have managed to find a case-insensitive locale
somewhere.  In any case we would not think that 'a' = 'A'.

            regards, tom lane

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Peter Eisentraut
Date:
On tor, 2009-08-20 at 20:24 +0000, Brian Ceccarelli wrote:
> since the < and > comparison operators seem to be case insensitive:
>
> select 'a' < 'Z';    -- true
> select 'a' < 'z';    -- true
> select 'A' < 'Z';    -- true
> select 'A' < 'z';    -- true
>
> select 'z' < 'A';    -- false
> select 'z' < 'a';    -- false
> select 'Z' < 'A';    -- false
> select 'Z' < 'a';    -- false
>
> Any case A is < any case Z implies case-insensitive compare.    Which would
> imply that 'a' = 'A', but 'a' < 'A' is true.

No, they are not "case insensitive".  The way this works is with a
multipass comparison algorithm: First, the letters are compared
independent of case, then the case is compared.  There is also an
additional pass for comparing accents, but I forget at the moment which
pass that is.  Search for Unicode collation algorithm, if you are
interested.

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
"Brian Ceccarelli"
Date:
Yes.   I understand.   It has all to do with Unicode collating sequence.  I
need to somehow remove this bug from the list, since it is not a bug.

-----Original Message-----
From: pgsql-bugs-owner@postgresql.org
[mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Peter Eisentraut
Sent: Monday, August 24, 2009 4:54 AM
To: Brian Ceccarelli
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4999: select 'a' < 'A' is true, but should be false
. . .

On tor, 2009-08-20 at 20:24 +0000, Brian Ceccarelli wrote:
> since the < and > comparison operators seem to be case insensitive:
>
> select 'a' < 'Z';    -- true
> select 'a' < 'z';    -- true
> select 'A' < 'Z';    -- true
> select 'A' < 'z';    -- true
>
> select 'z' < 'A';    -- false
> select 'z' < 'a';    -- false
> select 'Z' < 'A';    -- false
> select 'Z' < 'a';    -- false
>
> Any case A is < any case Z implies case-insensitive compare.    Which
would
> imply that 'a' = 'A', but 'a' < 'A' is true.

No, they are not "case insensitive".  The way this works is with a
multipass comparison algorithm: First, the letters are compared
independent of case, then the case is compared.  There is also an
additional pass for comparing accents, but I forget at the moment which
pass that is.  Search for Unicode collation algorithm, if you are
interested.


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Robert Haas
Date:
On Mon, Aug 24, 2009 at 8:24 AM, Brian
Ceccarelli<ceccareb@talusmusic.com> wrote:
> Yes. =A0 I understand. =A0 It has all to do with Unicode collating sequen=
ce. =A0I
> need to somehow remove this bug from the list, since it is not a bug.

Heh.  As far as I understand, there is no list...  one thing that I
have been worrying about is the lack of any apparent mechanism to make
sure that every bug report gets a response, even if the response is
just "this is not a bug" or "you didn't provide enough detail" or "go
file this bug against the correct product".

A few recent ones that have not (yet):

#5003
#4986
#4943
#4942
#4940
#4909
#4904

...Robert

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Mon, Aug 24, 2009 at 8:24 AM, Brian
> Ceccarelli<ceccareb@talusmusic.com> wrote:
> > Yes.   I understand.   It has all to do with Unicode collating sequence.  I
> > need to somehow remove this bug from the list, since it is not a bug.
>
> Heh.  As far as I understand, there is no list...  one thing that I
> have been worrying about is the lack of any apparent mechanism to make
> sure that every bug report gets a response, even if the response is
> just "this is not a bug" or "you didn't provide enough detail" or "go
> file this bug against the correct product".

Bruce used to say proudly that we didn't need a bugtracker because *he*
was the bugtracker.  Back in those days he had a lot more time to keep
on top of his PG mailbox.  Today it is seeming like we need something
better.  Something simple though, because the previous discussions
involving bugzilla led nowhere (or actually -- they led us right back
where we started).

Personally I still think debbugs would suit us perfectly, but 1. I don't
have time to handle it, 2. nobody else believes this, 3. the debbugs
developers are not very interested in helping us use it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Greg Stark
Date:
On Mon, Aug 24, 2009 at 8:03 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:
> Personally I still think debbugs would suit us perfectly, but 1. I don't
> have time to handle it, 2. nobody else believes this, 3. the debbugs
> developers are not very interested in helping us use it.

I've been shouting about debbugs forever too.

It's completely email based so we could just treat it as a mailing
list without having to go visit a web interface to stay up to date. We
could add CVS/whatever hooks so whenever a commit message says it
closes a bug it gets closed automatically.  Effectively it would
require no operational changes for developers who would just
participate on a mailing list and commit changes like usual.

It also has a web interface so you can go see the history to see the
pending bugs to work on and their history of course. But that's not
how we're accustomed to working and I fear anything like bugzilla
would require dedicated bug-wranglers like Bruce to keep the
connection between the email discussion and the bug tracker going
because the developers would ignore the web site and the bug reporters
would be unaware of any mailing list discussion.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Robert Haas
Date:
On Mon, Aug 24, 2009 at 3:10 PM, Greg Stark<gsstark@mit.edu> wrote:
> On Mon, Aug 24, 2009 at 8:03 PM, Alvaro
> Herrera<alvherre@commandprompt.com> wrote:
>> Personally I still think debbugs would suit us perfectly, but 1. I don't
>> have time to handle it, 2. nobody else believes this, 3. the debbugs
>> developers are not very interested in helping us use it.
>
> I've been shouting about debbugs forever too.
>
> It's completely email based so we could just treat it as a mailing
> list without having to go visit a web interface to stay up to date. We
> could add CVS/whatever hooks so whenever a commit message says it
> closes a bug it gets closed automatically. =A0Effectively it would
> require no operational changes for developers who would just
> participate on a mailing list and commit changes like usual.
>
> It also has a web interface so you can go see the history to see the
> pending bugs to work on and their history of course. But that's not
> how we're accustomed to working and I fear anything like bugzilla
> would require dedicated bug-wranglers like Bruce to keep the
> connection between the email discussion and the bug tracker going
> because the developers would ignore the web site and the bug reporters
> would be unaware of any mailing list discussion.

Well, I think we're dropping a lot of the bugs early in the process,
before they even get a response.  Of course, a lot of the reason for
that is because many of the "bugs" are actually usage questions, user
error, completely lacking in relevant detail, problems with products
other than Postgres, and/or feature requests dressed up as bugs to
make us feel guilty about them.  It seems to me based on my short
tenure reading this mailing list that when someone provides a
reproducible test case of Postgres verifiably DTWT it usually attracts
plenty of attention and gets dealt with relatively quickly, usually
with a friendly "thanks for the report!".

There's nothing technological that prevents someone from quickly
finding a list of the bugs that haven't gotten a response at all, as
evidenced by the fact that I just did it upthread in less than 5
minutes going back to bug ~4900.  The details of how to reproduce this
are left as an exercise to the student (hint: threaded mail reader).
Finding the bugs that were discussed but not resolved is harder, but
probably a lot less worthwhile.  A lot of those are exchanges of the
form:

Report: your product has a huge problem that would affect nearly every user
Response: uh, we doubt it, because we'd've noticed that.  can you
provide a test case?
<still waiting>

I think there's definitely room for some better bug wrangling, but
given the number of garbage bugs, the effort/reward ratio is likely to
be pretty high.

...Robert

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ...  It seems to me based on my short
> tenure reading this mailing list that when someone provides a
> reproducible test case of Postgres verifiably DTWT it usually attracts
> plenty of attention and gets dealt with relatively quickly, usually
> with a friendly "thanks for the report!".
> ...
> I think there's definitely room for some better bug wrangling, but
> given the number of garbage bugs, the effort/reward ratio is likely to
> be pretty high.

Yeah.  I think that bugzilla, and probably also debbugs (though I've
got no experience with the latter), are designed for workflows where a
bug remains live for some considerable period of time and/or is handed
off to different people during its lifespan.  That doesn't really
describe our handling of bugs.  We try to close bugs immediately if at
all possible, and if not they end up as items on the TODO list.

As Robert says, we could be better about making sure that we provide
some kind of response to every bug submission; but I think what's
lacking there is the will and the cycles to do so, not so much a tool.

            regards, tom lane

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Peter Eisentraut
Date:
On mån, 2009-08-24 at 20:10 +0100, Greg Stark wrote:
> It's completely email based so we could just treat it as a mailing
> list without having to go visit a web interface to stay up to date. We
> could add CVS/whatever hooks so whenever a commit message says it
> closes a bug it gets closed automatically.  Effectively it would
> require no operational changes for developers who would just
> participate on a mailing list and commit changes like usual.

Well, all you'd really need is that if you close a bug, you indicate
that say via an email header

X-PG-Bugs-Close: 12345

and then spice up the archives display to show that somehow.  But the
chances of getting people to use that consistently is pretty small.

Some kind of simple request tracker where you just forward all bugs that
you think need saving would also help.  You wouldn't need to copy all
the follow-up there; it would only serve as a pointer into the archives.

In fact, all you need is

CREATE TABLE bugs (
   nr int
);

and an interface to display that and add to that.

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Greg Stark
Date:
On Mon, Aug 24, 2009 at 10:16 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
>
> Well, all you'd really need is that if you close a bug, you indicate
> that say via an email header
>
> X-PG-Bugs-Close: 12345
>
> and then spice up the archives display to show that somehow. =A0But the
> chances of getting people to use that consistently is pretty small.

If all it takes is adding one line to an email or cvs commit you were
going to write anyways, like in debbugs, I think it would be easy to
get buy-in. It doesn't change the workflow at all.

Where I think processes like bugzilla and the like fall down is that
it requires people to then go visit a web site -- essentially it
switches their whole process focus from their mail to the web site and
splitting between the two sucks.

> Some kind of simple request tracker where you just forward all bugs that
> you think need saving would also help. =A0You wouldn't need to copy all
> the follow-up there; it would only serve as a pointer into the archives.

I can't tell if you're basically describing rewriting debbugs or
describing something sucky like using Bruce's mailbox as a bug
tracker. We don't just want a pile of references to hundreds of old
mail threads that we have to go through later to figure out what's
still relevant.

It turns out you do need a _few_ bells and whistles. You want to be
able to merge tickets, set priorities -- if only to set all the
"wishlist" bugs to another category until someone adds them to the
todo or decides they're not good todos.

We don't want to spend our time wrangling bugs instead of fixing them
which is what bugzilla is designed for, but we do want to be able to
track the actual bugs. If you look at debian packages most of them
have only a half dozen or fewer bugs -- all the postgres 8.4 packages
combined currently has two normal and one wishlist bug for example.
That's because their workflow is a lot like ours -- when a bug comes
in it comes in by email and people are used to responding to emails
immediately. When the thread resolves you put a line in your email
saying to close the bug or refile it or whatever you need to do.

I think it's a conceit to think we always fix bugs immediately. Of
course the critical ones get fixed, but we've had plenty of bugs
reported that are known "would be nice" bugs that ought to be fixed
when we get a round tuit. Many of them have been dropped over time
because we just forget about them. Sometimes people raise them again
and we end up fixing them sometime, sometimes Tom pulls stuff out of
his archives because he gets a bee in his bonnet, sometimes we just
hope nobody notices. It would be useful to have a list of them if only
so when people report them we can say it's a known issue.

--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Peter Eisentraut
Date:
On mån, 2009-08-24 at 23:07 +0100, Greg Stark wrote:
> I think it's a conceit to think we always fix bugs immediately.

I completely agree with that one.  The claim that we don't need a bug
tracker because most bugs get fixed immediately is bogus because a) it's
not true, and b) it doesn't help people *track* bugs.

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Greg Stark
Date:
On Mon, Aug 24, 2009 at 11:38 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On m=E5n, 2009-08-24 at 23:07 +0100, Greg Stark wrote:
>> I think it's a conceit to think we always fix bugs immediately.
>
> I completely agree with that one. =A0The claim that we don't need a bug
> tracker because most bugs get fixed immediately is bogus because a) it's
> not true, and b) it doesn't help people *track* bugs.

Well to be fair I think when people say that they're thinking of
projects like Mozilla which have thousands and thousands of bugs in
their bugzilla which keep getting moved from one release target to the
next until someone is tasked with closing them all because the code's
been rewritten five times and probably doesn't still have the problem.

That's mostly a problem with GUI apps where users submit tons of bugs
which will never be reproducible and may be caused by all kinds of
desktop ui interactions -- even things like viruses. I do think some
bug trackers make the problem worse by being designed around that
workflow, and I want to avoid seeing us being pushed down that road.

But we do have pending bugs. Sometimes Tom posts a patch but doesn't
commit it because he doesn't really like the fix, sometimes we aren't
sure how to solve it or there are just more important things going on.
The recent "avoid redundant detoasting" had been a pending problem
without a good solution for a long time. Tom didn't forget about that
one but there are probably a dozen or so like it and I certainly don't
remember them all. And my list probably isn't the same as Tom's list.

I would expect us to be like debian where we have a handful of bugs
which come in each week. We exchange a few emails and we close the
bug. But every now and then one comes in that we neither want to close
nor fix right away. When it comes time for a point release or a major
release we check that list to see if any of them are things we really
want to commit before releasing.

--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Personally I still think debbugs would suit us perfectly, but 1. I don't
> have time to handle it, 2. nobody else believes this, 3. the debbugs
> developers are not very interested in helping us use it.

What is it that we'd need their help for?

            regards, tom lane

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Peter Eisentraut
Date:
On mån, 2009-08-24 at 19:56 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Personally I still think debbugs would suit us perfectly, but 1. I don't
> > have time to handle it, 2. nobody else believes this, 3. the debbugs
> > developers are not very interested in helping us use it.
>
> What is it that we'd need their help for?

They don't really make stable code releases.  It's a continuous flow up
coding and deploying small upgrades.

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
"Brian Ceccarelli"
Date:
Actually Greg . . .

     After a long and careful study of this, this is a Postgres bug.
Postgres is boogering up the sort.   The operators < >, and the "order by"
in the select statement are both case-sensitive and case-insensitive.    It
is inconsistent.   It does not matter what my collating sequence is.    This
is the behavior I am seeing:

If the words are the same words but letters have different case, then the
operator is case-sensitive.
If the words are not the same words, then the operator is case-insensitive
until the operator reaches the character position in both strings where the
letters become different.

It is a combination of both case-sensitive and case-insenstive.   Completely
bogus.    It does not matter what computer I am using.    I get the same
bogus behavior on all my operating systems and Postgres versions.



Brian Ceccarelli
Talus Software
4605 Woodmill Run
Apex, NC  27539







-----Original Message-----
From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark
Sent: Thursday, August 20, 2009 5:50 PM
To: Brian Ceccarelli
Subject: Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

On Thu, Aug 20, 2009 at 9:24 PM, Brian
Ceccarelli<ceccareb@talussoftware.com> wrote:

> The operator equals is case sensitive.
> The operator < and > are case-insensitive.
> This is not consistent.
>
> Most of the time, operator < and > are case-insenstive, until you compare
> the upper and lower cases of the same letter.

This is all down to your collation locale. en_US and most other non-C
locales sort case insensitively. You'll find the Unix command "sort"
and other utilities behave similarly. You can set your lc_collate to C
if you want strict ascii binary byte order. In 8.3 and previous this
was a cluster-wide setting but in 8.4 you can set it per-database.

If you can find any cases that are actually inconsistent (the above
examples are not -- you need to find three values where a < b < c but
a >= c) then this is a serious problem and you should not use that
locale with postgres or it will result in corrupt indexes.
--
greg
http://mit.edu/~gsstark/resume.pdf

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Tom Lane
Date:
"Brian Ceccarelli" <ceccareb@talusmusic.com> writes:
>      After a long and careful study of this, this is a Postgres bug.
> Postgres is boogering up the sort.   The operators < >, and the "order by"
> in the select statement are both case-sensitive and case-insensitive.    It
> is inconsistent.

"Consistency" is not a hallmark of the sort ordering in most non-C
locales :-(.  If this behavior troubles you, use C locale.

            regards, tom lane

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Jaime Casanova
Date:
On Tue, Aug 25, 2009 at 8:38 PM, Brian
Ceccarelli<ceccareb@talusmusic.com> wrote:
>
> If the words are the same words but letters have different case, then the
> operator is case-sensitive.
> If the words are not the same words, then the operator is case-insensitive
> until the operator reaches the character position in both strings where t=
he
> letters become different.
>

this is completely non-sense. can you present a test case that proves
what you're claiming?

--=20
Atentamente,
Jaime Casanova
Soporte y capacitaci=C3=B3n de PostgreSQL
Asesor=C3=ADa y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Re: BUG #4999: select 'a' < 'A' is true, but should be false . . .

From
Alvaro Herrera
Date:
Brian Ceccarelli wrote:

> If the words are the same words but letters have different case, then the
> operator is case-sensitive.
> If the words are not the same words, then the operator is case-insensitive
> until the operator reaches the character position in both strings where the
> letters become different.

This is *exactly* how the collation is supposed to work (read: how it is
defined by the C library).  It is not a trivial single-pass comparison.
It is definitely not simply a case-sensitive or case-insensitive
comparison.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.