Thread: Is this a BUG? Is there anyone has the same problem?

Is this a BUG? Is there anyone has the same problem?

From
"jack"
Date:
Providing there is simple table, called "test_xx", which has only one
column, called
"aa" CHAR(16), and the column is set as primary key. Put in some data such
as following.
( _ stands for a space character)

'AA_B'
'AAB'
'BB_123'
'BB123'

Execute following statement with psql.
SELECT * FROM test_xx ORDER BY aa;

I 've got ...

'AAB'
'AA_B'
'BB123'
'BB_123'

But if excute
SELECT subStr(aa, 1,4) FROM test_xx ORDER BY aa;
I've got ..

'AA_B'
'AAB'
BB_12'
BB123'

jack




Re: Is this a BUG? Is there anyone has the same problem?

From
Tom Lane
Date:
"jack" <datactrl@tpg.com.au> writes:
> [ funny ordering behavior ]

What locale is your database running in?
        regards, tom lane


Is this a BUG? Is there anyone has the same problem?

From
"jack"
Date:
Thanks, Tom

> What locale is your database running in?

My locale is en_AU (on Redhat 7.2)

jack



Re: Is this a BUG? Is there anyone has the same problem?

From
Tom Lane
Date:
"jack" <datactrl@tpg.com.au> writes:
>> What locale is your database running in?

> My locale is en_AU (on Redhat 7.2)

Hmph.  It seems to be a peculiarity of the locale sorting rules for
English.  Using RedHat 7.2, I made a file containing 3 lines, the last
of which has one trailing blank:

[tgl@rh1 tgl]$ cat test
AAB
AA B
AAB

-- hmm, can't see the spaces very well, so do this:

[tgl@rh1 tgl]$ sed 's/ /_/g' test
AAB
AA_B
AAB_

-- Now sort under Aussie rules:

[tgl@rh1 tgl]$ LANG=en_AU sort test
AAB
AA B
AAB

-- uh, let's try looking to see where the spaces are:

[tgl@rh1 tgl]$ LANG=en_AU sort test | sed 's/ /_/g'
AAB
AA_B
AAB_

-- Not too consistent, eh?  I get the same results with en_US though:

[tgl@rh1 tgl]$ LANG=en_US sort test | sed 's/ /_/g'
AAB
AA_B
AAB_

-- but traditional "C" locale does this:

[tgl@rh1 tgl]$ LANG=C sort test | sed 's/ /_/g'
AA_B
AAB
AAB_


The reason that your SQL tests reflect this is that comparisons for type
CHAR(n) remove any trailing blanks before comparing; but the result of
substr() is of type TEXT, so it assumes trailing blanks are significant.
So the data you were sorting were in the one case effectively

'AA B'
'AAB'
'BB 123'
'BB123'

and in the other case

'AA B'
'AAB '
'BB 1'
'BB12'

and the locale sort rules treat 'AAB' differently from 'AAB '.

If you think that's a bug, you can take it up with whoever maintains
Linux's locale rules.  It ain't our bug though.  (You might prefer
to initdb under C locale if you'd rather sort according to C rules.)
        regards, tom lane


Re: Is this a BUG? Is there anyone has the same problem?

From
David Stanaway
Date:
Hi Tom.

This seems to be a bit of a FAQ at the moment...  Do you think it should
be added to the release notes? Maybe the people that build the packages
for different distributions of Linux need to have some kind of warning
in the installation scripts which let the users pic the locale used for
Postgres by default.


Cheers...

--
David Stanaway


On Mon, 2002-04-22 at 11:35, Tom Lane wrote:
> "jack" <datactrl@tpg.com.au> writes:
> >> What locale is your database running in?
>
> > My locale is en_AU (on Redhat 7.2)
>
> Hmph.  It seems to be a peculiarity of the locale sorting rules for
> English.  Using RedHat 7.2, I made a file containing 3 lines, the last
> of which has one trailing blank:
>
> [tgl@rh1 tgl]$ cat test
> AAB
> AA B
> AAB
>
> -- hmm, can't see the spaces very well, so do this:
>
> [tgl@rh1 tgl]$ sed 's/ /_/g' test
> AAB
> AA_B
> AAB_
>
> -- Now sort under Aussie rules:
>
> [tgl@rh1 tgl]$ LANG=en_AU sort test
> AAB
> AA B
> AAB
>
> -- uh, let's try looking to see where the spaces are:
>
> [tgl@rh1 tgl]$ LANG=en_AU sort test | sed 's/ /_/g'
> AAB
> AA_B
> AAB_
>
> -- Not too consistent, eh?  I get the same results with en_US though:
>
> [tgl@rh1 tgl]$ LANG=en_US sort test | sed 's/ /_/g'
> AAB
> AA_B
> AAB_
>
> -- but traditional "C" locale does this:
>
> [tgl@rh1 tgl]$ LANG=C sort test | sed 's/ /_/g'
> AA_B
> AAB
> AAB_
>
>
> The reason that your SQL tests reflect this is that comparisons for type
> CHAR(n) remove any trailing blanks before comparing; but the result of
> substr() is of type TEXT, so it assumes trailing blanks are significant.
> So the data you were sorting were in the one case effectively
>
> 'AA B'
> 'AAB'
> 'BB 123'
> 'BB123'
>
> and in the other case
>
> 'AA B'
> 'AAB '
> 'BB 1'
> 'BB12'
>
> and the locale sort rules treat 'AAB' differently from 'AAB '.
>
> If you think that's a bug, you can take it up with whoever maintains
> Linux's locale rules.  It ain't our bug though.  (You might prefer
> to initdb under C locale if you'd rather sort according to C rules.)
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


Re: Is this a BUG? Is there anyone has the same problem?

From
Tom Lane
Date:
David Stanaway <david@stanaway.net> writes:
> This seems to be a bit of a FAQ at the moment...  Do you think it should
> be added to the release notes? Maybe the people that build the packages
> for different distributions of Linux need to have some kind of warning
> in the installation scripts which let the users pic the locale used for
> Postgres by default.

I think it is mentioned someplace.  Maybe we could make it more prominent.

There's been some talk of having initdb always report the locale
settings being used, but as of 7.2 it doesn't warn you about non-C
locales.  (7.1 did; 7.2 fails to because initdb is overenthusiastic
about hiding backend output :-()
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: Is this a BUG? Is there anyone has the same problem?

From
Tom Lane
Date:
David Stanaway <david@stanaway.net> writes:
> This seems to be a bit of a FAQ at the moment...  Do you think it should
> be added to the release notes? Maybe the people that build the packages
> for different distributions of Linux need to have some kind of warning
> in the installation scripts which let the users pic the locale used for
> Postgres by default.

I think it is mentioned someplace.  Maybe we could make it more prominent.

There's been some talk of having initdb always report the locale
settings being used, but as of 7.2 it doesn't warn you about non-C
locales.  (7.1 did; 7.2 fails to because initdb is overenthusiastic
about hiding backend output :-()
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html