[18] Unintentional behavior change in commit e9931bfb75 - Mailing list pgsql-hackers

From Jeff Davis
Subject [18] Unintentional behavior change in commit e9931bfb75
Msg-id 01a104f0d2179d756261e90d96fd65c36ad6fcf0.camel@j-davis.com
List pgsql-hackers
Commit e9931bfb75 (version 18) contained an unexpected behavior change.
LOWER('I') returns:

  In the locale tr_TR.iso88599 (single-byte encoding):
                              v17     v18
    default collation          i       ı
    explicit COLLATE           ı       ı

  In the locale tr_TR.utf8:
                              v17     v18
    default collation          ı       ı
    explicit COLLATE           ı       ı

(Look carefully to see the dotted vs dotless "i".)

The behavior is described in a comment in formatting.c (see commit 176d5bae1d):

   * ...  When using the default
   * collation, we apply the traditional Postgres behavior that
   * forces ASCII-style treatment of I/i, but in non-default
   * collations you get exactly what the collation says.
   */
  for (p = result; *p; p++)
  {
      if (mylocale)
          *p = tolower_l((unsigned char) *p, mylocale->info.lt);
      else
          *p = pg_tolower((unsigned char) *p);
  }
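To make the difference concrete, here is a minimal C sketch, not PostgreSQL source. `ascii_tolower()` is a hypothetical, simplified stand-in for the A-Z branch of `pg_tolower()` (the real function also lowercases high-bit bytes via `tolower()`), and `tolower_in_locale()` is a hypothetical helper around the POSIX.1-2008 `newlocale()`/`tolower_l()` calls. It assumes a glibc-like C library; the `tr_TR.iso88599` locale may or may not be installed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for newlocale(), tolower_l() */
#endif
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical, simplified stand-in for the ASCII branch of pg_tolower():
 * 'A'..'Z' always map to 'a'..'z', whatever the locale says about them.
 */
static unsigned char ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/*
 * Hypothetical helper: lowercase one byte with tolower_l() in the named
 * locale. Returns -1 if the locale is not installed on this system.
 */
static int tolower_in_locale(unsigned char ch, const char *locname)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, locname, (locale_t) 0);
    int result;

    if (loc == (locale_t) 0)
        return -1;
    result = tolower_l(ch, loc);
    freelocale(loc);
    return result;
}
```

With glibc's Turkish locale installed, `ascii_tolower('I')` gives `'i'` (0x69), while `tolower_in_locale('I', "tr_TR.iso88599")` should give 0xFD, the ISO 8859-9 code for dotless ı. That is the split between the "default collation" and "explicit COLLATE" rows for version 17 in the single-byte table.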

That's a somewhat strange special case (along with similar ones for
INITCAP() and UPPER()) that applies only to single-byte encodings with
the libc provider and the database default collation. I assume it was
done for backward compatibility?

My commit e9931bfb75 (version 18) unifies the code paths for default
and explicit collations. In the process, it eliminates the special
case: single-byte encodings with libc now always use tolower_l(),
whether or not the collation is the database default.

Should I put the special case back? If I don't, expression indexes on
LOWER()/UPPER() could break after an upgrade (entries built by 17 would
no longer match what 18 computes) for users whose database default
collation is tr_TR with the libc provider and a single-byte encoding.
But preserving the special case seems weirdly inconsistent: surely the
results should not depend on the encoding, right?

Regards,
    Jeff Davis