Thread: UNIQUE INDEX difference between 7.2 and 7.3

UNIQUE INDEX difference between 7.2 and 7.3

From

Kevin Houle

Date:

12 August 2003, 12:10:27

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Your name               : Kevin Houle
Your email address      : kjh@cert.org


System Configuration
- ----------------------
  Architecture : i686

  Operating System : RH9, 2.4.20-19

  PostgreSQL version :  PostgreSQL-7.3.4 (RPMS from PGDG)

Please enter a FULL description of your problem:
- -------------------------------------------------

There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5)
which contains a file with two SQL INSERT commands to insert two rows
into a test table. The table should look like this:

  CREATE TABLE tbl_test (
    testcol text,
    unique (testcol)
  );

The databases in my testing are using SQL_ASCII encoding.

Executing the two attached INSERT statements produces a duplicate
key error on PostgreSQL 7.3.2 (as distributed by Red Hat) and 7.3.4
(as distributed by PGDG). Running on PostgreSQL 7.2.4 (PGDG) under
RH7.3, these two INSERTs work flawlessly. In all tests, I've executed
the INSERTs using

  psql -d dbname -f filename.sql

The values in the two INSERTs are unicode filenames and they are
different. Something seems to have changed between 7.2 and 7.3 with
regard to how the values are handled, but I cannot find anything
in the docs to suggest what that change might be. So, I'm reporting
it as a bug in case it shouldn't have changed.

Regards,
Kevin

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/OQM4u/NTC+XTbEkRAjtuAJ9DIFwrmRegc8jFkY/XKNxAjqywzACg3LnV
cosGfdzXiqcAhKJ1144Zhq0=
=9gDf
-----END PGP SIGNATURE-----

Attachment

file.sql

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Stephan Szabo

Date:

12 August 2003, 12:21:27

On Tue, 12 Aug 2003, Kevin Houle wrote:

> System Configuration
> - ----------------------
>   Architecture : i686
>
>   Operating System : RH9, 2.4.20-19
>
>   PostgreSQL version :  PostgreSQL-7.3.4 (RPMS from PGDG)
>
> Please enter a FULL description of your problem:
> - -------------------------------------------------
>
> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5)
> which contains a file with two SQL INSERT commands to insert two rows
> into a test table. The table should look like this:
>
>   CREATE TABLE tbl_test (
>     testcol text,
>     unique (testcol)
>   );
>
> The databases in my testing are using SQL_ASCII encoding.

I don't receive an error on my 7.3.4 system, what locale is the database
initialized in?

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Kevin Houle

Date:

12 August 2003, 12:32:20

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --On Tuesday, August 12, 2003 08:18:53 AM -0700 Stephan Szabo
<sszabo@megazone
.bigpanda.com> wrote:

>
> On Tue, 12 Aug 2003, Kevin Houle wrote:
>
>> System Configuration
>> - ----------------------
>>   Architecture : i686
>>
>>   Operating System : RH9, 2.4.20-19
>>
>>   PostgreSQL version :  PostgreSQL-7.3.4 (RPMS from PGDG)
>>
>> Please enter a FULL description of your problem:
>> - -------------------------------------------------
>>
>> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5)
>> which contains a file with two SQL INSERT commands to insert two rows
>> into a test table. The table should look like this:
>>
>>   CREATE TABLE tbl_test (
>>     testcol text,
>>     unique (testcol)
>>   );
>>
>> The databases in my testing are using SQL_ASCII encoding.
>
> I don't receive an error on my 7.3.4 system, what locale is the database
> initialized in?

I'm using defaults, so on RH9 it is:

  /var/lib/pgsql/initdb.i18n:
   LANG="en_US.UTF-8"

and on RH7.3 it is:

  /var/lib/pgsql/initdb.i18n:
   LANG="en_US"

Kevin



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/OQeiu/NTC+XTbEkRAqyCAJ9dWKLKDy7ikbFFNwc6bzt7rEqj+wCfevAe
LOaw4FFNrbr6wdtl0Zls+cU=
=X2Z2
-----END PGP SIGNATURE-----

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Stephan Szabo

Date:

12 August 2003, 12:47:48

On Tue, 12 Aug 2003, Kevin Houle wrote:

> >> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5)
> >> which contains a file with two SQL INSERT commands to insert two rows
> >> into a test table. The table should look like this:
> >>
> >>   CREATE TABLE tbl_test (
> >>     testcol text,
> >>     unique (testcol)
> >>   );
> >>
> >> The databases in my testing are using SQL_ASCII encoding.
> >
> > I don't receive an error on my 7.3.4 system, what locale is the database
> > initialized in?
>
> I'm using defaults, so on RH9 it is:
>
>   /var/lib/pgsql/initdb.i18n:
>    LANG="en_US.UTF-8"

Okay, I see it with en_US.UTF-8, but not with C locale, nor with
en_US or en_US.iso885915.  It looks like the comparison rules are
different between the locales (and I'm not sure if SQL_ASCII encoding
and a UTF8 locale makes sense in practice).

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Kevin Houle

Date:

12 August 2003, 13:55:21

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --On Tuesday, August 12, 2003 08:47:08 AM -0700 Stephan Szabo
<sszabo@megazone
.bigpanda.com> wrote:

> On Tue, 12 Aug 2003, Kevin Houle wrote:
>
>> >> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5)
>> >> which contains a file with two SQL INSERT commands to insert two rows
>> >> into a test table. The table should look like this:
>> >>
>> >>   CREATE TABLE tbl_test (
>> >>     testcol text,
>> >>     unique (testcol)
>> >>   );
>> >>
>> >> The databases in my testing are using SQL_ASCII encoding.
>> >
>> > I don't receive an error on my 7.3.4 system, what locale is the
>> > database initialized in?
>>
>> I'm using defaults, so on RH9 it is:
>>
>>   /var/lib/pgsql/initdb.i18n:
>>    LANG="en_US.UTF-8"
>
> Okay, I see it with en_US.UTF-8, but not with C locale, nor with
> en_US or en_US.iso885915.  It looks like the comparison rules are
> different between the locales (and I'm not sure if SQL_ASCII encoding
> and a UTF8 locale makes sense in practice).

Good point. I don't see the behavior with locale=en_US, either. I'll
use that as a work-around. Probably a few other RH9 users out there
getting into the same situation using the PGDG and Red Hat RPMS.

I appreciate your taking the time to look at this.

Cheers,
Kevin


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/ORu7u/NTC+XTbEkRAiwlAKCkqpPyQX7mLXx5iKMAAf7v03t6JwCfUvIB
qL8Xz60qXwigV/LnzkGTM8M=
=Q0iV
-----END PGP SIGNATURE-----

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Tom Lane

Date:

12 August 2003, 16:47:27

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Okay, I see it with en_US.UTF-8, but not with C locale, nor with
> en_US or en_US.iso885915.  It looks like the comparison rules are
> different between the locales (and I'm not sure if SQL_ASCII encoding
> and a UTF8 locale makes sense in practice).

I'd think not --- the byte sequence is most likely not a valid string in
UTF8 encoding.  I'm not sure what strcoll() would do when comparing
illegal byte sequences, but failing to detect that they're different is
certainly not too implausible.

This brings up once again the question of whether initdb ought to accept
the locale it finds in the environment.  I had not realized that Red Hat
9 is defaulting to en_US.UTF-8.  That is an actively evil choice for us
(unless we change the default database encoding to match).

IIRC we were about evenly split between changing or not changing
initdb's behavior, but if this really is a typical RHL9 setup, I think
that has got to affect the decision.

Comments?

            regards, tom lane

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Stephan Szabo

Date:

12 August 2003, 16:54:08

On Tue, 12 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Okay, I see it with en_US.UTF-8, but not with C locale, nor with
> > en_US or en_US.iso885915.  It looks like the comparison rules are
> > different between the locales (and I'm not sure if SQL_ASCII encoding
> > and a UTF8 locale makes sense in practice).
>
> I'd think not --- the byte sequence is most likely not a valid string in
> UTF8 encoding.  I'm not sure what strcoll() would do when comparing
> illegal byte sequences, but failing to detect that they're different is
> certainly not too implausible.

That's what I was worried about.

> This brings up once again the question of whether initdb ought to accept
> the locale it finds in the environment.  I had not realized that Red Hat
> 9 is defaulting to en_US.UTF-8.  That is an actively evil choice for us
> (unless we change the default database encoding to match).

That's somewhat interesting too, because my server is also RHL9,
but it appears to default accounts to en_US.iso885915.  I think
there might have been a set up option relating to using UTF8 locales.

> IIRC we were about evenly split between changing or not changing
> initdb's behavior, but if this really is a typical RHL9 setup, I think
> that has got to affect the decision.

I don't know enough about the issues involved.  Can we reasonably tell
that a particular locale and encoding don't make sense together (apart
from things like looking for UTF-8 in the name for example)?

Re: UNIQUE INDEX difference between 7.2 and 7.3

From

Tom Lane

Date:

12 August 2003, 17:10:40

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I don't know enough about the issues involved.  Can we reasonably tell
> that a particular locale and encoding don't make sense together (apart
> from things like looking for UTF-8 in the name for example)?

There was some discussion about this a week or two ago.  Apparently,
glibc has a way to ask what character set a given locale expects,
but there's no such capability in the C standards, so it's not portable.

Since glibc-based systems seem to be the main ones guilty of defaulting
to non-C locales, perhaps it would be Good Enough (TM) to make the check
on glibc, and assume that the user knows what he's doing elsewhere.
Needs thought though.

            regards, tom lane