Thread: UNIQUE INDEX difference between 7.2 and 7.3
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Your name : Kevin Houle Your email address : kjh@cert.org System Configuration - ---------------------- Architecture : i686 Operating System : RH9, 2.4.20-19 PostgreSQL version : PostgreSQL-7.3.4 (RPMS from PGDG) Please enter a FULL description of your problem: - ------------------------------------------------- There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5) which contains a file with two SQL INSERT commands to insert two rows into a test table. The table should look like this: CREATE TABLE tbl_test ( testcol text, unique (testcol) ); The databases in my testing are using SQL_ASCII encoding. Executing the two attached INSERT statements produces a duplicate key error on PostgreSQL 7.3.2 (as distributed by Red Hat) and 7.3.4 (as distributed by PGDG). Running on PostgreSQL 7.2.4 (PGDG) under RH7.3, these two INSERTs work flawlessly. In all tests, I've executed the INSERTs using psql -d dbname -f filename.sql The values in the two INSERTs are unicode filenames and they are different. Something seems to have changed between 7.2 and 7.3 with regard to how the values are handled, but I cannot find anything in the docs to suggest what that change might be. So, I'm reporting it as a bug in case it shouldn't have changed. Regards, Kevin -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/OQM4u/NTC+XTbEkRAjtuAJ9DIFwrmRegc8jFkY/XKNxAjqywzACg3LnV cosGfdzXiqcAhKJ1144Zhq0= =9gDf -----END PGP SIGNATURE-----
Attachment
On Tue, 12 Aug 2003, Kevin Houle wrote: > System Configuration > - ---------------------- > Architecture : i686 > > Operating System : RH9, 2.4.20-19 > > PostgreSQL version : PostgreSQL-7.3.4 (RPMS from PGDG) > > Please enter a FULL description of your problem: > - ------------------------------------------------- > > There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5) > which contains a file with two SQL INSERT commands to insert two rows > into a test table. The table should look like this: > > CREATE TABLE tbl_test ( > testcol text, > unique (testcol) > ); > > The databases in my testing are using SQL_ASCII encoding. I don't receive an error on my 7.3.4 system, what locale is the database initialized in?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Tuesday, August 12, 2003 08:18:53 AM -0700 Stephan Szabo <sszabo@megazone .bigpanda.com> wrote: > > On Tue, 12 Aug 2003, Kevin Houle wrote: > >> System Configuration >> - ---------------------- >> Architecture : i686 >> >> Operating System : RH9, 2.4.20-19 >> >> PostgreSQL version : PostgreSQL-7.3.4 (RPMS from PGDG) >> >> Please enter a FULL description of your problem: >> - ------------------------------------------------- >> >> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5) >> which contains a file with two SQL INSERT commands to insert two rows >> into a test table. The table should look like this: >> >> CREATE TABLE tbl_test ( >> testcol text, >> unique (testcol) >> ); >> >> The databases in my testing are using SQL_ASCII encoding. > > I don't receive an error on my 7.3.4 system, what locale is the database > initialized in? I'm using defaults, so on RH9 it is: /var/lib/pgsql/initdb.i18n: LANG="en_US.UTF-8" and on RH7.3 it is: /var/lib/pgsql/initdb.i18n: LANG="en_US" Kevin -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/OQeiu/NTC+XTbEkRAqyCAJ9dWKLKDy7ikbFFNwc6bzt7rEqj+wCfevAe LOaw4FFNrbr6wdtl0Zls+cU= =X2Z2 -----END PGP SIGNATURE-----
On Tue, 12 Aug 2003, Kevin Houle wrote: > >> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5) > >> which contains a file with two SQL INSERT commands to insert two rows > >> into a test table. The table should look like this: > >> > >> CREATE TABLE tbl_test ( > >> testcol text, > >> unique (testcol) > >> ); > >> > >> The databases in my testing are using SQL_ASCII encoding. > > > > I don't receive an error on my 7.3.4 system, what locale is the database > > initialized in? > > I'm using defaults, so on RH9 it is: > > /var/lib/pgsql/initdb.i18n: > LANG="en_US.UTF-8" Okay, I see it with en_US.UTF-8, but not with C locale, nor with en_US or en_US.iso885915. It looks like the comparison rules are different between the locales (and I'm not sure if SQL_ASCII encoding and a UTF8 locale makes sense in practice).
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Tuesday, August 12, 2003 08:47:08 AM -0700 Stephan Szabo <sszabo@megazone .bigpanda.com> wrote: > On Tue, 12 Aug 2003, Kevin Houle wrote: > >> >> There is an email attachment (md5: 5cc780da645df9516235d43d1cf1e8b5) >> >> which contains a file with two SQL INSERT commands to insert two rows >> >> into a test table. The table should look like this: >> >> >> >> CREATE TABLE tbl_test ( >> >> testcol text, >> >> unique (testcol) >> >> ); >> >> >> >> The databases in my testing are using SQL_ASCII encoding. >> > >> > I don't receive an error on my 7.3.4 system, what locale is the >> > database initialized in? >> >> I'm using defaults, so on RH9 it is: >> >> /var/lib/pgsql/initdb.i18n: >> LANG="en_US.UTF-8" > > Okay, I see it with en_US.UTF-8, but not with C locale, nor with > en_US or en_US.iso885915. It looks like the comparison rules are > different between the locales (and I'm not sure if SQL_ASCII encoding > and a UTF8 locale makes sense in practice). Good point. I don't see the behavior with locale=en_US, either. I'll use that as a work-around. Probably a few other RH9 users out there getting into the same situation using the PGDG and Red Hat RPMS. I appreciate your taking the time to look at this. Cheers, Kevin -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/ORu7u/NTC+XTbEkRAiwlAKCkqpPyQX7mLXx5iKMAAf7v03t6JwCfUvIB qL8Xz60qXwigV/LnzkGTM8M= =Q0iV -----END PGP SIGNATURE-----
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > Okay, I see it with en_US.UTF-8, but not with C locale, nor with > en_US or en_US.iso885915. It looks like the comparison rules are > different between the locales (and I'm not sure if SQL_ASCII encoding > and a UTF8 locale makes sense in practice). I'd think not --- the byte sequence is most likely not a valid string in UTF8 encoding. I'm not sure what strcoll() would do when comparing illegal byte sequences, but failing to detect that they're different is certainly not too implausible. This brings up once again the question of whether initdb ought to accept the locale it finds in the environment. I had not realized that Red Hat 9 is defaulting to en_US.UTF-8. That is an actively evil choice for us (unless we change the default database encoding to match). IIRC we were about evenly split between changing or not changing initdb's behavior, but if this really is a typical RHL9 setup, I think that has got to affect the decision. Comments? regards, tom lane
On Tue, 12 Aug 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > Okay, I see it with en_US.UTF-8, but not with C locale, nor with > > en_US or en_US.iso885915. It looks like the comparison rules are > > different between the locales (and I'm not sure if SQL_ASCII encoding > > and a UTF8 locale makes sense in practice). > > I'd think not --- the byte sequence is most likely not a valid string in > UTF8 encoding. I'm not sure what strcoll() would do when comparing > illegal byte sequences, but failing to detect that they're different is > certainly not too implausible. That's what I was worried about. > This brings up once again the question of whether initdb ought to accept > the locale it finds in the environment. I had not realized that Red Hat > 9 is defaulting to en_US.UTF-8. That is an actively evil choice for us > (unless we change the default database encoding to match). That's somewhat interesting too, because my server is also RHL9, but it appears to default accounts to en_US.iso885915. I think there might have been a set up option relating to using UTF8 locales. > IIRC we were about evenly split between changing or not changing > initdb's behavior, but if this really is a typical RHL9 setup, I think > that has got to affect the decision. I don't know enough about the issues involved. Can we reasonably tell that a particular locale and encoding don't make sense together (apart from things like looking for UTF-8 in the name for example)?
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > I don't know enough about the issues involved. Can we reasonably tell > that a particular locale and encoding don't make sense together (apart > from things like looking for UTF-8 in the name for example)? There was some discussion about this a week or two ago. Apparently, glibc has a way to ask what character set a given locale expects, but there's no such capability in the C standards, so it's not portable. Since glibc-based systems seem to be the main ones guilty of defaulting to non-C locales, perhaps it would be Good Enough (TM) to make the check on glibc, and assume that the user knows what he's doing elsewhere. Needs thought though. regards, tom lane