Thread: sorting/grouping/(non-)unique indexes bug
Greetings! Before I present the following, I must say that 'simple' index corruption is highly improbable; the server did not suffer hard reboots. rdw=# \d reg_user Table "reg_user" Column | Type | Modifiers ------------------+------------------------+------------------------------------------------ user_id | integer | not null default nextval('reg_user_seq'::text) user_email | character varying(50) | not null user_passwd | character varying(32) | not null user_active | boolean | not null default 't' user_allow_pauth | boolean | default 'f' user_full_name | character varying(100) | user_pseudonym | character varying(100) | user_who | character(1) | Primary key: reg_user_pkey Unique keys: reg_user_email_key rdw=# \d reg_user_email_key Index "reg_user_email_key" Column | Type ------------+----------------------- user_email | character varying(50) unique btree rdw=# create table broken as select user_email from reg_user; SELECT rdw=# -- one may think that emails are unique... rdw=# select count(user_email) from broken; count -------- 212400 (1 row) rdw=# select count(distinct user_email) from broken; count -------- 212397 (1 row) rdw=# -- look closely at the numbers rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc; user_email | count ----------------------------------------------------+------- [duplicated email] | 2 [some email] | 1 [some email] | 1 [some email] | 1 rdw=# delete from broken where user_email = '[duplicated email]'; DELETE 2 rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc; user_email | count ----------------------------------------------------+------- [another email] | 2 [some email] | 1 [some email] | 1 rdw=# delete from broken where user_email = '[another email]'; DELETE 2 rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc; user_email | count ----------------------------------------------------+------- [some email] | 1 [some email] | 1 [some email] | 1 rdw=# --finally, no more duplicates! rdw=# create unique index broken_email_key on broken (user_email); ERROR: Cannot create unique index. Table contains non-unique values rdw=# select version(); version ------------------------------------------------------------- PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96 (1 row) $psql -l List of databases Name | Owner | Encoding -----------+----------+----------- ... rdw | alex | WIN ... The server is run under ru_RU.CP1251 locale, initdb was done under the same locale. As you probably guessed, reg_user table contains actual email addresses of our site's registered users. I can provide them for testing (without passwords and stuff, of course) if this is going to help. -- Yours, Alexey V. Borzov, Webmaster of RDW.ru
Unbelievable, but you may have a couple of record with null user_email. Try: select * from broken where user_email is null; select * from reg_user where user_email is null; Alexey Borzov wrote: > > Greetings! > > Before I present the following, I must say that 'simple' index > corruption is highly improbable; the server did not suffer hard > reboots. > > rdw=# \d reg_user > Table "reg_user" > Column | Type | Modifiers > ------------------+------------------------+------------------------------------------------ > user_id | integer | not null default nextval('reg_user_seq'::text) > user_email | character varying(50) | not null > user_passwd | character varying(32) | not null > user_active | boolean | not null default 't' > user_allow_pauth | boolean | default 'f' > user_full_name | character varying(100) | > user_pseudonym | character varying(100) | > user_who | character(1) | > Primary key: reg_user_pkey > Unique keys: reg_user_email_key > > rdw=# \d reg_user_email_key > Index "reg_user_email_key" > Column | Type > ------------+----------------------- > user_email | character varying(50) > unique btree > > rdw=# create table broken as select user_email from reg_user; > SELECT > > rdw=# -- one may think that emails are unique... > rdw=# select count(user_email) from broken; > count > -------- > 212400 > (1 row) > > rdw=# select count(distinct user_email) from broken; > count > -------- > 212397 > (1 row) > > rdw=# -- look closely at the numbers > rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc; > user_email | count > ----------------------------------------------------+------- > [duplicated email] | 2 > [some email] | 1 > [some email] | 1 > [some email] | 1 > > rdw=# delete from broken where user_email = '[duplicated email]'; > DELETE 2 > > rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc; > user_email | count > ----------------------------------------------------+------- > [another email] | 2 > [some email] | 1 > [some email] | 1 > > rdw=# delete from broken where user_email = '[another email]'; > DELETE 2 > > rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc; > user_email | count > ----------------------------------------------------+------- > [some email] | 1 > [some email] | 1 > [some email] | 1 > > rdw=# --finally, no more duplicates! > rdw=# create unique index broken_email_key on broken (user_email); > ERROR: Cannot create unique index. Table contains non-unique values > > rdw=# select version(); > version > ------------------------------------------------------------- > PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96 > (1 row) > > $psql -l > List of databases > Name | Owner | Encoding > -----------+----------+----------- > ... > rdw | alex | WIN > ... > > The server is run under ru_RU.CP1251 locale, initdb was done under the > same locale. > > As you probably guessed, reg_user table contains actual email > addresses of our site's registered users. I can provide them for > testing (without passwords and stuff, of course) if this is going to > help. > > -- > Yours, Alexey V. Borzov, Webmaster of RDW.ru > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
Hello Jean-Luc, Friday, June 07, 2002, 8:26:55 PM, you wrote: JLL> Unbelievable, but you may have a couple of record with null user_email. JLL> Try: JLL> select * from broken where user_email is null; JLL> select * from reg_user where user_email is null; Nope, there are none. Which is expected, 'cause user_email was declared not null: rdw=# select * from reg_user where user_email is null; user_id | user_email | user_passwd | user_active | user_allow_pauth | user_full_name | user_pseudonym | user_who ---------+------------+-------------+-------------+------------------+----------------+----------------+---------- (0 rows) rdw=# select * from broken where user_email is null; user_email ------------ (0 rows) And the locale is probably OK, too. I tried dumping the table in question and passing it through sort/uniq. It worked OK, unlike GROUP BY in Postgres. JLL> Alexey Borzov wrote: >> >> Greetings! >> >> Before I present the following, I must say that 'simple' index >> corruption is highly improbable; the server did not suffer hard >> reboots. >> [skip] -- Yours, Alexey V. Borzov, webmaster of RDW.ru
Alexey Borzov <borz_off@rdw.ru> writes: > rdw=# --finally, no more duplicates! > rdw=# create unique index broken_email_key on broken (user_email); > ERROR: Cannot create unique index. Table contains non-unique values Seems pretty broken :-( > rdw=# select version(); > version > ------------------------------------------------------------- > PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96 Uh, what is the platform exactly? > As you probably guessed, reg_user table contains actual email > addresses of our site's registered users. I can provide them for > testing (without passwords and stuff, of course) if this is going to > help. If you could send me the email address data (off-list), I could try to reproduce the problem here. regards, tom lane