Thread: sorting/grouping/(non-)unique indexes bug

sorting/grouping/(non-)unique indexes bug

From
Alexey Borzov
Date:
Greetings!

Before I present the following, I must say that 'simple' index
corruption is highly improbable; the server did not suffer hard
reboots.

rdw=# \d reg_user
                                      Table "reg_user"
      Column      |          Type          |                   Modifiers
------------------+------------------------+------------------------------------------------
 user_id          | integer                | not null default nextval('reg_user_seq'::text)
 user_email       | character varying(50)  | not null
 user_passwd      | character varying(32)  | not null
 user_active      | boolean                | not null default 't'
 user_allow_pauth | boolean                | default 'f'
 user_full_name   | character varying(100) |
 user_pseudonym   | character varying(100) |
 user_who         | character(1)           |
Primary key: reg_user_pkey
Unique keys: reg_user_email_key

rdw=# \d reg_user_email_key
     Index "reg_user_email_key"
   Column   |         Type
------------+-----------------------
 user_email | character varying(50)
unique btree

rdw=# create table broken as select user_email from reg_user;
SELECT

rdw=# -- one may think that emails are unique...
rdw=# select count(user_email) from broken;
 count
--------
 212400
(1 row)

rdw=# select count(distinct user_email) from broken;
 count
--------
 212397
(1 row)

rdw=# -- look closely at the numbers
rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
                     user_email                     | count
----------------------------------------------------+-------
 [duplicated email]                                 |     2
 [some email]                                       |     1
 [some email]                                       |     1
 [some email]                                       |     1

rdw=# delete from broken where user_email = '[duplicated email]';
DELETE 2

rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
                     user_email                     | count
----------------------------------------------------+-------
 [another email]                                    |     2
 [some email]                                       |     1
 [some email]                                       |     1

rdw=# delete from broken where user_email = '[another email]';
DELETE 2

rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
                     user_email                     | count
----------------------------------------------------+-------
 [some email]                                       |     1
 [some email]                                       |     1
 [some email]                                       |     1

rdw=# --finally, no more duplicates!
rdw=# create unique index broken_email_key on broken (user_email);
ERROR:  Cannot create unique index. Table contains non-unique values

rdw=# select version();
                           version
-------------------------------------------------------------
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

$psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
...
 rdw       | alex     | WIN
...

The server is run under ru_RU.CP1251 locale, initdb was done under the
same locale.

As you probably guessed, reg_user table contains actual email
addresses of our site's registered users. I can provide them for
testing (without passwords and stuff, of course) if this is going to
help.

--
Yours, Alexey V. Borzov, Webmaster of RDW.ru



Re: sorting/grouping/(non-)unique indexes bug

From
Jean-Luc Lachance
Date:
Unbelievable, but you may have a couple of record with null user_email.

Try:

select * from broken where user_email is null;
select * from reg_user where user_email is null;


Alexey Borzov wrote:
>
> Greetings!
>
> Before I present the following, I must say that 'simple' index
> corruption is highly improbable; the server did not suffer hard
> reboots.
>
> rdw=# \d reg_user
>                                       Table "reg_user"
>       Column      |          Type          |                   Modifiers
> ------------------+------------------------+------------------------------------------------
>  user_id          | integer                | not null default nextval('reg_user_seq'::text)
>  user_email       | character varying(50)  | not null
>  user_passwd      | character varying(32)  | not null
>  user_active      | boolean                | not null default 't'
>  user_allow_pauth | boolean                | default 'f'
>  user_full_name   | character varying(100) |
>  user_pseudonym   | character varying(100) |
>  user_who         | character(1)           |
> Primary key: reg_user_pkey
> Unique keys: reg_user_email_key
>
> rdw=# \d reg_user_email_key
>      Index "reg_user_email_key"
>    Column   |         Type
> ------------+-----------------------
>  user_email | character varying(50)
> unique btree
>
> rdw=# create table broken as select user_email from reg_user;
> SELECT
>
> rdw=# -- one may think that emails are unique...
> rdw=# select count(user_email) from broken;
>  count
> --------
>  212400
> (1 row)
>
> rdw=# select count(distinct user_email) from broken;
>  count
> --------
>  212397
> (1 row)
>
> rdw=# -- look closely at the numbers
> rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
>                      user_email                     | count
> ----------------------------------------------------+-------
>  [duplicated email]                                 |     2
>  [some email]                                       |     1
>  [some email]                                       |     1
>  [some email]                                       |     1
>
> rdw=# delete from broken where user_email = '[duplicated email]';
> DELETE 2
>
> rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
>                      user_email                     | count
> ----------------------------------------------------+-------
>  [another email]                                    |     2
>  [some email]                                       |     1
>  [some email]                                       |     1
>
> rdw=# delete from broken where user_email = '[another email]';
> DELETE 2
>
> rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
>                      user_email                     | count
> ----------------------------------------------------+-------
>  [some email]                                       |     1
>  [some email]                                       |     1
>  [some email]                                       |     1
>
> rdw=# --finally, no more duplicates!
> rdw=# create unique index broken_email_key on broken (user_email);
> ERROR:  Cannot create unique index. Table contains non-unique values
>
> rdw=# select version();
>                            version
> -------------------------------------------------------------
>  PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
> (1 row)
>
> $psql -l
>         List of databases
>    Name    |  Owner   | Encoding
> -----------+----------+-----------
> ...
>  rdw       | alex     | WIN
> ...
>
> The server is run under ru_RU.CP1251 locale, initdb was done under the
> same locale.
>
> As you probably guessed, reg_user table contains actual email
> addresses of our site's registered users. I can provide them for
> testing (without passwords and stuff, of course) if this is going to
> help.
>
> --
> Yours, Alexey V. Borzov, Webmaster of RDW.ru
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

Re: sorting/grouping/(non-)unique indexes bug

From
"Alexey V. Borzov"
Date:
Hello Jean-Luc,

Friday, June 07, 2002, 8:26:55 PM, you wrote:

JLL> Unbelievable, but you may have a couple of record with null user_email.

JLL> Try:

JLL> select * from broken where user_email is null;
JLL> select * from reg_user where user_email is null;

Nope, there are none. Which is expected, 'cause user_email was
declared not null:

rdw=# select * from reg_user where user_email is null;
 user_id | user_email | user_passwd | user_active | user_allow_pauth | user_full_name | user_pseudonym | user_who
---------+------------+-------------+-------------+------------------+----------------+----------------+----------
(0 rows)

rdw=# select * from broken where user_email is null;
 user_email
------------
(0 rows)

And the locale is probably OK, too. I tried dumping the table in
question and passing it through sort/uniq. It worked OK, unlike GROUP BY in
Postgres.


JLL> Alexey Borzov wrote:
>>
>> Greetings!
>>
>> Before I present the following, I must say that 'simple' index
>> corruption is highly improbable; the server did not suffer hard
>> reboots.
>>
[skip]


--
Yours, Alexey V. Borzov, webmaster of RDW.ru


Re: sorting/grouping/(non-)unique indexes bug

From
Tom Lane
Date:
Alexey Borzov <borz_off@rdw.ru> writes:
> rdw=# --finally, no more duplicates!
> rdw=# create unique index broken_email_key on broken (user_email);
> ERROR:  Cannot create unique index. Table contains non-unique values

Seems pretty broken :-(

> rdw=# select version();
>                            version
> -------------------------------------------------------------
>  PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Uh, what is the platform exactly?

> As you probably guessed, reg_user table contains actual email
> addresses of our site's registered users. I can provide them for
> testing (without passwords and stuff, of course) if this is going to
> help.

If you could send me the email address data (off-list), I could try
to reproduce the problem here.

            regards, tom lane