Thread: "Unvisible" duplicates in SELECT [...] HAVING count(attrib) > 1

"Unvisible" duplicates in SELECT [...] HAVING count(attrib) > 1

From
Krzysztof Gajdemski
Date:
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name        : Krzysztof Gajdemski
Your email address    : songo@vi.pl


System Configuration
---------------------
  Architecture (example: Intel Pentium)      : Intel Pentium

  Operating System (example: Linux 2.0.26 ELF)     : 2.2.19 SMP

  PostgreSQL version (example: PostgreSQL-7.1):   PostgreSQL-7.1rc4

  Compiler used (example:  gcc 2.8.0)        : gcc 2.95.3


Please enter a FULL description of your problem:
------------------------------------------------

SELECT attrib FROM table GROUP BY attrib HAVING count(attrib) > 1;

query doesn't produce reliable results for some specific circumstances
(I suppose that number of records and/or strange ASCII chars in varchar
fields are one of required conditions to reproduce this behaviour, but I
can't define additional for now). Table which I use in following example
is available at: http://i.use.vi.pl/postgres/dup_example.sql.gz (size
400KB, this is why I don't place it here).

Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
songo=> SELECT version();
                            version
----------------------------------------------------------------
 PostgreSQL 7.1rc4 on i686-pc-linux-gnu, compiled by GCC 2.95.3
(1 row)

songo=> \i dup_example.sql
psql:dup_example.sql:14: NOTICE:  CREATE TABLE/PRIMARY KEY will create implicit
index 'user_id_pkey' for table 'user_id'
CREATE
CHANGE
CHANGE
songo=> SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
    nick
------------
 sexylady\@
(1 row)

songo=> DELETE FROM user_id WHERE nick IN (
songo(> SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1);
DELETE 2
songo=> SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
 nick
------
(0 rows)

songo=> CREATE UNIQUE INDEX nick_idx ON user_id(nick);
ERROR:  Cannot create unique index. Table contains non-unique values
songo=> SELECT nick FROM user_id WHERE nick LIKE 'dr dre%';
        nick
--------------------
 dr dre\$\$\$\$\$\$
 dr dre\$\$\$\$\$\$
(2 rows)

songo=> SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
 nick
------
(0 rows)

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------


      k.
--
- - - - - -  Krzysztof Gajdemski | songo @ vi.pl | KG4751-RIPE
Registered Linux User # 133457 | BLUG Registered Member # 0005
PGP public keys at: keyserver.linux.bydg.org * KeyID: D336705B
,,Szanujê was wszystkich, którzy pozostajecie w cieniu'' SNERG

Re: "Unvisible" duplicates in SELECT [...] HAVING count(attrib) > 1

From
Tom Lane
Date:
Krzysztof Gajdemski <songo@vi.pl> writes:
>   PostgreSQL version (example: PostgreSQL-7.1):   PostgreSQL-7.1rc4

> query doesn't produce reliable results for some specific circumstances

I tried your example in current sources and get reasonable-looking
behavior:

regression=#  SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
        nick
--------------------
 dr dre\$\$\$\$\$\$
 sexylady\@
(2 rows)

regression=# DELETE FROM user_id WHERE nick IN (
regression(# SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1);
DELETE 4
regression=# SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
 nick
------
(0 rows)

regression=# CREATE UNIQUE INDEX nick_idx ON user_id(nick);
CREATE
regression=# SELECT nick FROM user_id WHERE nick LIKE 'dr dre%';
 nick
------
(0 rows)

regression=# SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
 nick
------
(0 rows)

regression=#

It could be that there's a locale dependency here (I did this in a
non-locale-enabled compilation), but I think it's more likely that
you've run into one of the complex-query bugs we corrected since rc4.
Please update to 7.1.1 (or better 7.1.2, which should be out very soon)
and see if you still see a problem.

            regards, tom lane

Re: "Unvisible" duplicates in SELECT [...] HAVING count(attrib) > 1

From
Krzysztof Gajdemski
Date:
18.05.2001, 11:12:27, Tom Lane wrote:

> Krzysztof Gajdemski <songo@vi.pl> writes:
> >   PostgreSQL version (example: PostgreSQL-7.1):   PostgreSQL-7.1rc4
>
> > query doesn't produce reliable results for some specific circumstances
>
> I tried your example in current sources and get reasonable-looking
> behavior:
>
> regression=#  SELECT nick FROM user_id GROUP BY nick HAVING count(nick) > 1;
>         nick
> --------------------
>  dr dre\$\$\$\$\$\$
>  sexylady\@
> (2 rows)

[ cut ]

> It could be that there's a locale dependency here (I did this in a
> non-locale-enabled compilation),

I've used binary PostgreSQL package from Debian/unstable distribution,
so could it be, that this is Debian specific problem if you can't
reproduce it.

> but I think it's more likely that
> you've run into one of the complex-query bugs we corrected since rc4.
> Please update to 7.1.1 (or better 7.1.2, which should be out very soon)
> and see if you still see a problem.

Yes, it is a good idea. I've to try compile it directly from source
(with locale support enabled).

      k.
--
- - - - - -  Krzysztof Gajdemski | songo @ vi.pl | KG4751-RIPE
Registered Linux User # 133457 | BLUG Registered Member # 0005
PGP public keys at: keyserver.linux.bydg.org * KeyID: D336705B
,,Szanujê was wszystkich, którzy pozostajecie w cieniu'' SNERG