Thread: weired behavior... after pg_resetxlog-> dump->initdb-->reload.
I recovered my database by pg_resetxlog and then did a dump , initdb , reload. one of the tables exhibited this phenomenon. I thought it was worth mentioning to the developers. Regds mallah. rt2=# SELECT name from users group by name having count(*) > 1 limit 20; +----------------------+ | name | +----------------------+ | �˾ | | ����163.com | | �@hotmail.com | | p� | +----------------------+ (4 rows) Time: 1961.199 ms rt2=# rt2=# SELECT name from users group by name having count(*) > 1 ; +------+ | name | +------+ +------+ (0 rows) Time: 789.184 ms
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > rt2=# SELECT name from users group by name having count(*) > 1 limit 20; > (4 rows) > rt2=# SELECT name from users group by name having count(*) > 1 ; > (0 rows) Weird. What PG version is this? What does EXPLAIN show for each of these queries (maybe EXPLAIN ANALYZE too)? Can you provide a self-contained test case? regards, tom lane
Dear Tom,
Sorry i could not replicate that issue . But
here is something which was the original problem
and is replicable.
Regds
Mallah.
rt2=# CREATE UNIQUE INDEX users_1 on users_sample(name);
ERROR: could not create unique index
DETAIL: Table contains duplicated values.
rt2=#
rt2=# SELECT name from users_sample group by name having count(*) > 1 limit 20;
+------+
| name |
+------+
+------+
(0 rows)
Time: 766.725 ms
rt2=# SELECT version();
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+------+
| name |
+------+
+------+
(0 rows)
Time: 905.586 ms
rt2=# SELECT version();
+----------------------------------------------------------------------------------------------------------+
| version |
+----------------------------------------------------------------------------------------------------------+
| PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-34) |
+----------------------------------------------------------------------------------------------------------+
(1 row)
Time: 338.191 ms
Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:rt2=# SELECT name from users group by name having count(*) > 1 limit 20; (4 rows) rt2=# SELECT name from users group by name having count(*) > 1 ; (0 rows)Weird. What PG version is this? What does EXPLAIN show for each of these queries (maybe EXPLAIN ANALYZE too)? Can you provide a self-contained test case? regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > here is something which was the original problem > and is replicable. Hm. Does EXPLAIN show that the GROUP BY query is using hash aggregation? Does its behavior change if you turn off enable_hashagg? regards, tom lane
Dear Tom,
It does.
rt2=# explain SELECT name from users_sample group by name having count(*) > 1 ;
+-----------------------------------------------------------------------+
| QUERY PLAN |
+-----------------------------------------------------------------------+
| HashAggregate (cost=25.00..26.00 rows=200 width=78) |
| Filter: (count(*) > 1) |
| -> Seq Scan on users_sample (cost=0.00..20.00 rows=1000 width=78) |
+-----------------------------------------------------------------------+
(3 rows)
Time: 375.619 ms
rt2=# explain analyze SELECT name from users_sample group by name having count(*) > 1 ;
+------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------+
| HashAggregate (cost=25.00..26.00 rows=200 width=78) (actual time=567.981..567.981 rows=0 loops=1) |
| Filter: (count(*) > 1) |
| -> Seq Scan on users_sample (cost=0.00..20.00 rows=1000 width=78) (actual time=0.046..160.706 rows=100489 loops=1) |
| Total runtime: 577.212 ms |
+------------------------------------------------------------------------------------------------------------------------+
(4 rows)
Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:here is something which was the original problem and is replicable.Hm. Does EXPLAIN show that the GROUP BY query is using hash aggregation? Does its behavior change if you turn off enable_hashagg? regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > It does. And? > Tom Lane wrote: >> Hm. Does EXPLAIN show that the GROUP BY query is using hash >> aggregation? Does its behavior change if you turn off enable_hashagg? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ regards, tom lane
Tom Lane wrote:
disabling hash aggregates. On disabling i could get the repeated rows.
Regds
Mallah.
rt2=# SET enable_hashagg TO off;
SET
Time: 329.533 ms
rt2=# explain analyze SELECT name from users_sample group by name having count(*) > 1 ;
+------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------+
| GroupAggregate (cost=69.83..78.33 rows=200 width=78) (actual time=2411.411..2749.733 rows=4 loops=1) |
| Filter: (count(*) > 1) |
| -> Sort (cost=69.83..72.33 rows=1000 width=78) (actual time=2411.300..2532.821 rows=100489 loops=1) |
| Sort Key: name |
| -> Seq Scan on users_sample (cost=0.00..20.00 rows=1000 width=78) (actual time=0.024..170.258 rows=100489 loops=1) |
| Total runtime: 2915.439 ms |
+------------------------------------------------------------------------------------------------------------------------------+
(6 rows)
Time: 3265.102 ms
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+----------------------+
| name |
+----------------------+
| ��� |
| ����163.com |
| �@hotmail.com |
| p� |
+----------------------+
(4 rows)
Time: 3358.030 ms
rt2=# SET enable_hashagg TO on;
SET
Time: 330.148 ms
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+------+
| name |
+------+
+------+
(0 rows)
Time: 752.395 ms
rt2=#
Yep the problem of original posting could be replicated onRajesh Kumar Mallah <mallah@trade-india.com> writes:It does.And?
disabling hash aggregates. On disabling i could get the repeated rows.
Regds
Mallah.
rt2=# SET enable_hashagg TO off;
SET
Time: 329.533 ms
rt2=# explain analyze SELECT name from users_sample group by name having count(*) > 1 ;
+------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------+
| GroupAggregate (cost=69.83..78.33 rows=200 width=78) (actual time=2411.411..2749.733 rows=4 loops=1) |
| Filter: (count(*) > 1) |
| -> Sort (cost=69.83..72.33 rows=1000 width=78) (actual time=2411.300..2532.821 rows=100489 loops=1) |
| Sort Key: name |
| -> Seq Scan on users_sample (cost=0.00..20.00 rows=1000 width=78) (actual time=0.024..170.258 rows=100489 loops=1) |
| Total runtime: 2915.439 ms |
+------------------------------------------------------------------------------------------------------------------------------+
(6 rows)
Time: 3265.102 ms
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+----------------------+
| name |
+----------------------+
| ��� |
| ����163.com |
| �@hotmail.com |
| p� |
+----------------------+
(4 rows)
Time: 3358.030 ms
rt2=# SET enable_hashagg TO on;
SET
Time: 330.148 ms
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+------+
| name |
+------+
+------+
(0 rows)
Time: 752.395 ms
rt2=#
Tom Lane wrote:Hm. Does EXPLAIN show that the GROUP BY query is using hash aggregation? Does its behavior change if you turn off enable_hashagg?^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > Yep the problem of original posting could be replicated on > disabling hash aggregates. On disabling i could get the repeated rows. Okay. What I suspect is happening is that there are entries in the column that are equal according to the datatype's comparison function, but are not bitwise equal and therefore yield different hash codes. This makes it a crapshoot whether they are seen to be equal or not when hash aggregation is used. We identified a similar bug in the inet/cidr datatypes just a few weeks ago. What exactly is the datatype of the "name" column? If it's a text type, what database encoding and locale settings (LC_COLLATE/LC_CTYPE) are you using? Can you investigate exactly what's stored within each of these groups of matching names? regards, tom lane
Tom Lane wrote:
name | character varying(120) | not null
List of databases
+-----------------+----------+-----------+
| Name | Owner | Encoding |
+-----------------+----------+-----------+
| bric | postgres | UNICODE |
+--------------------------------+-----------------+
| name | setting |
+--------------------------------+-----------------+
| lc_collate | en_US.UTF-8 |
| lc_ctype | en_US.UTF-8 |
| lc_messages | en_US.iso885915 |
| lc_monetary | en_US.iso885915 |
| lc_numeric | en_US.iso885915 |
| lc_time | en_US.iso885915 |
Can you tell me how to do it please?
Mallah.
Rajesh Kumar Mallah <mallah@trade-india.com> writes:Yep the problem of original posting could be replicated on disabling hash aggregates. On disabling i could get the repeated rows.Okay. What I suspect is happening is that there are entries in the column that are equal according to the datatype's comparison function, but are not bitwise equal and therefore yield different hash codes. This makes it a crapshoot whether they are seen to be equal or not when hash aggregation is used. We identified a similar bug in the inet/cidr datatypes just a few weeks ago. What exactly is the datatype of the "name" column?
name | character varying(120) | not null
If it's a text type, what database encoding and locale settings (LC_COLLATE/LC_CTYPE) are you using?
List of databases
+-----------------+----------+-----------+
| Name | Owner | Encoding |
+-----------------+----------+-----------+
| bric | postgres | UNICODE |
+--------------------------------+-----------------+
| name | setting |
+--------------------------------+-----------------+
| lc_collate | en_US.UTF-8 |
| lc_ctype | en_US.UTF-8 |
| lc_messages | en_US.iso885915 |
| lc_monetary | en_US.iso885915 |
| lc_numeric | en_US.iso885915 |
| lc_time | en_US.iso885915 |
Can you investigate exactly what's stored within each of these groups of matching names?
Can you tell me how to do it please?
Regdsregards, tom lane
Mallah.