Home > mailing lists

RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs

From	David Raymond
Subject	RE: BUG #16031: Group by returns duplicate groups
Date	October 8, 2019 13:36:50
Msg-id	VI1PR07MB6029587F019D6F4735411CC5879A0@VI1PR07MB6029.eurprd07.prod.outlook.com Whole thread Raw
In response to	Re: BUG #16031: Group by returns duplicate groups (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List	pgsql-bugs

Tree view

" Can you try running this with enable_hashagg = off? That should give you another Sort on the CTE Scan, and a
GroupAggregateat the top. I wonder if that makes the issue go away ..."
 

Here's what I get for that. And extra weirdness below:


testing=> set enable_hashagg = off;
SET
Time: 0.241 ms
testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name
havingcount(*) > 1;
 
 name
------
(0 rows)

Time: 10423.486 ms (00:10.423)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name
fromweird_grouping group by name) select name from foo group by name having count(*) > 1;
 
                                                                        QUERY PLAN
                             
 

----------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=737694.91..744780.15 rows=67 width=516) (actual time=7650.411..7650.411 rows=0 loops=1)
   Output: foo.name
   Group Key: foo.name
   Filter: (count(*) > 1)
   Rows Removed by Filter: 1176103
   Buffers: shared hit=160 read=8156, temp read=14645 written=19235
   CTE foo
     ->  Group  (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4811.449..6027.355 rows=1176103 loops=1)
           Output: weird_grouping.name
           Group Key: weird_grouping.name
           Buffers: shared hit=160 read=8156, temp read=7800 written=7830
           ->  Sort  (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4811.447..5884.667 rows=1297265
loops=1)
                 Output: weird_grouping.name
                 Sort Key: weird_grouping.name
                 Sort Method: external merge  Disk: 39048kB
                 Buffers: shared hit=160 read=8156, temp read=7800 written=7830
                 ->  Seq Scan on name_stuff.weird_grouping  (cost=0.00..21288.65 rows=1297265 width=20) (actual
time=0.058..117.833rows=1297265 loops=1)
 
                       Output: weird_grouping.name
                       Buffers: shared hit=160 read=8156
   ->  Sort  (cost=551594.86..553955.78 rows=944366 width=516) (actual time=6915.562..7418.978 rows=1176103 loops=1)
         Output: foo.name
         Sort Key: foo.name
         Sort Method: external merge  Disk: 36368kB
         Buffers: shared hit=160 read=8156, temp read=14645 written=19235
         ->  CTE Scan on foo  (cost=0.00..18887.32 rows=944366 width=516) (actual time=4811.451..6243.160 rows=1176103
loops=1)
               Output: foo.name
               Buffers: shared hit=160 read=8156, temp read=7800 written=12363
 Settings: enable_hashagg = 'off', search_path = 'name_stuff'
 Planning Time: 0.064 ms
 Execution Time: 10175.478 ms
(30 rows)

Time: 10175.906 ms (00:10.176)
testing=>




But now here's another weird bit:



testing=> select count(*), count(distinct name) from weird_grouping;
   count   |   count
-----------+-----------
 1,297,265 | 1,176,103
(1 row)

Time: 6866.369 ms (00:06.866)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) select count(*), count(distinct name)
fromweird_grouping;
 
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=27774.98..27774.99 rows=1 width=16) (actual time=6642.856..6642.856 rows=1 loops=1)
   Output: count(*), count(DISTINCT name)
   Buffers: shared hit=928 read=7388, temp read=5484 written=5506
   ->  Seq Scan on name_stuff.weird_grouping  (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.072..110.798
rows=1297265loops=1)
 
         Output: name
         Buffers: shared hit=928 read=7388
 Settings: search_path = 'name_stuff'
 Planning Time: 0.030 ms
 Execution Time: 6642.875 ms
(9 rows)

Time: 6643.181 ms (00:06.643)
testing=> select count(*), count(distinct name collate "C") from weird_grouping;
   count   |   count
-----------+-----------
 1,297,265 | 1,176,101
(1 row)

Time: 1655.202 ms (00:01.655)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) select count(*), count(distinct name
collate"C") from weird_grouping;
 
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=27774.98..27774.99 rows=1 width=16) (actual time=1788.276..1788.277 rows=1 loops=1)
   Output: count(*), count(DISTINCT (name)::character varying(254))
   Buffers: shared hit=992 read=7324, temp read=5484 written=5506
   ->  Seq Scan on name_stuff.weird_grouping  (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.059..112.815
rows=1297265loops=1)
 
         Output: name
         Buffers: shared hit=992 read=7324
 Settings: search_path = 'name_stuff'
 Planning Time: 0.030 ms
 Execution Time: 1788.295 ms
(9 rows)

Time: 1788.596 ms (00:01.789)
testing=> show lc_collate;
 lc_collate
------------
 en-US
(1 row)

Time: 0.122 ms
testing=> show server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

Time: 0.082 ms
testing=> select count(*), count(distinct name collate "en-US") from weird_grouping;
ERROR:  collation "en-US" for encoding "UTF8" does not exist
LINE 1: select count(*), count(distinct name collate "en-US") from w...
                                             ^
Time: 5.759 ms
testing=>

pgsql-bugs by date:

From: Pavel Pleva
Date: 08 October 2019, 12:30:56
Subject: Potential to_date(string, string) function malfunction

From: Tom Lane
Date: 08 October 2019, 15:38:44
Subject: Re: Potential to_date(string, string) function malfunction

RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs

Previous

Next