RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs
From | David Raymond |
---|---|
Subject | RE: BUG #16031: Group by returns duplicate groups |
Date | |
Msg-id | VI1PR07MB6029587F019D6F4735411CC5879A0@VI1PR07MB6029.eurprd07.prod.outlook.com Whole thread Raw |
In response to | Re: BUG #16031: Group by returns duplicate groups (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-bugs |
" Can you try running this with enable_hashagg = off? That should give you another Sort on the CTE Scan, and a GroupAggregateat the top. I wonder if that makes the issue go away ..." Here's what I get for that. And extra weirdness below: testing=> set enable_hashagg = off; SET Time: 0.241 ms testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name havingcount(*) > 1; name ------ (0 rows) Time: 10423.486 ms (00:10.423) testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name fromweird_grouping group by name) select name from foo group by name having count(*) > 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=737694.91..744780.15 rows=67 width=516) (actual time=7650.411..7650.411 rows=0 loops=1) Output: foo.name Group Key: foo.name Filter: (count(*) > 1) Rows Removed by Filter: 1176103 Buffers: shared hit=160 read=8156, temp read=14645 written=19235 CTE foo -> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4811.449..6027.355 rows=1176103 loops=1) Output: weird_grouping.name Group Key: weird_grouping.name Buffers: shared hit=160 read=8156, temp read=7800 written=7830 -> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4811.447..5884.667 rows=1297265 loops=1) Output: weird_grouping.name Sort Key: weird_grouping.name Sort Method: external merge Disk: 39048kB Buffers: shared hit=160 read=8156, temp read=7800 written=7830 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.058..117.833rows=1297265 loops=1) Output: weird_grouping.name Buffers: shared hit=160 read=8156 -> Sort (cost=551594.86..553955.78 rows=944366 width=516) (actual time=6915.562..7418.978 rows=1176103 loops=1) Output: foo.name Sort Key: foo.name Sort Method: external merge Disk: 36368kB Buffers: shared hit=160 read=8156, temp read=14645 written=19235 -> CTE Scan on foo (cost=0.00..18887.32 rows=944366 width=516) (actual time=4811.451..6243.160 rows=1176103 loops=1) Output: foo.name Buffers: shared hit=160 read=8156, temp read=7800 written=12363 Settings: enable_hashagg = 'off', search_path = 'name_stuff' Planning Time: 0.064 ms Execution Time: 10175.478 ms (30 rows) Time: 10175.906 ms (00:10.176) testing=> But now here's another weird bit: testing=> select count(*), count(distinct name) from weird_grouping; count | count -----------+----------- 1,297,265 | 1,176,103 (1 row) Time: 6866.369 ms (00:06.866) testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) select count(*), count(distinct name) fromweird_grouping; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=27774.98..27774.99 rows=1 width=16) (actual time=6642.856..6642.856 rows=1 loops=1) Output: count(*), count(DISTINCT name) Buffers: shared hit=928 read=7388, temp read=5484 written=5506 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.072..110.798 rows=1297265loops=1) Output: name Buffers: shared hit=928 read=7388 Settings: search_path = 'name_stuff' Planning Time: 0.030 ms Execution Time: 6642.875 ms (9 rows) Time: 6643.181 ms (00:06.643) testing=> select count(*), count(distinct name collate "C") from weird_grouping; count | count -----------+----------- 1,297,265 | 1,176,101 (1 row) Time: 1655.202 ms (00:01.655) testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) select count(*), count(distinct name collate"C") from weird_grouping; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=27774.98..27774.99 rows=1 width=16) (actual time=1788.276..1788.277 rows=1 loops=1) Output: count(*), count(DISTINCT (name)::character varying(254)) Buffers: shared hit=992 read=7324, temp read=5484 written=5506 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.059..112.815 rows=1297265loops=1) Output: name Buffers: shared hit=992 read=7324 Settings: search_path = 'name_stuff' Planning Time: 0.030 ms Execution Time: 1788.295 ms (9 rows) Time: 1788.596 ms (00:01.789) testing=> show lc_collate; lc_collate ------------ en-US (1 row) Time: 0.122 ms testing=> show server_encoding; server_encoding ----------------- UTF8 (1 row) Time: 0.082 ms testing=> select count(*), count(distinct name collate "en-US") from weird_grouping; ERROR: collation "en-US" for encoding "UTF8" does not exist LINE 1: select count(*), count(distinct name collate "en-US") from w... ^ Time: 5.759 ms testing=>
pgsql-bugs by date: