BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs
From:      PG Bug reporting form
Subject:   BUG #16031: Group by returns duplicate groups
Date:
Msg-id:    16031-4ef55395a5fbb687@postgresql.org
Responses: Re: BUG #16031: Group by returns duplicate groups
List:      pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16031
Logged by:          David Raymond
Email address:      david.raymond@tomtom.com
PostgreSQL version: 11.5
Operating system:   Windows 10
Description:

I have a large table that I'm querying, grouping on a single field and returning only that one field, filtered with a HAVING clause. The results I get back, however, contain a duplicate entry for one of the returned values, which I believe should never be possible. I'm working on dropping as many extra fields and records as I can while still reproducing the bad result, so it's small enough to send as a test case, but I figured I'd post this in the meantime to ask for advice.

I've dumped the table and restored it to a new table, and I still get the same weird results. The original version of the table had been clustered on the index that starts with the field being grouped on, but it still gives the bad groups after a dump and restore.

I'm running 11.5 on Windows 10 (EnterpriseDB installer) and unfortunately don't have the resources to build a new version myself if you come up with a patch (which is also why I'm hoping to shrink the test case down to where others can run it).

Here's output from psql:

testing=> select version();
                          version
------------------------------------------------------------
 PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.272 ms

testing=> \d+ big_table
                                            Table "name_stuff.big_table"
   Column    |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
-------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 id_1        | uuid                   |           | not null |         | plain    |              |
 field_2     | uuid                   |           |          |         | plain    |              |
 name        | character varying(254) |           | not null |         | extended |              |
 field_4     | character varying(254) |           |          |         | extended |              |
 field_5     | numeric(2,0)           |           | not null |         | main     |              |
 field_6     | character varying(4)   |           |          |         | extended |              |
 field_7     | character varying(3)   |           | not null |         | extended |              |
 field_8     | character varying(3)   |           |          |         | extended |              |
 arr_field_1 | character varying(254) |           |          |         | extended |              |
 arr_field_2 | character varying(254) |           |          |         | extended |              |
 arr_field_3 | character varying(254) |           |          |         | extended |              |
 arr_field_4 | character varying(254) |           |          |         | extended |              |
 arr_field_5 | character varying(254) |           |          |         | extended |              |
 arr_field_6 | character varying(254) |           |          |         | extended |              |
 field_15    | boolean                |           |          |         | plain    |              |
 field_16    | boolean                |           |          |         | plain    |              |
 id_2        | text                   |           | not null |         | extended |              |
Indexes:
    "big_table_pkey" PRIMARY KEY, btree (id_1, id_2)
    "big_table_name_id_1_id_2_idx" btree (name, id_1, id_2)

testing=> select count(*) from big_table;
    count
-------------
 108,565,086
(1 row)

Time: 273770.205 ms (04:33.770)

testing=> explain (analyze, verbose, costs, buffers, timing, summary)
create table bad_groups_1 as
select name
from big_table
group by name
having count(distinct array[arr_field_1, arr_field_2, arr_field_3,
                            arr_field_4, arr_field_5, arr_field_6]) > 1;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=29317810.81..30149735.19 rows=487417 width=22) (actual time=758501.326..895894.082 rows=745024 loops=1)
   Output: name
   Group Key: big_table.name
   Filter: (count(DISTINCT ARRAY[big_table.arr_field_1, big_table.arr_field_2, big_table.arr_field_3, big_table.arr_field_4, big_table.arr_field_5, big_table.arr_field_6]) > 1)
   Rows Removed by Filter: 80610652
   Buffers: shared hit=2325 read=1515081, temp read=2464481 written=2467410
   ->  Sort  (cost=29317810.81..29589026.23 rows=108486168 width=57) (actual time=758493.476..819035.136 rows=108565086 loops=1)
         Output: name, arr_field_1, arr_field_2, arr_field_3, arr_field_4, arr_field_5, arr_field_6
         Sort Key: big_table.name
         Sort Method: external merge  Disk: 4174488kB
         Buffers: shared hit=2317 read=1515081, temp read=2464481 written=2467410
         ->  Seq Scan on name_stuff.big_table  (cost=0.00..2602259.68 rows=108486168 width=57) (actual time=23.216..119113.708 rows=108565086 loops=1)
               Output: name, arr_field_1, arr_field_2, arr_field_3, arr_field_4, arr_field_5, arr_field_6
               Buffers: shared hit=2317 read=1515081
 Planning Time: 0.196 ms
 Execution Time: 897276.109 ms
(16 rows)

Time: 897285.808 ms (14:57.286)
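Since the plan is a GroupAggregate on top of an explicit Sort, I assume group boundaries come purely from comparing adjacent rows in sorted order, so an inconsistent collation comparison could in principle split one value into two groups. For what it's worth, the name column has no explicit collation, so the database default applies; a quick way to check what that is:

select datname, datcollate, datctype
from pg_database
where datname = current_database();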
testing=> \d+ bad_groups_1
                                 Table "name_stuff.bad_groups_1"
 Column |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
 name   | character varying(254) |           |          |         | extended |              |

testing=> select count(*), count(distinct name) from bad_groups_1;
  count  |  count
---------+---------
 745,024 | 745,023
(1 row)

Time: 899.273 ms

testing=> select name from bad_groups_1 group by name having count(*) > 1;
 name
------
 DK
(1 row)

Time: 337.663 ms

testing=>
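As a follow-up check (a sketch, not something run above), the raw bytes of the two 'DK' rows could be dumped to see whether they are byte-identical:

select name,
       length(name)       as char_len,   -- length in characters
       octet_length(name) as byte_len,   -- length in bytes
       encode(convert_to(name, 'UTF8'), 'hex') as hex_bytes
from bad_groups_1
where name = 'DK';
-- Two rows with identical hex_bytes would mean the duplicate groups
-- came from the sort/comparison itself, not from invisible characters.
-- (If equality comparison is also affected, the WHERE clause might only
-- match one of the rows, and a scan of the whole table would be needed.)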