RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs
| From | David Raymond |
|---|---|
| Subject | RE: BUG #16031: Group by returns duplicate groups |
| Date | |
| Msg-id | VI1PR07MB6029E7C9B73250E386088047879D0@VI1PR07MB6029.eurprd07.prod.outlook.com Whole thread Raw |
| In response to | Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
| Responses |
RE: BUG #16031: Group by returns duplicate groups
|
| List | pgsql-bugs |
"But this is all kinds of messed up"
Sounds about right :)
Output below.
As asked earlier it's Windows 10, all the lc_* settings are "en-US", and the server encoding is UTF8.
Throughout the table there are records with pretty much every notation alphabet. Latin, Cyrillic, Greek, Arabic,
Hebrew,Japanese, Mandarin, etc. etc. Which "shouldn't" matter, but I figured I'd mention it.
testing=> set enable_indexscan = off;
SET
Time: 0.536 ms
testing=> set enable_bitmapscan = off;
SET
Time: 0.225 ms
testing=> with sd as (select name, row_number() over (order by name) rnum
testing(> from big_table)
testing-> select rnum, name, encode(convert_to(name, 'SQL_ASCII'),'escape'),
testing-> name < 'DK' as lt, name = 'DK' as eq, name > 'DK' as gt
testing-> from sd
testing-> where rnum >= (select min(rnum) from sd where name='DK')
testing-> and rnum <= (select max(rnum) from sd where name='DK');
rnum | name | encode | lt | eq | gt
------------+-----------------------------------+-----------------------------------+----+----+----
27,900,023 | DK | DK | f | t | f
27,900,024 | DK | DK | f | t | f
27,900,025 | DK | DK | f | t | f
27,900,026 | DK | DK | f | t | f
27,900,027 | DK | DK | f | t | f
27,900,028 | DK | DK | f | t | f
27,900,029 | DK | DK | f | t | f
27,900,030 | DK | DK | f | t | f
27,900,031 | DK | DK | f | t | f
27,900,032 | Dk'bus Marine | Dk'bus Marine | f | f | t
27,900,033 | Dk's Auto's | Dk's Auto's | f | f | t
27,900,034 | Dk's Bar & Grill | Dk's Bar & Grill | f | f | t
27,900,035 | Dk's Barbers & Stylist | Dk's Barbers & Stylist | f | f | t
27,900,036 | Dk's Beach Boutique | Dk's Beach Boutique | f | f | t
27,900,037 | Dk's Cabinets & Countertops | Dk's Cabinets & Countertops | f | f | t
27,900,038 | Dk's Cleaning Service | Dk's Cleaning Service | f | f | t
27,900,039 | Dk's Clothing | Dk's Clothing | f | f | t
27,900,040 | Dk's Communications | Dk's Communications | f | f | t
27,900,041 | Dk's Dancewear & Fitnesswear | Dk's Dancewear & Fitnesswear | f | f | t
27,900,042 | Dk's Dancewear Boutique | Dk's Dancewear Boutique | f | f | t
27,900,043 | Dk's Discount Dance & Fitnesswear | Dk's Discount Dance & Fitnesswear | f | f | t
27,900,044 | DK's Drywall Service | DK's Drywall Service | f | f | t
27,900,045 | DK'S DUSTBUSTERS | DK'S DUSTBUSTERS | f | f | t
27,900,046 | Dk's Family Five Star Trophies | Dk's Family Five Star Trophies | f | f | t
27,900,047 | DK's Family Five Star Trophies | DK's Family Five Star Trophies | f | f | t
27,900,048 | Dk's Food Mart | Dk's Food Mart | f | f | t
27,900,049 | Dk'S Group Pte. Ltd. | Dk'S Group Pte. Ltd. | f | f | t
27,900,050 | Dk's Hair Designs | Dk's Hair Designs | f | f | t
27,900,051 | Dk's Hair Happenings | Dk's Hair Happenings | f | f | t
27,900,052 | Dk's Hair Supply | Dk's Hair Supply | f | f | t
27,900,053 | Dk's Home Decor | Dk's Home Decor | f | f | t
27,900,054 | DK's Informática | DK's Inform\303\241tica | f | f | t
27,900,055 | Dk's Janitorial | Dk's Janitorial | f | f | t
27,900,056 | DK's Liquors | DK's Liquors | f | f | t
27,900,057 | Dk's Market | Dk's Market | f | f | t
27,900,058 | Dk's Moda Masculina | Dk's Moda Masculina | f | f | t
27,900,059 | Dk's Nails And Spa | Dk's Nails And Spa | f | f | t
27,900,060 | DK's Pawn Shop | DK's Pawn Shop | f | f | t
27,900,061 | Dk's Pet Grooming | Dk's Pet Grooming | f | f | t
27,900,062 | DK's Quality Service | DK's Quality Service | f | f | t
27,900,063 | DK's Restoration | DK's Restoration | f | f | t
27,900,064 | Dk's Sports Center | Dk's Sports Center | f | f | t
27,900,065 | Dk's Statuary | Dk's Statuary | f | f | t
27,900,066 | Dk's Style Hut | Dk's Style Hut | f | f | t
27,900,067 | Dk's Temiskaming Shore Taxi | Dk's Temiskaming Shore Taxi | f | f | t
27,900,068 | Dk's Towing | Dk's Towing | f | f | t
27,900,069 | DK's Travel | DK's Travel | f | f | t
27,900,070 | Dk'Style | Dk'Style | f | f | t
27,900,071 | DK'Z Car Wash | DK'Z Car Wash | f | f | t
27,900,072 | Dk- | Dk- | t | f | f
27,900,073 | DK | DK | f | t | f
27,900,074 | DK | DK | f | t | f
27,900,075 | DK | DK | f | t | f
27,900,076 | DK | DK | f | t | f
27,900,077 | DK | DK | f | t | f
27,900,078 | DK | DK | f | t | f
27,900,079 | DK | DK | f | t | f
27,900,080 | DK | DK | f | t | f
27,900,081 | DK | DK | f | t | f
27,900,082 | DK | DK | f | t | f
27,900,083 | DK | DK | f | t | f
27,900,084 | DK | DK | f | t | f
27,900,085 | DK | DK | f | t | f
27,900,086 | DK | DK | f | t | f
27,900,087 | DK | DK | f | t | f
27,900,088 | DK | DK | f | t | f
(66 rows)
Time: 821796.036 ms (13:41.796)
testing=> explain (analyze, verbose, costs, buffers, timing, summary)
testing-> with sd as (select name, row_number() over (order by name) rnum
testing(> from big_table)
testing-> select rnum, name, encode(convert_to(name, 'SQL_ASCII'),'escape'),
testing-> name < 'DK' as lt, name = 'DK' as eq, name > 'DK' as gt
testing-> from sd
testing-> where rnum >= (select min(rnum) from sd where name='DK')
testing-> and rnum <= (select max(rnum) from sd where name='DK');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on sd (cost=30560557.18..33281469.69 rows=542825 width=559) (actual time=886056.646..898585.621 rows=66
loops=1)
Output: sd.rnum, sd.name, encode(convert_to((sd.name)::text, 'SQL_ASCII'::name), 'escape'::text), ((sd.name)::text <
'DK'::text),((sd.name)::text = 'DK'::text), ((sd.name)::text > 'DK'::text)
Filter: ((sd.rnum >= $1) AND (sd.rnum <= $2))
Rows Removed by Filter: 108565020
Buffers: shared hit=482 read=1516916, temp read=3200588 written=2619846
CTE sd
-> WindowAgg (cost=23772525.03..25672414.07 rows=108565088 width=30) (actual time=719781.561..832442.189
rows=108565086loops=1)
Output: big_table.name, row_number() OVER (?)
Buffers: shared hit=482 read=1516916, temp read=2033663 written=2036384
-> Sort (cost=23772525.03..24043937.75 rows=108565088 width=22) (actual time=719781.549..793477.044
rows=108565086loops=1)
Output: big_table.name
Sort Key: big_table.name
Sort Method: external merge Disk: 3453888kB
Buffers: shared hit=482 read=1516916, temp read=2033663 written=2036384
-> Seq Scan on name_stuff.big_table (cost=0.00..2603048.88 rows=108565088 width=22) (actual
time=0.010..105238.261rows=108565086 loops=1)
Output: big_table.name
Buffers: shared hit=482 read=1516916
InitPlan 2 (returns $1)
-> Aggregate (cost=2444071.54..2444071.55 rows=1 width=8) (actual time=148300.335..148300.336 rows=1 loops=1)
Output: min(sd_1.rnum)
Buffers: temp read=431489 written=583461
-> CTE Scan on sd sd_1 (cost=0.00..2442714.48 rows=542825 width=8) (actual time=34105.882..148300.327
rows=25loops=1)
Output: sd_1.name, sd_1.rnum
Filter: ((sd_1.name)::text = 'DK'::text)
Rows Removed by Filter: 108565061
Buffers: temp read=431489 written=583461
InitPlan 3 (returns $2)
-> Aggregate (cost=2444071.54..2444071.55 rows=1 width=8) (actual time=14707.032..14707.032 rows=1 loops=1)
Output: max(sd_2.rnum)
Buffers: temp read=583462
-> CTE Scan on sd sd_2 (cost=0.00..2442714.48 rows=542825 width=8) (actual time=3729.712..14707.025
rows=25loops=1)
Output: sd_2.name, sd_2.rnum
Filter: ((sd_2.name)::text = 'DK'::text)
Rows Removed by Filter: 108565061
Buffers: temp read=583462
Planning Time: 0.099 ms
Execution Time: 899881.036 ms
(37 rows)
Time: 899900.240 ms (14:59.900)
testing=>
-----Original Message-----
From: Andrew Gierth <andrew@tao11.riddles.org.uk>
Sent: Tuesday, October 1, 2019 11:21 AM
To: David Raymond <David.Raymond@tomtom.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #16031: Group by returns duplicate groups
But this is all kinds of messed up:
...
Let's see some more data from that. Do this query:
with sd as (select name, row_number() over (order by name) rnum
from big_table)
select rnum, name, encode(convert_to(name, 'SQL_ASCII'),'escape'),
name < 'DK' as lt, name = 'DK' as eq, name > 'DK' as gt
from sd
where rnum >= (select min(rnum) from sd where name='DK')
and rnum <= (select max(rnum) from sd where name='DK');
--
Andrew (irc:RhodiumToad)
pgsql-bugs by date: