Thread: Why order by column not using index with distinct keyword in selectclause?
I have define a simple B Tree index on column *country* for users table. I don’t understand why the order by column not using the index scan when using *distinct* keyword in the select clause. Can anyone explain what is happening here?
aruprakshit=# \d users;
Table "public.users"
Column | Type | Collation | Nullable | Default
------------+-----------------------+-----------+----------+-----------------------------------
city | character varying | | |
last_name | character varying(50) | | |
country | character varying(50) | | |
sequence | integer | | |
first_name | character varying(50) | | |
state | character varying(50) | | |
email | character varying | | |
id | smallint | | not null | nextval('users_id_seq'::regclass)
Indexes:
"users_pk" PRIMARY KEY, btree (id)
aruprakshit=# explain analyze select distinct country from users order by country asc;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Sort (cost=283.19..283.85 rows=263 width=11) (actual time=10.525..10.555 rows=263 loops=1)
Sort Key: country
Sort Method: quicksort Memory: 38kB
-> HashAggregate (cost=269.99..272.62 rows=263 width=11) (actual time=8.469..8.521 rows=263 loops=1)
Group Key: country
-> Seq Scan on users (cost=0.00..244.99 rows=9999 width=11) (actual time=0.022..3.428 rows=9999 loops=1)
Planning time: 0.358 ms
Execution time: 10.634 ms
(8 rows)
aruprakshit=# explain analyze select country from users order by country asc;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Only Scan using country on users (cost=0.29..886.27 rows=9999 width=11) (actual time=0.083..7.581 rows=9999 loops=1)
Heap Fetches: 9999
Planning time: 0.118 ms
Execution time: 8.332 ms
(4 rows)
aruprakshit=# explain analyze select * from users order by country asc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using country on users (cost=0.29..886.27 rows=9999 width=73) (actual time=0.015..8.432 rows=9999 loops=1)
Planning time: 0.213 ms
Execution time: 9.086 ms
(3 rows)
aruprakshit=#
Arup Rakshit <ar@zeit.io> writes: > I have define a simple B Tree index on column *country* for users table. I don’t understand why the order by column notusing the index scan when using *distinct* keyword in the select clause. Can anyone explain what is happening here? Doesn't look complicated to me: the cost of the hashagg+sort plan is 283.85, while the cost of the indexscan plans is 886.27, and adding a Uniq node on top of them would have been even more. The critical point here seems to be that the hashagg node is able to get rid of so many rows that the cost of sorting what remains is low. Also notable is that the IOS isn't giving you any advantage over a plain indexscan, which apparently is because none of the table's pages are marked all-visible. If the table is static then a VACUUM would help that. regards, tom lane
Re: Why order by column not using index with distinct keyword inselect clause?
From
Geoff Winkless
Date:
On Tue, 11 Sep 2018 at 13:56, Arup Rakshit <ar@zeit.io> wrote:
I have define a simple B Tree index on column *country* for users table. I don’t understand why the order by column not using the index scan when using *distinct* keyword in the select clause. Can anyone explain what is happening here?
Bear in mind that index skip scans aren't yet implemented, which (unless I've missed the point) seems to be what you're expecting to help here.
Geoff