Thread: Why order by column not using index with distinct keyword in selectclause?

Why order by column not using index with distinct keyword in selectclause?

From

Arup Rakshit

Date:

11 September 2018, 15:56:34

I have define a simple B Tree index on column *country* for users table. I don’t understand why the order by column not using the index scan when using *distinct* keyword in the select clause. Can anyone explain what is happening here?

aruprakshit=# \d users;

Table "public.users"

Column | Type | Collation | Nullable | Default

------------+-----------------------+-----------+----------+-----------------------------------

city | character varying | | |

last_name | character varying(50) | | |

country | character varying(50) | | |

sequence | integer | | |

first_name | character varying(50) | | |

state | character varying(50) | | |

email | character varying | | |

id | smallint | | not null | nextval('users_id_seq'::regclass)

Indexes:

"users_pk" PRIMARY KEY, btree (id)

aruprakshit=# explain analyze select distinct country from users order by country asc;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------

Sort (cost=283.19..283.85 rows=263 width=11) (actual time=10.525..10.555 rows=263 loops=1)

Sort Key: country

Sort Method: quicksort Memory: 38kB

-> HashAggregate (cost=269.99..272.62 rows=263 width=11) (actual time=8.469..8.521 rows=263 loops=1)

Group Key: country

-> Seq Scan on users (cost=0.00..244.99 rows=9999 width=11) (actual time=0.022..3.428 rows=9999 loops=1)

Planning time: 0.358 ms

Execution time: 10.634 ms

(8 rows)

aruprakshit=# explain analyze select country from users order by country asc;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

Index Only Scan using country on users (cost=0.29..886.27 rows=9999 width=11) (actual time=0.083..7.581 rows=9999 loops=1)

Heap Fetches: 9999

Planning time: 0.118 ms

Execution time: 8.332 ms

(4 rows)

aruprakshit=# explain analyze select * from users order by country asc;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------

Index Scan using country on users (cost=0.29..886.27 rows=9999 width=73) (actual time=0.015..8.432 rows=9999 loops=1)

Planning time: 0.213 ms

Execution time: 9.086 ms

(3 rows)

aruprakshit=#

Thanks,

Arup Rakshit

ar@zeit.io

Re: Why order by column not using index with distinct keyword in select clause?

From

Tom Lane

Date:

11 September 2018, 17:07:08

Arup Rakshit <ar@zeit.io> writes:
> I have define a simple B Tree index on column *country* for users table. I don’t understand why the order by column
notusing the index scan when using *distinct* keyword in the select clause. Can anyone explain what is happening here? 

Doesn't look complicated to me: the cost of the hashagg+sort plan is
283.85, while the cost of the indexscan plans is 886.27, and adding
a Uniq node on top of them would have been even more.

The critical point here seems to be that the hashagg node is able to
get rid of so many rows that the cost of sorting what remains is low.

Also notable is that the IOS isn't giving you any advantage over a
plain indexscan, which apparently is because none of the table's
pages are marked all-visible.  If the table is static then a VACUUM
would help that.

            regards, tom lane

Re: Why order by column not using index with distinct keyword inselect clause?

From

Geoff Winkless

Date:

11 September 2018, 17:22:03

On Tue, 11 Sep 2018 at 13:56, Arup Rakshit <ar@zeit.io> wrote:

I have define a simple B Tree index on column *country* for users table. I don’t understand why the order by column not using the index scan when using *distinct* keyword in the select clause. Can anyone explain what is happening here?

Bear in mind that index skip scans aren't yet implemented, which (unless I've missed the point) seems to be what you're expecting to help here.

https://wiki.postgresql.org/wiki/Loose_indexscan

Geoff