Why DISTINCT ... DESC is slow? - Mailing list pgsql-general

From Anton
Subject Why DISTINCT ... DESC is slow?
Date
Msg-id 8cac8dd0612112328m5ac0520m7e3d9778a9d14b47@mail.gmail.com
Whole thread Raw
Responses Re: Why DISTINCT ... DESC is slow?  (Richard Huxton <dev@archonet.com>)
Re: Why DISTINCT ... DESC is slow?  (Ron Johnson <ron.l.johnson@cox.net>)
List pgsql-general
Hi. With this table (about 800 000 rows):

=# \d n_traffic
                         Table "public.n_traffic"
    Column    |            Type             |          Modifiers
--------------+-----------------------------+------------------------------
 login_id     | integer                     | not null
 traftype_id  | integer                     | not null
 collect_time | timestamp without time zone | not null default now()
 bytes_in     | bigint                      | not null default (0)::bigint
 bytes_out    | bigint                      | not null default (0)::bigint
Indexes:
    "n_traffic_collect_time" btree (collect_time)
    "n_traffic_login_id" btree (login_id)
    "n_traffic_login_id_collect_time" btree (login_id, collect_time)
Foreign-key constraints:
    "n_traffic_login_id_fkey" FOREIGN KEY (login_id) REFERENCES
n_logins(login_id) ON UPDATE CASCADE
    "n_traffic_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES
n_traftypes(traftype_id) ON UPDATE CASCADE


=# explain analyze SELECT DISTINCT ON (login_id) login_id,
collect_time AS dt FROM n_traffic ORDER BY login_id collect_time DESC;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=91711.13..95669.41 rows=532 width=12) (actual
time=10698.860..13790.918 rows=798 loops=1)
   ->  Sort  (cost=91711.13..93690.27 rows=791656 width=12) (actual
time=10698.851..12140.496 rows=791656 loops=1)
         Sort Key: login_id, collect_time
         ->  Seq Scan on n_traffic  (cost=0.00..14150.56 rows=791656
width=12) (actual time=0.013..2776.572 rows=791656 loops=1)
 Total runtime: 14049.378 ms
(5 rows)

For me it is strange that postgres uses Seq Scan, but set
enable_seqscan = off don't get any changes.

While without DESC query goes faster... But not so fast!
=# explain analyze SELECT DISTINCT ON (login_id) login_id,
collect_time AS dt FROM n_traffic ORDER BY login_id collect_time;

     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..29843.08 rows=532 width=12) (actual
time=0.045..5146.768 rows=798 loops=1)
   ->  Index Scan using n_traffic_login_id_collect_time on n_traffic
(cost=0.00..27863.94 rows=791656 width=12) (actual
time=0.037..3682.853 rows=791656 loops=1)
 Total runtime: 5158.735 ms
(3 rows)

Why? 768 rows is about 1000 times smaller than entire n_traffic. And
why Index Scan used without DESC but with DESC is not?
--
engineer

pgsql-general by date:

Previous
From: Ashish Karalkar
Date:
Subject: 8.2.0 Installation Problem
Next
From: wheel
Date:
Subject: Re: Restore database from files (not dump files)?