Re: Sort and index - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Sort and index
Date
Msg-id 20050420034041.GC58835@decibel.org
Whole thread Raw
In response to Re: Sort and index  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Sort and index  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-performance
On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > if the sort would fit in memory or not.
>
> Really?  Have you read cost_sort()?
>
> It's certainly possible that the calculation is all wet, but to claim
> that the issue is not considered is just wrong.

To be fair, no, I haven't looked at the code. This is based strictly on
anecdotal evidence on a 120M row table. I'm currently running a test to
see how an index scan compares to a seqscan. I also got the same results
when I added a where clause that would restrict it to about 7% of the
table.

Actually, after running some tests (below), the plan cost does change
when I change sort_mem (it was originally 50000).

stats=# \d email_contrib
   Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
      ?column?
--------------------
 0.0676666687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

pgsql-performance by date:

Previous
From: William Yu
Date:
Subject: Re: What to do with 6 disks?
Next
From: Greg Stark
Date:
Subject: Re: Slow copy with little CPU/disk usage