ORDER BY Optimization - Mailing list pgsql-sql

From Derek Buttineau|Compu-SOLVE
Subject ORDER BY Optimization
Date
Msg-id 427BCA10.4090208@csolve.net
Whole thread Raw
Responses Re: ORDER BY Optimization  (Rosser Schwarz <rosser.schwarz@gmail.com>)
List pgsql-sql
Good Day,

I'm hoping this is the right place to send this.  I have a query that's 
causing me some grief trying to optimize it.  The query cost is fine 
until I order the data set.  Mind you after it's been ran and cached, 
subsequent calls to it are near instant.  The Query in question is:

select mr.*,m.* from maillog_received mr JOIN maillog m ON mr.maillog_id 
= m.id WHERE mr.subscription=89 and m.spam=1 ORDER BY m.msg_date desc 
limit 10

The strucutre of the tables involved in the query are as follows:
                                    Table "public.maillog_received"   Column    |          Type          |
             
 
Modifiers
--------------+------------------------+------------------------------------------------------------------id
|integer                | not null default 
 
nextval('public.maillog_received_id_seq'::text)maillog_id   | bigint                 | not nullsubscription | integer
            | not nulllocal_part   | character varying(255) | not nulldomain       | character varying(255) | not null
 
Indexes:   "maillog_received_pkey" PRIMARY KEY, btree (id)   "maillog_received_subscription_idx" btree (subscription)
"maillog_received_subscription_maillog_id_idx"btree (subscription, 
 
maillog_id)
Foreign-key constraints:   "$1" FOREIGN KEY (subscription) REFERENCES subscriptions(id) ON 
DELETE CASCADE   "$2" FOREIGN KEY (maillog_id) REFERENCES maillog(id) ON DELETE CASCADE
Triggers:   checkit BEFORE INSERT OR UPDATE ON maillog_received FOR EACH ROW 
EXECUTE PROCEDURE check_sub()
                                        Table "public.maillog"    Column      |            Type             |
            
 
Modifiers
-----------------+-----------------------------+---------------------------------------------------------id
| bigint                      | not null default 
 
nextval('public.maillog_id_seq'::text)message_id      | character varying(16)       |msg_date        | timestamp
withouttime zone | not nullfrom_local_part | character varying(255)      | not nullfrom_domain     | character
varying(255)     | not nullfrom_ip         | character varying(128)      |from_host       | character varying(255)
|subject        | character varying(255)      |virus           | integer                     | not null default 0spam
        | integer                     | not null default 0list            | integer                     | not null
default0bad_recipient   | integer                     | not null default 0bad_sender      | integer
|not null default 0bad_relay       | integer                     | not null default 0bad_file        | integer
          | not null default 0bad_mime        | integer                     | not null default 0sascore         |
numeric(7,2)               |sareport        | text                        |vscanreport     | text
|contentreport   | text                        |bypassed        | integer                     | not null default
0delivered      | integer                     | not null default 1complete        | integer                     | not
nulldefault 1
 
Indexes:   "maillog_pkey" PRIMARY KEY, btree (id)   "maillog_msg_date_idx" btree (msg_date)

EXPLAIN ANALYZE gives me the following:
Limit  (cost=31402.85..31402.87 rows=10 width=306) (actual 
time=87454.203..87454.334 rows=10 loops=1)  ->  Sort  (cost=31402.85..31405.06 rows=886 width=306) (actual 
time=87454.187..87454.240 rows=10 loops=1)        Sort Key: m.msg_date        ->  Nested Loop  (cost=0.00..31359.47
rows=886width=306) 
 
(actual time=4.740..86430.468 rows=26308 loops=1)              ->  Index Scan using maillog_received_subscription_idx
on
 
maillog_received mr  (cost=0.00..17789.73 rows=4479 width=43) (actual 
time=0.030..33554.061 rows=65508 loops=1)                    Index Cond: (subscription = 89)              ->  Index
Scanusing maillog_pkey on maillog m  
 
(cost=0.00..3.02 rows=1 width=263) (actual time=0.776..0.780 rows=0 
loops=65508)                    Index Cond: ("outer".maillog_id = m.id)                    Filter: (spam = 1)Total
runtime:87478.068 ms
 

Now there is a lot of data in these tables, at least a few million 
records, but I'd hoped to get a bit better performance :)  Now another 
odd thing I will mention, if I take the database schema to a second 
server (both running postgres 8.0.2 on FreeBSD 5.3), I get a much 
different (and to me, it looks much more effecient) query plan (though 
it's with substantially less data (about 500,000 records in the table)):
Limit  (cost=0.00..6482.60 rows=10 width=311) (actual 
time=25.340..26.885 rows=10 loops=1)  ->  Nested Loop  (cost=0.00..1175943.99 rows=1814 width=311) (actual 
time=25.337..26.867 rows=10 loops=1)        ->  Index Scan Backward using maillog_msg_date_idx on maillog 
m  (cost=0.00..869203.93 rows=51395 width=270) (actual 
time=25.156..26.050 rows=48 loops=1)              Filter: (spam = 1)        ->  Index Scan using 
maillog_received_subscription_maillog_id_idx on maillog_received mr  
(cost=0.00..5.96 rows=1 width=41) (actual time=0.011..0.012 rows=0 loops=48)              Index Cond: ((mr.subscription
=89) AND (mr.maillog_id = 
 
"outer".id))Total runtime: 27.016 ms

Any suggestions?

-- 
Regards,

Derek Buttineau
Internet Systems Developer
Compu-SOLVE Internet Services
Compu-SOLVE Technologies Inc.

705.725.1212 x255



pgsql-sql by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: pg_dump without data
Next
From: Rosser Schwarz
Date:
Subject: Re: ORDER BY Optimization