plan question - query with order by and limit not choosing index depends on size of limit, table - Mailing list pgsql-performance

From Mike Broers
Subject plan question - query with order by and limit not choosing index depends on size of limit, table
Date
Msg-id AANLkTinYQGAiTJWOGQRx-Oz8sENWq722zzZsrhf5C7Nu@mail.gmail.com
Whole thread Raw
Responses Re: plan question - query with order by and limit not choosing index depends on size of limit, table
List pgsql-performance
Hello performance, I need help explaining the performance of a particular query:

select * from messages where ((messages.topic = E'/x') AND (messages.processed = 'f'))  ORDER BY messages.created_at ASC limit 10;


Table Structure:
 
  Column   |            Type             |                             Modifiers                              
------------+-----------------------------+--------------------------------------------------------------------
 id         | integer                     | not null default nextval('landing_page.messages_id_seq'::regclass)
 processed  | boolean                     | 
 topic      | character varying(255)      | 
 body       | text                        | 
 created_at | timestamp without time zone | 
 updated_at | timestamp without time zone | 
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)
    "idx_landing_page_messages_created_at" btree (created_at)
    "idx_messages_topic_processed" btree (topic, processed)


Table row count ~ 1million

When I run the query with limit 10 it skips the idx_messages_topic_processed.
When I run the query with no limit, or with a limit above 20 it uses the desired index.
On a different system with a much smaller data set (~200,000) i have to use a limit of about 35 to use the desired index.

this is the good plan with no limit or 'sweet spot' limit

 Limit  (cost=2050.29..2050.38 rows=35 width=1266)
   ->  Sort  (cost=2050.29..2052.13 rows=737 width=1266)
         Sort Key: created_at
         ->  Bitmap Heap Scan on messages  (cost=25.86..2027.70 rows=737 width=1266)
               Recheck Cond: ((topic)::text = 'x'::text)
               Filter: (NOT processed)
               ->  Bitmap Index Scan on idx_messages_topic_processed  (cost=0.00..25.68 rows=737 width=0)
                     Index Cond: (((topic)::text = '/x'::text) AND (processed = false))

This is the bad plan with limit 10
 Limit  (cost=0.00..1844.07 rows=30 width=1266)
   ->  Index Scan using idx_landing_page_messages_created_at on messages  (cost=0.00..45302.70 rows=737 width=1266)
         Filter: ((NOT processed) AND ((topic)::text = 'x'::text))


Not sure if cost has anything to do with it, but this is set in postgresql.conf.  I am hesitant to change this as I have inherited the database from a previous dba and dont want to adversely affect things that caused this to be set in a non default manner if possible.

#seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 3.0 # same scale as above



Why does the smaller limit cause it to skip the index?
Is there a way to help the planner choose the better plan?

Much appreciated, 
Mike





pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Wrong docs on wal_buffers?
Next
From: "Pierre C"
Date:
Subject: Re: Wrong docs on wal_buffers?