WHERE vs ORDER BY vs LIMIT why not using the correct index? - Mailing list pgsql-general

From D. Dante Lorenso
Subject WHERE vs ORDER BY vs LIMIT why not using the correct index?
Date
Msg-id 4783FC05.5000907@lorenso.com
Whole thread Raw
Responses Re: WHERE vs ORDER BY vs LIMIT why not using the correct index?
List pgsql-general
All,

I have a simple query:

   SELECT tcq_id
   FROM queue q
   WHERE q.status = 'D'
   ORDER BY tcq_id ASC
   LIMIT 1;

What I'm trying to do is find all the items in the queue which have a
status of 'D' and then select the oldest item first.  My data is very
skewed such that there are not many records matching the WHERE clause.

   status  my_count
   D              4
   Q              6
   S             20
   P             74
   F           3294
   C         291206

However, when I explain that query, it's doing an index scan on the
'tcq_id' primary key column instead of using the index on 'status':

------------------------------
     EXPLAIN SELECT tcq_id
     FROM queue q
     WHERE q.status = 'D'
     ORDER BY tcq_id ASC
     LIMIT 1;

  Limit  (cost=0.00..40.09 rows=1 width=8)
    ->  Index Scan using queue_pkey on queue q (cost=0.00..59059.80
rows=1473 width=8)
          Filter: (status = 'D'::bpchar)
------------------------------

But then when I take out the ORDER BY clause, PostgreSQL will use a
different (and proper) index:

------------------------------
     EXPLAIN SELECT tcq_id
     FROM transcode_queue q
     WHERE q.status = 'D'
     LIMIT 1;

  Limit  (cost=0.00..3.81 rows=1 width=8)
    ->  Index Scan using queue_idx_status on queue q (cost=0.00..5618.07
rows=1473 width=8)
          Index Cond: (status = 'D'::bpchar)
------------------------------

I don't understand why the ORDER BY condition would be affecting my
WHERE criteria.  Shouldn't the ordering be done after the filter is
first applied?

I'm wanting: "find the 4 rows where status = 'D' then order those by
tcq_id and return the first one."  But postgresql seems to be choosing:
"order all records by the tcq_id then scan them sequentially and find
the first one matching status = 'D'".

How can I influence the planner's decision while keeping my ORDER BY clause?

After furthing testing, maybe it's not the ORDER BY but the LIMIT that
is causing the poor planner choice?  I tried to do this:

------------------------------
     SELECT tcq_id
     FROM (
       SELECT tcq_id
       FROM queue q
       WHERE q.status = 'D'
     ) x
     ORDER BY x.tcq_id ASC
     LIMIT 1;

  Limit  (cost=0.00..40.09 rows=1 width=8)
    ->  Index Scan using queue_pkey on queue q  (cost=0.00..59059.80
rows=1473 width=8)
          Filter: (status = 'D'::bpchar)
------------------------------

But this results in another wrong index choice.  So, I removed the LIMIT
clause and now it does use the right index:

------------------------------
     SELECT tcq_id
     FROM (
       SELECT tcq_id
       FROM queue q
       WHERE q.status = 'D'
     ) x
     ORDER BY x.tcq_id ASC;

  Sort  (cost=4314.36..4318.05 rows=1473 width=8)
    Sort Key: q.tcq_id
    ->  Bitmap Heap Scan on queue q  (cost=35.71..4236.85 rows=1473 width=8)
          Recheck Cond: (status = 'D'::bpchar)
          ->  Bitmap Index Scan on queue_idx_status  (cost=0.00..35.34
rows=1473 width=0)
                Index Cond: (status = 'D'::bpchar)
------------------------------

Can someone shed some insight here and help me understand what's going on?

-- Dante


pgsql-general by date:

Previous
From: "x asasaxax"
Date:
Subject: XML path function
Next
From: Ivan Sergio Borgonovo
Date:
Subject: benchmarking update/insert and random record update