Slow query with indexed ORDER BY and LIMIT when using OR'd conditions - Mailing list pgsql-performance

From johno
Subject Slow query with indexed ORDER BY and LIMIT when using OR'd conditions
Date
Msg-id CACuOPqC9h6vK5rniezSOm2fu44PNp0NrB8kdtapT=xqNzQgidA@mail.gmail.com
Responses Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions  (David G Johnston <david.g.johnston@gmail.com>)
Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi there,

I am trying to optimize a simple query that returns the first 100 rows updated since a given timestamp, ordered by timestamp and id. If several rows share the boundary timestamp, I need a second condition: also return rows having that exact timestamp and id > the given id.

The obvious query is 

SELECT * FROM register_uz_accounting_entities
    WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND id > 1459)
    ORDER BY effective_on, id
    LIMIT 100

With a composite index on (effective_on, id)
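For reference, the index (named idx2 in the plans below) was created roughly like this; exact DDL from memory:

```sql
-- Composite btree index on (effective_on, id), matching the ORDER BY.
CREATE INDEX idx2 ON register_uz_accounting_entities (effective_on, id);
```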

Query plan

"Limit  (cost=4613.70..4613.95 rows=100 width=1250) (actual time=0.122..0.130 rows=22 loops=1)"
"  Buffers: shared hit=28"
"  ->  Sort  (cost=4613.70..4617.33 rows=1453 width=1250) (actual time=0.120..0.122 rows=22 loops=1)"
"        Sort Key: effective_on, id"
"        Sort Method: quicksort  Memory: 30kB"
"        Buffers: shared hit=28"
"        ->  Bitmap Heap Scan on register_uz_accounting_entities  (cost=35.42..4558.17 rows=1453 width=1250) (actual time=0.036..0.083 rows=22 loops=1)"
"              Recheck Cond: ((effective_on > '2014-07-11'::date) OR ((effective_on = '2014-07-11'::date) AND (id > 1459)))"
"              Buffers: shared hit=28"
"              ->  BitmapOr  (cost=35.42..35.42 rows=1453 width=0) (actual time=0.026..0.026 rows=0 loops=1)"
"                    Buffers: shared hit=6"
"                    ->  Bitmap Index Scan on idx2  (cost=0.00..6.49 rows=275 width=0) (actual time=0.017..0.017 rows=15 loops=1)"
"                          Index Cond: (effective_on > '2014-07-11'::date)"
"                          Buffers: shared hit=3"
"                    ->  Bitmap Index Scan on idx2  (cost=0.00..28.21 rows=1178 width=0) (actual time=0.007..0.007 rows=7 loops=1)"
"                          Index Cond: ((effective_on = '2014-07-11'::date) AND (id > 1459))"
"                          Buffers: shared hit=3"
"Total runtime: 0.204 ms"


Everything works as expected. However, if I move the cutoff date further into the past (so that far more rows match), the query slows down drastically.

SELECT * FROM register_uz_accounting_entities
WHERE effective_on > '2014-06-11' OR (effective_on = '2014-06-11' AND id > 1459)
ORDER BY effective_on, id
LIMIT 100
 
"Limit  (cost=0.42..649.46 rows=100 width=1250) (actual time=516.125..516.242 rows=100 loops=1)"
"  Buffers: shared hit=576201"
"  ->  Index Scan using idx2 on register_uz_accounting_entities  (cost=0.42..106006.95 rows=16333 width=1250) (actual time=516.122..516.229 rows=100 loops=1)"
"        Filter: ((effective_on > '2014-06-11'::date) OR ((effective_on = '2014-06-11'::date) AND (id > 1459)))"
"        Rows Removed by Filter: 797708"
"        Buffers: shared hit=576201"
"Total runtime: 516.304 ms"


I've tried to optimize this query by pushing the LIMIT and ORDER BY down into explicit subselects.

SELECT * FROM (
   SELECT * FROM register_uz_accounting_entities 
   WHERE effective_on > '2014-06-11'
   ORDER BY effective_on, id LIMIT 100
   ) t1
UNION
  SELECT * FROM (
    SELECT * FROM register_uz_accounting_entities
    WHERE effective_on = '2014-06-11' AND id > 1459
    ORDER BY effective_on, id LIMIT 100
) t2
ORDER BY effective_on, id
LIMIT 100
 
-- query plan
"Limit  (cost=684.29..684.54 rows=100 width=1250) (actual time=2.648..2.708 rows=100 loops=1)"
"  Buffers: shared hit=203"
"  ->  Sort  (cost=684.29..684.79 rows=200 width=1250) (actual time=2.646..2.672 rows=100 loops=1)"
"        Sort Key: register_uz_accounting_entities.effective_on, register_uz_accounting_entities.id"
"        Sort Method: quicksort  Memory: 79kB"
"        Buffers: shared hit=203"
"        ->  HashAggregate  (cost=674.65..676.65 rows=200 width=1250) (actual time=1.738..1.971 rows=200 loops=1)"
"              Buffers: shared hit=203"
"              ->  Append  (cost=0.42..661.15 rows=200 width=1250) (actual time=0.054..0.601 rows=200 loops=1)"
"                    Buffers: shared hit=203"
"                    ->  Limit  (cost=0.42..338.62 rows=100 width=1250) (actual time=0.053..0.293 rows=100 loops=1)"
"                          Buffers: shared hit=101"
"                          ->  Index Scan using idx2 on register_uz_accounting_entities  (cost=0.42..22669.36 rows=6703 width=1250) (actual time=0.052..0.260 rows=100 loops=1)"
"                                Index Cond: (effective_on > '2014-06-11'::date)"
"                                Buffers: shared hit=101"
"                    ->  Limit  (cost=0.42..318.53 rows=100 width=1250) (actual time=0.037..0.228 rows=100 loops=1)"
"                          Buffers: shared hit=102"
"                          ->  Index Scan using idx2 on register_uz_accounting_entities register_uz_accounting_entities_1  (cost=0.42..30888.88 rows=9710 width=1250) (actual time=0.036..0.187 rows=100 loops=1)"
"                                Index Cond: ((effective_on = '2014-06-11'::date) AND (id > 1459))"
"                                Buffers: shared hit=102"
"Total runtime: 3.011 ms"

=> Very fast.
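One small refinement I haven't benchmarked: the two branches match disjoint sets of rows (effective_on strictly greater than vs. equal to the boundary date), so UNION ALL should be safe here and would presumably let the planner skip the HashAggregate deduplication step:

```sql
SELECT * FROM (
   SELECT * FROM register_uz_accounting_entities
   WHERE effective_on > '2014-06-11'
   ORDER BY effective_on, id LIMIT 100
) t1
UNION ALL  -- branches are disjoint, so no deduplication is needed
SELECT * FROM (
   SELECT * FROM register_uz_accounting_entities
   WHERE effective_on = '2014-06-11' AND id > 1459
   ORDER BY effective_on, id LIMIT 100
) t2
ORDER BY effective_on, id
LIMIT 100
```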

The question is... why is the query planner unable to make this optimization for the slow query? What am I missing?
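For completeness: I know the condition can also be written as a SQL-standard row-value comparison, which expresses the same a > x OR (a = x AND b > y) logic as a single predicate, though I haven't checked whether the planner handles it any better than the OR form:

```sql
SELECT * FROM register_uz_accounting_entities
WHERE (effective_on, id) > (DATE '2014-06-11', 1459)
ORDER BY effective_on, id
LIMIT 100
```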


Thanks in advance.

