Thread: Some issues with planner and query optimization

Some issues with planner and query optimization

From
"Boguk Maxim"
Date:
Postgres 8.1
Sample test table:
(all queries done on fresh vacuumed analyzed table with statistics on
rub_id and news_dtime set to 1000)
(all table in memory and server do not doing anything other)

media=> \d test_table
              Table "public.test_table"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 id         | integer                     |
 rub_id     | integer                     |
 news_id    | integer                     |
 news_dtime | timestamp without time zone |
Indexes:
    "test_table_pk" UNIQUE, btree (id)
    "test_table_main_idx" btree (rub_id, news_dtime)

media=> select count(*) from test_table;
  count
---------
 5834463
media=> select count(distinct rub_id) from test_table;
 count
-------
   342

Now doing 3 simple query:

First:
media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5)
order by news_dtime limit 20;
                                                                   QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
 Limit  (cost=0.00..10.73 rows=20 width=20) (actual time=0.018..0.121
rows=20 loops=1)
   ->  Index Scan using test_table_main_idx on test_table
(cost=0.00..29758.11 rows=55447 width=20) (actual time=0.014..0.054
rows=20 loops=1)
         Index Cond: (rub_id = 5)
 Total runtime: 0.186 ms

Second (almost same but with rub_id 8):
media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (8)
order by news_dtime limit 20;
                                                                   QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-
 Limit  (cost=0.00..1.98 rows=20 width=20) (actual time=0.019..0.121
rows=20 loops=1)
   ->  Index Scan using test_table_main_idx on test_table
(cost=0.00..45976.37 rows=463684 width=20) (actual time=0.014..0.054
rows=20 loops=1)
         Index Cond: (rub_id = 8)
 Total runtime: 0.186 ms


Now try with rub_id IN (5,8) (I was assumed query will work 2-10 time
longer max... With almost same plan)
But i'm got bad plan/really slow query:

media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5,8)
order by news_dtime limit 20;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------
 Limit  (cost=103337.45..103337.50 rows=20 width=20) (actual
time=4437.841..4437.976 rows=20 loops=1)
   ->  Sort  (cost=103337.45..104624.26 rows=514725 width=20) (actual
time=4437.836..4437.873 rows=20 loops=1)
         Sort Key: news_dtime
         ->  Bitmap Heap Scan on test_table  (cost=3818.96..54506.92
rows=514725 width=20) (actual time=82.139..1100.021 rows=515340 loops=1)
               Recheck Cond: ((rub_id = 5) OR (rub_id = 8))
               ->  BitmapOr  (cost=3818.96..3818.96 rows=519131 width=0)
(actual time=80.498..80.498 rows=0 loops=1)
                     ->  Bitmap Index Scan on test_table_main_idx
(cost=0.00..409.06 rows=55447 width=0) (actual time=8.342..8.342
rows=54959 loops=1)
                           Index Cond: (rub_id = 5)
                     ->  Bitmap Index Scan on test_table_main_idx
(cost=0.00..3409.89 rows=463684 width=0) (actual time=72.146..72.146
rows=460381 loops=1)
                           Index Cond: (rub_id = 8)
 Total runtime: 4458.999 ms
(11 rows)

Ouch.... 25000 slower...
Why planner not try two index scan and merge results...
Only one workaround wich i found was use query like:

media=> EXPLAIN ANALYZE
media-> (select * from test_table where rub_id IN (8) order by
news_dtime limit 20)
media-> UNION
media-> (select * from test_table where rub_id IN (5) order by
news_dtime limit 20)
media-> ORDER BY news_dtime LIMIT 20;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------------
 Limit  (cost=15.75..15.80 rows=20 width=20) (actual time=0.788..0.893
rows=20 loops=1)
   ->  Sort  (cost=15.75..15.85 rows=40 width=20) (actual
time=0.784..0.826 rows=20 loops=1)
         Sort Key: news_dtime
         ->  Unique  (cost=14.18..14.68 rows=40 width=20) (actual
time=0.462..0.694 rows=40 loops=1)
               ->  Sort  (cost=14.18..14.28 rows=40 width=20) (actual
time=0.458..0.527 rows=40 loops=1)
                     Sort Key: id, rub_id, news_id, news_dtime
                     ->  Append  (cost=0.00..13.12 rows=40 width=20)
(actual time=0.021..0.370 rows=40 loops=1)
                           ->  Limit  (cost=0.00..1.98 rows=20 width=20)
(actual time=0.017..0.118 rows=20 loops=1)
                                 ->  Index Scan using
test_table_main_idx on test_table  (cost=0.00..45976.37 rows=463684
width=20) (actual time=0.014..0.052 rows=20 loops=1)
                                       Index Cond: (rub_id = 8)
                           ->  Limit  (cost=0.00..10.73 rows=20
width=20) (actual time=0.012..0.114 rows=20 loops=1)
                                 ->  Index Scan using
test_table_main_idx on test_table  (cost=0.00..29758.11 rows=55447
width=20) (actual time=0.010..0.050 rows=20 loops=1)
                                       Index Cond: (rub_id = 5)
 Total runtime: 0.987 ms
(14 rows)

Performance look good.
But writing query for 10-20 rub_id inside IN (list) is pain.
I tested and UNION work faster even with 100 rub_id inside IN (list).

So any way rewrite query for get fast results without using long UNION
lists?
Or any way force postgres use different plan for query?



PS: trying disable bitmap_scan lead to seq_scan:

media=> set enable_bitmapscan=off;
SET
media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5,8)
order by news_dtime limit 20;
                                                             QUERY PLAN

------------------------------------------------------------------------
-------------------------------------------------------------
 Limit  (cost=179248.47..179248.52 rows=20 width=20) (actual
time=6815.917..6816.014 rows=20 loops=1)
   ->  Sort  (cost=179248.47..180535.28 rows=514725 width=20) (actual
time=6815.912..6815.945 rows=20 loops=1)
         Sort Key: news_dtime
         ->  Seq Scan on test_table  (cost=0.00..130417.94 rows=514725
width=20) (actual time=549.371..3529.336 rows=515340 loops=1)
               Filter: ((rub_id = 5) OR (rub_id = 8))
 Total runtime: 6838.502 ms
(6 rows)


Regars, Maxim Boguk
astar@rambler-co.ru | www.rambler.ru


Re: Some issues with planner and query optimization

From
Richard Huxton
Date:
Boguk Maxim wrote:
> Postgres 8.1
> Sample test table:
> (all queries done on fresh vacuumed analyzed table with statistics on
> rub_id and news_dtime set to 1000)
> (all table in memory and server do not doing anything other)
>
> media=> \d test_table
>               Table "public.test_table"
>    Column   |            Type             | Modifiers
> ------------+-----------------------------+-----------
>  id         | integer                     |
>  rub_id     | integer                     |
>  news_id    | integer                     |
>  news_dtime | timestamp without time zone |
> Indexes:
>     "test_table_pk" UNIQUE, btree (id)
>     "test_table_main_idx" btree (rub_id, news_dtime)
>
> media=> select count(*) from test_table;
>   count
> ---------
>  5834463
> media=> select count(distinct rub_id) from test_table;
>  count
> -------
>    342
>
> Now doing 3 simple query:
>
> First:
> media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5)
> order by news_dtime limit 20;
>                                                                    QUERY
> PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
>  Limit  (cost=0.00..10.73 rows=20 width=20) (actual time=0.018..0.121
> rows=20 loops=1)
>    ->  Index Scan using test_table_main_idx on test_table
> (cost=0.00..29758.11 rows=55447 width=20) (actual time=0.014..0.054
> rows=20 loops=1)
>          Index Cond: (rub_id = 5)
>  Total runtime: 0.186 ms
>
> Second (almost same but with rub_id 8):
> media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (8)
> order by news_dtime limit 20;
>                                                                    QUERY
> PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> -
>  Limit  (cost=0.00..1.98 rows=20 width=20) (actual time=0.019..0.121
> rows=20 loops=1)
>    ->  Index Scan using test_table_main_idx on test_table
> (cost=0.00..45976.37 rows=463684 width=20) (actual time=0.014..0.054
> rows=20 loops=1)
>          Index Cond: (rub_id = 8)
>  Total runtime: 0.186 ms
>
>
> Now try with rub_id IN (5,8) (I was assumed query will work 2-10 time
> longer max... With almost same plan)
> But i'm got bad plan/really slow query:
>
> media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5,8)
> order by news_dtime limit 20;
>
> QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> -------------
>  Limit  (cost=103337.45..103337.50 rows=20 width=20) (actual
> time=4437.841..4437.976 rows=20 loops=1)
>    ->  Sort  (cost=103337.45..104624.26 rows=514725 width=20) (actual
> time=4437.836..4437.873 rows=20 loops=1)
>          Sort Key: news_dtime
>          ->  Bitmap Heap Scan on test_table  (cost=3818.96..54506.92
> rows=514725 width=20) (actual time=82.139..1100.021 rows=515340 loops=1)
>                Recheck Cond: ((rub_id = 5) OR (rub_id = 8))
>                ->  BitmapOr  (cost=3818.96..3818.96 rows=519131 width=0)
> (actual time=80.498..80.498 rows=0 loops=1)
>                      ->  Bitmap Index Scan on test_table_main_idx
> (cost=0.00..409.06 rows=55447 width=0) (actual time=8.342..8.342
> rows=54959 loops=1)
>                            Index Cond: (rub_id = 5)
>                      ->  Bitmap Index Scan on test_table_main_idx
> (cost=0.00..3409.89 rows=463684 width=0) (actual time=72.146..72.146
> rows=460381 loops=1)
>                            Index Cond: (rub_id = 8)
>  Total runtime: 4458.999 ms
> (11 rows)
>
> Ouch.... 25000 slower...
> Why planner not try two index scan and merge results...

Try: ORDER BY rub_id, news_dtime
Does that give it enough of a hint?
The problem is you're asking for the 20 oldest regardless of rub_id, so
the index isn't as much use as it might be.

Perhaps an index on (news_dtime,rub_id) rather than the other way around?

--
   Richard Huxton
   Archonet Ltd