Thread: planner parameters

planner parameters

From
Torsten Förtsch
Date:
Hi,

depending on the OFFSET parameter I have seen at least 3 different query
plans.

SELECT * FROM
    (
     SELECT * FROM transaction tt
     WHERE
         tt.account_id = '1376641'
         AND tt.transaction_time >= '2013-02-03 05:37:24'
         AND tt.transaction_time < '2013-08-23 05:37:24'
     ORDER BY
         tt.transaction_time ASC,
         tt.id ASC
     LIMIT 10000
     OFFSET 0
    ) t1
    LEFT OUTER JOIN fmb t2
        ON (t1.fmb_id = t2.id)
    LEFT OUTER JOIN payment.payment t3
        ON (t1.payment_id = t3.id);

The best of them is this:

 Nested Loop Left Join  (cost=1488.34..126055.47 rows=9985 width=1015)
                        (actual time=26.894..78.711 rows=10000 loops=1)
   ->  Nested Loop Left Join
       (cost=1487.91..86675.47 rows=9985 width=828)
       (actual time=26.892..72.170 rows=10000 loops=1)
         ->  Limit  (cost=1487.35..1911.50 rows=9985 width=597)
                    (actual time=26.873..33.735 rows=10000 loops=1)
               ->  Index Scan using xxx on transaction tt
                   (cost=0.57..1911.50 rows=44985 width=597)
                   (actual time=0.020..31.707 rows=45000 loops=1)
                     Index Cond: ((account_id = 1376641::bigint) AND
                                  (transaction_time >= '...') AND
                                  (transaction_time < '...'))
         ->  Index Scan using pk_fmb on fmb t2
             (cost=0.56..8.47 rows=1 width=231)
             (actual time=0.003..0.003 rows=1 loops=10000)
               Index Cond: (tt.fmb_id = id)
   ->  Index Scan using pk_payment on payment t3
       (cost=0.43..3.93 rows=1 width=187)
       (actual time=0.000..0.000 rows=0 loops=10000)
         Index Cond: (tt.payment_id = id)
 Total runtime: 79.219 ms

Another one is this:

 Hash Left Join  (cost=55139.59..140453.16 rows=9985 width=1015)
                 (actual time=715.450..762.989 rows=10000 loops=1)
   Hash Cond: (tt.payment_id = t3.id)
   ->  Nested Loop Left Join
       (cost=1487.91..86675.47 rows=9985 width=828)
       (actual time=27.472..70.723 rows=10000 loops=1)
         ->  Limit  (cost=1487.35..1911.50 rows=9985 width=597)
                    (actual time=27.453..34.066 rows=10000 loops=1)
               ->  Index Scan using xxx on transaction tt
                   (cost=0.57..1911.50 rows=44985 width=597)
                   (actual time=0.076..32.050 rows=45000 loops=1)
                     Index Cond: ((account_id = 1376641::bigint) AND
                                  (transaction_time >= '...') AND
                                  (transaction_time < '...'))
         ->  Index Scan using pk_fmb on fmb t2
             (cost=0.56..8.47 rows=1 width=231)
             (actual time=0.003..0.003 rows=1 loops=10000)
               Index Cond: (tt.fmb_id = id)
   ->  Hash  (cost=40316.30..40316.30 rows=1066830 width=187)
             (actual time=687.651..687.651 rows=1066830 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 235206kB
         ->  Seq Scan on payment t3
             (cost=0.00..40316.30 rows=1066830 width=187)
             (actual time=0.004..147.681 rows=1066830 loops=1)
 Total runtime: 781.584 ms


You see this 2nd plan takes 10 times longer.

Now, if I

  set enable_seqscan=off;

the planner generates the 1st plan also for this parameter set and it
executes in about the same time (~80 ms).

Then I created a new tablespace with very low cost settings:

  alter tablespace trick_indexes set
    (seq_page_cost=0.0001, random_page_cost=0.0001);

and moved the pk_payment there. The tablespace is located on the same
disk. The only reason for it's existence are the differing cost parameters.

Now I could turn enable_seqscan back on and still got the better query plan.


Is there an other way to make the planner use generate the 1st plan?

Why does it generate the 2nd plan at all?

Does the planner take into account what is currently present in shared
memory? If so, it could know that the pk_payment index is probably in
RAM most of the time.


Thanks,
Torsten


Re: planner parameters

From
Kevin Grittner
Date:
Torsten Förtsch <torsten.foertsch@gmx.net> wrote:

> Is there an other way to make the planner use generate the 1st
> plan?

The planner cost factors are based on the assumption that a
moderate percentage of random page reads will need to actually go
out to disk.  If a high percentage of pages are in cache, you may
want to reduce random_page_cost to something closer to (or even
equal to) seq_page_cost.  I generally find I get better plans if I
raise cpu_tuple_cost to 0.03.  effective_cache_size should
generally be between 50% and 75% of machine RAM.  If these changes
(or others of their ilk) cause costs to be estimated in a way which
more nearly matches reality, better plans will be chosen.

> Why does it generate the 2nd plan at all?

It has the lowest estimated cost, based on your memory
configuration and cost factors.

> Does the planner take into account what is currently present in
> shared memory?

No.  If you search the archives you can probably find previous
discussions of whether it would be a good idea to do so; the
consensus has been that it would not be.

If you have further performance-related questions, please review
this page so that you can provide enough information to allow
people to give the most helpful advice:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company