Thread: Re: Reasons for choosing one execution plan overanother?

Re: Reasons for choosing one execution plan overanother?

From
Mikkel Lauritsen
Date:
Hi all,

On Wed, 11 Sep 2013 18:55:38 +0200, Giuseppe Broccolo
<giuseppe.broccolo@2ndquadrant.it> wrote:
> Il 11/09/2013 13:16, Mikkel Lauritsen ha scritto:
> > Hi all,
> >
> > I have a number of Postgres 9.2.4 databases with the same schema but
> > with slightly different contents, running on small servers that are
> > basically alike (8-16 GB ram).
>
> I think that your answer can be found in your statement "slightly
> different contents".

Yup, that's what I've been thinking myself so far - it definitely doesn't
look as if I'm hitting a bug, and I've been through the schema so I feel
reasonably sure that I'm not missing an index.

In the example from my original mail the i and a tables are identical in
the two databases. The slow plan is chosen when the x and e tables contain
3.2M and 6.2M rows, the fast plan has 12.8M and 17M rows.

So - does anybody with enough insight in the planner know if it sounds
likely that it would choose the given plans in these two cases, or if
it's more likely that I have a tuning problem that leads to bad planning?

And if the different plans are to be expected, is there any way I can hint
at the planner to make it choose the fast plan in both cases?

FWIW if I do a very simple query like

SELECT e.id FROM e INNER JOIN x USING (id) WHERE e.h_id = 'foo';

on the two databases I also end up with two different plans (included
below). Here the execution time difference is much less pronounced (note
that the "fast" execution is on inferior hardware and with a much larger
result), but the way the join is executed is the same as in the initial
larger plans. Setting seq_page_cost and random_page_cost to the same
value makes the planner choose the fast plan in both cases, but unfortu-
nately that has no effect on my initial problem :-/

Best regards & thanks,
  Mikkel Lauritsen

---

Fast plan:

 Nested Loop  (cost=0.00..24523.00 rows=1138 width=39) (actual
time=2.546..33.858 rows=1192 loops=1)
   Buffers: shared hit=8991
   ->  Index Scan using e_h_id_idx on e  (cost=0.00..6171.55 rows=1525
width=39) (actual time=0.053..1.211 rows=1857 loops=1)
         Index Cond: (healthtrack_id =
'-95674114670403931535179954575983492851'::text)
         Buffers: shared hit=350
   ->  Index Only Scan using x_pkey on x  (cost=0.00..12.02 rows=1
width=39) (actual time=0.017..0.017 rows=1 loops=1857)
         Index Cond: (id = e.id)
         Heap Fetches: 1192
         Buffers: shared hit=8641
 Total runtime: 34.065 ms


Slow plan:

 Nested Loop  (cost=22.25..7020.66 rows=277 width=39) (actual
time=0.298..13.996 rows=228 loops=1)
   Buffers: shared hit=3173
   ->  Bitmap Heap Scan on e  (cost=22.25..2093.50 rows=537 width=39)
(actual time=0.219..0.628 rows=697 loops=1)
         Recheck Cond: (healthtrack_id = 'foo'::text)
         Buffers: shared hit=152
         ->  Bitmap Index Scan on e_h_id_idx  (cost=0.00..22.12 rows=537
width=0) (actual time=0.188..0.188 rows=697 loops=1)
               Index Cond: (h_id = 'foo'::text)
               Buffers: shared hit=9
   ->  Index Only Scan using x_pkey on x  (cost=0.00..9.17 rows=1
width=39) (actual time=0.018..0.018 rows=0 loops=697)
         Index Cond: (id = e.id)
         Heap Fetches: 228
         Buffers: shared hit=3021
 Total runtime: 14.070 ms


Re: Reasons for choosing one execution plan overanother?

From
Mikkel Lauritsen
Date:
I wrote:

--- snip ---

> So - does anybody with enough insight in the planner know if it sounds
> likely that it would choose the given plans in these two cases, or if
> it's more likely that I have a tuning problem that leads to bad
> planning?

Duh. It suddenly dawned on me that I need to look closer at the plans...

The big difference in the estimated and actual row count in lines like

->  Nested Loop  (cost=0.00..250.78 rows=338 width=47) (actual
time=0.100..189.676 rows=187012 loops=1)

indicates that the planner is somehow mislead by the statistics on (at
least) one of the tables, right? Any suggestions as to how I go about
investigating that further?

One thing here that is slightly confusing is the relationship between
the estimated row count of 169 in the outer loop and 6059 in the last
index scan in the partial plan below. How do they relate to each other?

->  Nested Loop  (cost=0.00..452.10 rows=169 width=47) (actual
time=0.088..41.244 rows=32863 loops=1)
      ->  Nested Loop  (cost=0.00..16.55 rows=1 width=39) (actual
time=0.031..0.035 rows=1 loops=1)
            ->  Index Scan using i_c_id on i  (cost=0.00..8.27 rows=1
width=39) (actual time=0.016..0.017 rows=1 loops=1)
                  Index Cond: (c = 'bar'::text)
            ->  Index Scan using a_i_id_idx on a  (cost=0.00..8.27 rows=1
width=78) (actual time=0.012..0.013 rows=1 loops=1)
                  Index Cond: (i_id = i.id)
      ->  Index Scan using x_a_id_idx on x  (cost=0.00..374.95 rows=6059
width=86) (actual time=0.055..27.219 rows=32863 loops=1)
            Index Cond: (a_id = a.id)


Best regards & thanks,
  Mikkel Lauritsen