Thread: Explaining an EXPLAIN.

Explaining an EXPLAIN.

From
Matthew Wakeling
Date:
Alright. I have finally worked out why I was being censored. Majordomo
doesn't like subject lines beginning with the word "help". It by default
sends your message off to the moderators and doesn't tell you. Now follows
my original mail:

Hi. I thought by now I would be fairly good at understanding EXPLAIN
ANALYSE results, but I can't quite figure this one out. Perhaps someone
could help me.

EXPLAIN ANALYSE SELECT *
FROM GeneGoAnnotation a1, GOAnnotation a2, OntologyTermRelations a3
WHERE a1.GoAnnotation = a2.id AND a2.ontologyTermId = a3.OntologyTerm;

                               QUERY PLAN
---------------------------------------------------------------------------------------
   Nested Loop  (cost=0.00..673587.67 rows=330437962 width=95)
                (actual time=0.056..1924645.797 rows=344491124 loops=1)
     ->  Merge Join  (cost=0.00..28369.58 rows=361427 width=87)
                     (actual time=0.039..4620.912 rows=361427 loops=1)
           Merge Cond: (a1.goannotation = a2.id)
           ->  Index Scan using genegoannotation__goannotation on genegoannotation a1
                     (cost=0.00..9710.32 rows=361427 width=8)
                     (actual time=0.015..840.547 rows=361427 loops=1)
           ->  Index Scan using goannotation_pkey on goannotation a2
                     (cost=0.00..13133.12 rows=403323 width=79)
                     (actual time=0.014..1427.179 rows=403323 loops=1)
     ->  Index Scan using ontologytermrelations__ontologyterm on ontologytermrelations a3
                     (cost=0.00..1.20 rows=47 width=8)
                     (actual time=0.022..1.908 rows=953 loops=361427)
           Index Cond: (a3.ontologyterm = a2.ontologytermid)
   Total runtime: 2524647.064 ms
(8 rows)

If I look at the actual results of the outer-most join, the nested loop,
then I can take the number rows=344491124 and divide it by loops=361427 to
get rows=953. Clearly this means that on average each index scan on a3
returned 953 rows.

However, if I apply the same logic to the estimated results, it all falls
apart. The total estimated number of rows is remarkably accurate, as is
the estimated number of loops (results from the merge join). However the
average number of rows expected to be returned from the index scan is only
47. I don't know how the planner is getting its accurate final estimate of
rows=330437962, because it is not from multiplying rows=361427 by rows=47.
That would only give 16987069 rows.

Any ideas/explanations?

Matthew

--
 To be or not to be           -- Shakespeare
 To do is to be               -- Nietzsche
 To be is to do               -- Sartre
 Do be do be do               -- Sinatra

Re: Explaining an EXPLAIN.

From
Tom Lane
Date:
Matthew Wakeling <matthew@flymine.org> writes:
> If I look at the actual results of the outer-most join, the nested loop,
> then I can take the number rows=344491124 and divide it by loops=361427 to
> get rows=953. Clearly this means that on average each index scan on a3
> returned 953 rows.

Right.

> However, if I apply the same logic to the estimated results, it all falls
> apart. The total estimated number of rows is remarkably accurate, as is
> the estimated number of loops (results from the merge join). However the
> average number of rows expected to be returned from the index scan is only
> 47. I don't know how the planner is getting its accurate final estimate of
> rows=330437962, because it is not from multiplying rows=361427 by rows=47.

No, it isn't.  The rowcount estimate for an inner indexscan is derived
based on the index conditions that are assigned to the scan.  It's not
used for anything except estimating the cost of that indexscan; in
particular, the size of the join relation was estimated long before we
even started to think about nestloop-with-inner-indexscan plans.
I don't have time to look right now, but I seem to recall there are some
constraints that mean it's often not a very good estimate.

            regards, tom lane

Re: Explaining an EXPLAIN.

From
Matthew Wakeling
Date:
On Wed, 10 Jun 2009, Tom Lane wrote:
> ...the size of the join relation was estimated long before we even
> started to think about nestloop-with-inner-indexscan plans.

That makes a lot of sense.

Matthew

--
Richards' Laws of Data Security:
 1. Don't buy a computer.
 2. If you must buy a computer, don't turn it on.