Thread: Explaining an EXPLAIN.
Alright. I have finally worked out why I was being censored. Majordomo doesn't like subject lines beginning with the word "help". It by default sends your message off to the moderators and doesn't tell you. Now follows my original mail: Hi. I thought by now I would be fairly good at understanding EXPLAIN ANALYSE results, but I can't quite figure this one out. Perhaps someone could help me. EXPLAIN ANALYSE SELECT * FROM GeneGoAnnotation a1, GOAnnotation a2, OntologyTermRelations a3 WHERE a1.GoAnnotation = a2.id AND a2.ontologyTermId = a3.OntologyTerm; QUERY PLAN --------------------------------------------------------------------------------------- Nested Loop (cost=0.00..673587.67 rows=330437962 width=95) (actual time=0.056..1924645.797 rows=344491124 loops=1) -> Merge Join (cost=0.00..28369.58 rows=361427 width=87) (actual time=0.039..4620.912 rows=361427 loops=1) Merge Cond: (a1.goannotation = a2.id) -> Index Scan using genegoannotation__goannotation on genegoannotation a1 (cost=0.00..9710.32 rows=361427 width=8) (actual time=0.015..840.547 rows=361427 loops=1) -> Index Scan using goannotation_pkey on goannotation a2 (cost=0.00..13133.12 rows=403323 width=79) (actual time=0.014..1427.179 rows=403323 loops=1) -> Index Scan using ontologytermrelations__ontologyterm on ontologytermrelations a3 (cost=0.00..1.20 rows=47 width=8) (actual time=0.022..1.908 rows=953 loops=361427) Index Cond: (a3.ontologyterm = a2.ontologytermid) Total runtime: 2524647.064 ms (8 rows) If I look at the actual results of the outer-most join, the nested loop, then I can take the number rows=344491124 and divide it by loops=361427 to get rows=953. Clearly this means that on average each index scan on a3 returned 953 rows. However, if I apply the same logic to the estimated results, it all falls apart. The total estimated number of rows is remarkably accurate, as is the estimated number of loops (results from the merge join). However the average number of rows expected to be returned from the index scan is only 47. I don't know how the planner is getting its accurate final estimate of rows=330437962, because it is not from multiplying rows=361427 by rows=47. That would only give 16987069 rows. Any ideas/explanations? Matthew -- To be or not to be -- Shakespeare To do is to be -- Nietzsche To be is to do -- Sartre Do be do be do -- Sinatra
Matthew Wakeling <matthew@flymine.org> writes: > If I look at the actual results of the outer-most join, the nested loop, > then I can take the number rows=344491124 and divide it by loops=361427 to > get rows=953. Clearly this means that on average each index scan on a3 > returned 953 rows. Right. > However, if I apply the same logic to the estimated results, it all falls > apart. The total estimated number of rows is remarkably accurate, as is > the estimated number of loops (results from the merge join). However the > average number of rows expected to be returned from the index scan is only > 47. I don't know how the planner is getting its accurate final estimate of > rows=330437962, because it is not from multiplying rows=361427 by rows=47. No, it isn't. The rowcount estimate for an inner indexscan is derived based on the index conditions that are assigned to the scan. It's not used for anything except estimating the cost of that indexscan; in particular, the size of the join relation was estimated long before we even started to think about nestloop-with-inner-indexscan plans. I don't have time to look right now, but I seem to recall there are some constraints that mean it's often not a very good estimate. regards, tom lane
On Wed, 10 Jun 2009, Tom Lane wrote: > ...the size of the join relation was estimated long before we even > started to think about nestloop-with-inner-indexscan plans. That makes a lot of sense. Matthew -- Richards' Laws of Data Security: 1. Don't buy a computer. 2. If you must buy a computer, don't turn it on.