Thread: reading EXPLAIN output

reading EXPLAIN output

From
David Rysdam
Date:
merge join (cost=0.00..348650.65 rows=901849 width=12)
  merge cond {blah}
  join filter {blah}
     index scan using {blah index on blah} (cost=0.00..289740.65
rows=11259514 width=8)
     index scan using {blah index on blah} (cost=0.00..17229.93
rows=902085 width=8)

This query takes about 3 minutes to run and I'm trying to figure out
why.  From a tutorial and the docs, I gather that the "..largenum" part
is the number of page reads required, so I understand where 289740 and
17229 come from.  But what about 348650 page reads for the "merge
join"?  My conjecture is that the joined keys are being stored on disk
(if that's how the internals of postgresql works) and have to be re-read
for the rest of the query.  Is that right?  Does that mean I could speed
this up by giving more RAM to store it in?

When I do EXPLAIN ANALYZE, the actual values come out like this:

merge join: (actual time=170029.404..170029.404)
index scan: (actual time=27.653..84373.805)
index scan: (actual time=45.681..7026.928)

This seems to confirm that it's the final "merge join" that takes
forever.  Because it is writing to and reading from disk?

Re: reading EXPLAIN output

From
Tom Lane
Date:
David Rysdam <drysdam@ll.mit.edu> writes:
> merge join (cost=0.00..348650.65 rows=901849 width=12)
>   merge cond {blah}
>   join filter {blah}
>      index scan using {blah index on blah} (cost=0.00..289740.65
> rows=11259514 width=8)
>      index scan using {blah index on blah} (cost=0.00..17229.93
> rows=902085 width=8)

> This query takes about 3 minutes to run and I'm trying to figure out
> why.  From a tutorial and the docs, I gather that the "..largenum" part
> is the number of page reads required, so I understand where 289740 and
> 17229 come from.  But what about 348650 page reads for the "merge
> join"?

You're misreading it.  An upper node's cost includes the cost of its
children.  So the actual cost estimate for the join step is 41680.07.

> When I do EXPLAIN ANALYZE, the actual values come out like this:

> merge join: (actual time=170029.404..170029.404)

That seems a bit odd ... is there only one row produced?  Could you show
us the entire EXPLAIN ANALYZE output, rather than your assumptions about
what's important?

Increasing work_mem won't help a merge join, but if you can get it large
enough to allow a hash join to be used instead, that might be a win.

            regards, tom lane

Re: reading EXPLAIN output

From
David Rysdam
Date:
Tom Lane wrote:

>David Rysdam <drysdam@ll.mit.edu> writes:
>
>
>>merge join (cost=0.00..348650.65 rows=901849 width=12)
>>  merge cond {blah}
>>  join filter {blah}
>>     index scan using {blah index on blah} (cost=0.00..289740.65
>>rows=11259514 width=8)
>>     index scan using {blah index on blah} (cost=0.00..17229.93
>>rows=902085 width=8)
>>
>>
>
>
>
>>This query takes about 3 minutes to run and I'm trying to figure out
>>why.  From a tutorial and the docs, I gather that the "..largenum" part
>>is the number of page reads required, so I understand where 289740 and
>>17229 come from.  But what about 348650 page reads for the "merge
>>join"?
>>
>>
>
>You're misreading it.  An upper node's cost includes the cost of its
>children.  So the actual cost estimate for the join step is 41680.07.
>
>
>
>>When I do EXPLAIN ANALYZE, the actual values come out like this:
>>
>>
>
>
>
>>merge join: (actual time=170029.404..170029.404)
>>
>>
>
>That seems a bit odd ... is there only one row produced?  Could you show
>us the entire EXPLAIN ANALYZE output, rather than your assumptions about
>what's important?
>
>Increasing work_mem won't help a merge join, but if you can get it large
>enough to allow a hash join to be used instead, that might be a win.
>
>            regards, tom lane
>
>
>
>
I'm looking for certain anomalies, so the end result should be zero rows.

merge join (cost=0.00..348650.65 rows=901849 width=12)  (actual
time=170029.404..170029.404 rows=0 loops=1)
merge cond {blah}
join filter {blah}
  index scan using {blah index on blah} (cost=0.00..289740.65
rows=11259514 width=8) (actual time=29.227..85932.426 rows=11256725 loops=1)
  index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085
width=8) (actual time=39.896..6766.755 rows=902236 loops=1)
Total runtime: 172469.209 ms