Re: Weird explain output - Mailing list pgsql-general

From Tom Lane
Subject Re: Weird explain output
Date
Msg-id 857.1359964121@sss.pgh.pa.us
Whole thread Raw
In response to Weird explain output  (wd <wd@wdicc.com>)
Responses Re: Weird explain output  (wd <wd@wdicc.com>)
List pgsql-general
wd <wd@wdicc.com> writes:
> I have a long SQL with many join, the explain analyse output like
>    ->  Hash Left Join  (cost=197749705.15..5320438982425.18
> rows=5245599776015 width=7542) (actual time=674.634..2645.293 rows=4568
> loops=1)
>          Hash Cond: (p.id = ps.partner_id)
>          ->  Merge Join  (cost=197747693.55..78881918575.29
> rows=5245599776015 width=7486) (actual time=636.814..653.137 rows=4568
> loops=1)
>                Merge Cond: (a3.team_id = t.id)
>                ->  Sort  (cost=3509574.57..3554578.91 rows=18001735
> width=112) (actual time=518.318..519.221 rows=4569 loops=1)
>                      Sort Key: a3.team_id
>                      Sort Method: quicksort  Memory: 549kB
>                       ..................... sniped
>                ->  Materialize  (cost=194238118.98..194529512.87
> rows=58278779 width=7390) (actual time=118.488..122.408 rows=4569 loops=1)
>                      ->  Sort  (cost=194238118.98..194383815.92
> rows=58278779 width=7390) (actual time=118.486..119.723 rows=3687 loops=1)
>                            Sort Key: t.id
>                            Sort Method: quicksort  Memory: 6951kB
>                        ............................. sniped
>          ->  Hash  (cost=1917.17..1917.17 rows=7555 width=68) (actual
> time=37.578..37.578 rows=7555 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 958kB
>                 ............................ sniped

> We can see the 'Hash Left Join node' spend 2645.293 ms, so I change the sql
> and remove the left join node, and expect the SQL will end with 653.137 ms.
> But the explain analyse output like

>    ->  Merge Join  (cost=197548498.05..314933709300.46 rows=5245599776015
> width=7478) (actual time=775.495..2744.759 rows=4568 loops=1)
>          Merge Cond: (a3.team_id = t.id)
>          ->  Sort  (cost=3509574.57..3554578.91 rows=18001735 width=112)
> (actual time=540.636..541.392 rows=4569 loops=1)
>                Sort Key: a3.team_id
>                Sort Method: quicksort  Memory: 549kB
>                .......................... sniped
>          ->  Materialize  (cost=194038923.48..194330317.37 rows=58278779
> width=7382) (actual time=234.632..238.065 rows=4569 loops=1)
>                ->  Sort  (cost=194038923.48..194184620.42 rows=58278779
> width=7382) (actual time=234.628..235.744 rows=3687 loops=1)
>                      Sort Key: t.id
>                      Sort Method: quicksort  Memory: 6899kB
>                  ......................... sniped

> We can see the 'Merge Join' spend 2744.759 ms. The sniped output under the
> 'Merge Join' node in the two explain output are similar, Why does this
> happen?

You're worrying about the wrong thing entirely.  The right thing to be
worrying about is why are some of those row estimates off by four orders
of magnitude, and what you can do to fix that.  The planner will never
deliver sane plans when its estimates are so far off from reality.

From my perspective, you snipped all the interesting parts of this
output, because the misestimation is evidently happening somewhere down
in there.

            regards, tom lane

pgsql-general by date:

Previous
From: wd
Date:
Subject: Weird explain output
Next
From: Tom Lane
Date:
Subject: Re: Using partial index in combination with prepared statement parameters