Re: plan_rows confusion with parallel queries - Mailing list pgsql-hackers

From Robert Haas
Subject Re: plan_rows confusion with parallel queries
Date
Msg-id CA+TgmobunWKyR9TorVeiuAE138Lzj=ss5CEHco+YX37T-7FHgQ@mail.gmail.com
Whole thread Raw
In response to Re: plan_rows confusion with parallel queries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Nov 2, 2016 at 4:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> while eye-balling some explain plans for parallel queries, I got a bit
>> confused by the row count estimates. I wonder whether I'm alone.
>
> I got confused by that a minute ago, so no you're not alone.  The problem
> is even worse in join cases.  For example:
>
>  Gather  (cost=34332.00..53265.35 rows=100 width=8)
>    Workers Planned: 2
>    ->  Hash Join  (cost=33332.00..52255.35 rows=100 width=8)
>          Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2))
>          ->  Append  (cost=0.00..8614.96 rows=417996 width=8)
>                ->  Parallel Seq Scan on pp  (cost=0.00..8591.67 rows=416667 widt
> h=8)
>                ->  Parallel Seq Scan on pp1  (cost=0.00..23.29 rows=1329 width=8
> )
>          ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
>                ->  Seq Scan on cc  (cost=0.00..14425.00 rows=1000000 width=8)
>
> There are actually 1000000 rows in pp, and none in pp1.  I'm not bothered
> particularly by the nonzero estimate for pp1, because I know where that
> came from, but I'm not very happy that nowhere here does it look like
> it's estimating a million-plus rows going into the join.

I welcome suggestions for improvement, but you will note that if the
row count didn't reflect some kind of guess about the number of rows
that each individual worker will see, the costing would be hopelessly
broken.  The cost needs to reflect a guess about the time the query
will finish, not the total amount of effort expended.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Confusing docs about GetForeignUpperPaths in fdwhandler.sgml
Next
From: Robert Haas
Date:
Subject: Re: plan_rows confusion with parallel queries