Re: estimates for nested loop very wrong? - Mailing list pgsql-sql

From Stephan Szabo
Subject Re: estimates for nested loop very wrong?
Date
Msg-id 20030410071328.M79630-100000@megazone23.bigpanda.com
Whole thread Raw
In response to estimates for nested loop very wrong?  (joostje@komputilo.org)
List pgsql-sql
On Thu, 10 Apr 2003 joostje@komputilo.org wrote:

> It really puzzles me why the estimate for the Nested Loop is so bad,
> as it looks like a rather easy thing to estimate...
>
> Below:
> table db   has 2.5e6 entries, column "id" has rather evenly distributed values.
>            column id has a (btree) index.
> table tmp1 has 23 entries, column "v0" has all unique entries.
> psql version: 7.2.1
>
> Here is the query twice, once with enable_hashjoin ON, second time
> with enable_hashjoin OFF, to force psql to use Nested Loop
>
> ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
> NOTICE:  QUERY PLAN:
>
> Hash Join  (cost=1.29..67863.71 rows=61140 width=38) (actual time=4475.26..32442.99 rows=756 loops=1)
>   ->  Seq Scan on db  (cost=0.00..54498.12 rows=2520012 width=31) (actual time=0.07..29170.62 rows=2520012 loops=1)
>   ->  Hash  (cost=1.23..1.23 rows=23 width=7) (actual time=0.25..0.25 rows=0 loops=1)
>         ->  Seq Scan on tmp1  (cost=0.00..1.23 rows=23 width=7) (actual time=0.17..0.22 rows=23 loops=1)
> Total runtime: 32443.78 msec
>
>
> --Setting hashjoin off, forcing psql to use the Nested Loop
> ueadb=> set enable_hashjoin = off;
>
>
> ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)
>   ->  Seq Scan on tmp1  (cost=0.00..1.23 rows=23 width=7) (actual time=0.24..0.39 rows=23 loops=1)
>   ->  Index Scan using db_id_idx on db  (cost=0.00..9021.35 rows=2658 width=31) (actual time=0.32..0.69 rows=33
loops=23)

It seems to be misestimating the number of rows to return on db.  That's
probably why the cost is so wrong (it's over estimating by nearly a factor
of 100).  Have you analyzed db recently?


> I guess I'll be doing my queries with enable_hashjoin OFF, but is there anythign I'm
> doing wrong?
> (Apart from maybe uzing psql 7.2.1 -- would 7.3 be smarter here?)

It might.



pgsql-sql by date:

Previous
From: Christoph Haller
Date:
Subject: Re: Extraordinary Full Join
Next
From: Tom Lane
Date:
Subject: Re: estimates for nested loop very wrong?