Wrong estimation of rows for hash join - Mailing list pgsql-general

From Christian Schröder
Subject Wrong estimation of rows for hash join
Date
Msg-id 4AD835EE.7060504@deriva.de
Whole thread Raw
Responses Re: Wrong estimation of rows for hash join
List pgsql-general
Hi list,
I have the following query:
    SELECT *
    FROM base
    INNER JOIN pt USING (x) WHERE pt.y IN ('1121', '11411', '11421',
'1161', '1162');

"explain analyze" yields the following result:

                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=529.18..164930.70 rows=28374 width=0) (actual
time=10.834..4877.326 rows=245298 loops=1)
   Hash Cond: (b.x = pt.x)
   ->  Seq Scan on b  (cost=0.00..159579.93 rows=1210093 width=4)
(actual time=0.018..2464.871 rows=1210250 loops=1)
   ->  Hash  (cost=527.41..527.41 rows=142 width=4) (actual
time=10.775..10.775 rows=138 loops=1)
         ->  Seq Scan on pt  (cost=0.00..527.41 rows=142 width=4)
(actual time=0.057..10.556 rows=138 loops=1)
               Filter: (y = ANY ('{1121,11411,11421,1161,1162}'::bpchar[]))
 Total runtime: 5170.837 ms

As you can see, the estimated number of rows in the join is much lower
than the actual number. Normally, the join is part of a more complex
query which gets really slow, probably (among other reasons) because the
query planner uses the incorrect row estimate for its further plans.

Question is: What can I do to improve that estimation? There is a
foreign key from base.x to pt.x and both fields are declared not null.

Side question: What can I do to improve the speed of the hash join
itself? I understand that 2.5 secs are needed to perform the sequential
scan of table b, but where do the remaining 2.5 secs come from?

Thanks a lot in advance,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer



pgsql-general by date:

Previous
From: Andale
Date:
Subject: Re: could not open process token: error code 5
Next
From: Arnaud Lesauvage
Date:
Subject: pgsql2shp : Encoding headache