Re: same query different execution plan (hash join vs. semi-hash join) - Mailing list pgsql-performance

From Tom Lane
Subject Re: same query different execution plan (hash join vs. semi-hash join)
Date
Msg-id 21694.1400209082@sss.pgh.pa.us
Whole thread Raw
In response to same query different execution plan (hash join vs. semi-hash join)  ("Huang, Suya" <Suya.Huang@au.experian.com>)
Responses Re: same query different execution plan (hash join vs. semi-hash join)  ("Huang, Suya" <Suya.Huang@au.experian.com>)
List pgsql-performance
"Huang, Suya" <Suya.Huang@au.experian.com> writes:
> I've got a query as below, it runs several times with different execution plan and totally different execution time.
Theone using hash-join is slow and the one using semi-hash join is very fast. However, I have no control over the
optimizerbehavior of PostgreSQL database. Or, do I have? 

A salient feature of the slow plan is that the planner is misinformed
about the size of test1:

>                      ->  Seq Scan on test1  (cost=0.00..5153.94 rows=63294 width=516) (actual time=0.068..91.378
rows=441736loops=1) 

whereas in the fast plan its rows estimate for that scan is dead on.
It looks like the two cases also have different ideas of how many
distinct values are in the test1.userid column, though this is more a
guess than an indisputable fact.

In short, I suspect you're recreating the test1 table and not bothering
to ANALYZE it after you fill it.  This leaves you at the mercy of when
the autovacuum daemon gets around to analyzing the table before you'll
get good plans for it.

            regards, tom lane


pgsql-performance by date:

Previous
From: "Huang, Suya"
Date:
Subject: same query different execution plan (hash join vs. semi-hash join)
Next
From: tim_wilson
Date:
Subject: autovacuum vacuum creates bad statistics for planner when it log index scans: 0