Correcting Hash Join Estimates

From: mark.lubratt@indeq.com

Hello!

I posted a similar question to this one about a month ago, but for some
reason it never seemed to be broadcast even though it ended up in the
archives.  So, since I'm still struggling with this, I thought I'd
repost...

I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below)
shows that some hash join row estimates are wrong by a factor of 2-3,
and upwards of 7-8.  There is a corresponding mis-estimation of the
amount of time taken for these steps.  The database is vacuum analyzed
nightly by a cron job.  How would I go about tightening up these
errors?  I suspect that I need to SET STATISTICS on some columns, but
how can I tell which columns?
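
For example, I imagine something along these lines, though the table
and column names here are just placeholders since I don't know yet
which columns matter:

    -- Raise the statistics target on a join column with bad estimates
    -- (the default in 8.0 is 10), then re-analyze so the planner
    -- picks up the finer-grained histogram:
    ALTER TABLE award ALTER COLUMN person_id SET STATISTICS 100;
    ANALYZE award;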

Any help would be appreciated.

WinXP (dual Xeon, 1.2 GB RAM), PostgreSQL 8.0.1
Explain Analyze:  <http://www.indeq.com/EA.txt>
View Definition: <http://www.indeq.com/VGAUA.txt>

The largest table contains about 10,000 rows.  All tables have indexes
on their foreign keys.

Thanks!
Mark



Re: Correcting Hash Join Estimates

From: Tom Lane

mark.lubratt@indeq.com writes:
> I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below)
> shows that some hash join row estimates are wrong by a factor of 2-3,
> and upwards of 7-8.

I doubt that improving those estimates would lead to markedly better
results.  You need to think about improving the view design instead.
What context is this view used in --- do you just do "select * from
view_get_all_user_award2", or are there conditions added to it, or
perhaps it gets joined with other things?  Do you really need the
DISTINCT constraint?  Do you really need the ORDER BY?  Can you
simplify the WHERE clause at all?
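
For instance, if the ORDER BY is only there for display purposes, it
could be applied by the caller instead of being baked into the view,
along the lines of (ordering column invented for illustration):

    SELECT * FROM view_get_all_user_award2
    ORDER BY award_date;

That way the sort cost is only paid where it's actually wanted.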

Half a second sounds pretty decent to me for a ten-way join with a WHERE
clause as unstructured as that.  If you really need it to execute in way
less time, you're probably going to have to rethink your data
representation to make the query simpler.

            regards, tom lane

Re: Correcting Hash Join Estimates

From: Mark Lubratt

On Apr 4, 2005, at 12:54 AM, Tom Lane wrote:

> mark.lubratt@indeq.com writes:
>> I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below)
>> shows that some hash join row estimates are wrong by a factor of 2-3,
>> and upwards of 7-8.
>
> I doubt that improving those estimates would lead to markedly better
> results.  You need to think about improving the view design instead.
> What context is this view used in --- do you just do "select * from
> view_get_all_user_award2", or are there conditions added to it, or
> perhaps it gets joined with other things?

Yes.  I forgot to show how the query is executed...

select * from view_get_all_user_award2 where person_id = 1;


>   Do you really need the DISTINCT constraint?

Yes.

> Do you really need the ORDER BY?

The customer wants an initial ordering in the displayed data.

> Can you simplify the WHERE clause at all?

I originally had a bunch of LEFT JOINs.  After reading Tow's "SQL
Tuning", I was hoping to steer the planner into a more "optimal" plan
by moving the joins into a large WHERE clause instead (I think they're
called implicit joins).  I was able to shave a couple of hundred
milliseconds off the execution time by doing this.
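
Roughly this kind of rewrite, with invented two-table stand-ins for the
real ten-way join:

    -- before: explicit join syntax
    SELECT p.name, a.title
    FROM person p
    LEFT JOIN award a ON a.person_id = p.person_id;

    -- after: "implicit" joins in the WHERE clause
    SELECT p.name, a.title
    FROM person p, award a
    WHERE a.person_id = p.person_id;

(I only converted the LEFT JOINs where a matching right-hand row was
guaranteed, since otherwise the rewrite would drop rows.)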

> Half a second sounds pretty decent to me for a ten-way join with a WHERE
> clause as unstructured as that.  If you really need it to execute in way
> less time, you're probably going to have to rethink your data
> representation to make the query simpler.

Unfortunately, I'm not sure I can restructure the data.  I did consider
materialized views.  However, they couldn't be refreshed lazily, and
that seemed like a lot of extra work for the backend for very little
improvement.
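
(What I ruled out was an eagerly maintained snapshot table, kept
current by statement-level triggers on each underlying table; the
names here are invented:

    -- seed the snapshot from the view
    CREATE TABLE user_award_mv AS
      SELECT * FROM view_get_all_user_award2;

    -- rebuild the snapshot whenever an underlying table changes
    CREATE OR REPLACE FUNCTION refresh_user_award_mv() RETURNS trigger AS '
    BEGIN
      DELETE FROM user_award_mv;
      INSERT INTO user_award_mv SELECT * FROM view_get_all_user_award2;
      RETURN NULL;
    END;
    ' LANGUAGE plpgsql;

    -- one such trigger per underlying table:
    CREATE TRIGGER award_mv_refresh
      AFTER INSERT OR UPDATE OR DELETE ON award
      FOR EACH STATEMENT EXECUTE PROCEDURE refresh_user_award_mv();

Re-running a ten-way join on every write to any of the ten tables is
the extra backend work I meant.)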

If this sounds like decent performance to you...  I guess I can just
tell the complainers that it's as good as it's going to get (barring a
major hardware upgrade...).

Thanks!
Mark