Thread: Correcting Hash Join Estimates
Hello!

I posted a similar question to this one about a month ago, but for some reason it never seemed to be broadcast even though it ended up in the archives. Since I'm still struggling with this, I thought I'd repost...

I'm trying to optimize a query, and the EXPLAIN ANALYZE (see link below) shows that some hash join row estimates are wrong by a factor of 2-3, and upwards of 7-8. There is a corresponding mis-estimation of the amount of time taken for these steps. The database is vacuum analyzed nightly by a cron job.

How would I go about tightening up these errors? I suspect that I need to SET STATISTICS on some columns, but how can I tell which columns?

Any help would be appreciated.

WinXP (dual Xeon, 1.2GB RAM), PgSQL 8.0.1
Explain Analyze: <http://www.indeq.com/EA.txt>
View Definition: <http://www.indeq.com/VGAUA.txt>

The largest table contains about 10,000 rows. All tables have indexes on their foreign keys.

Thanks!
Mark
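P.S. To be concrete, this is the sort of thing I'm guessing at. The table name "award" here is made up (I don't yet know which columns actually need it); person_id is one of my real columns:

    -- Peek at what the planner currently believes about the table's columns:
    SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename = 'award';

    -- Raise the per-column statistics target above the default (10 in 8.0),
    -- then re-analyze so the planner sees the finer-grained histogram:
    ALTER TABLE award ALTER COLUMN person_id SET STATISTICS 100;
    ANALYZE award;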
mark.lubratt@indeq.com writes:
> I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below)
> shows that some hash join row estimates are wrong by a factor of 2-3,
> and upwards of 7-8.

I doubt that improving those estimates would lead to markedly better results. You need to think about improving the view design instead. What context is this view used in --- do you just do "select * from view_get_all_user_award2", or are there conditions added to it, or perhaps it gets joined with other things? Do you really need the DISTINCT constraint? Do you really need the ORDER BY? Can you simplify the WHERE clause at all?

Half a second sounds pretty decent to me for a ten-way join with a WHERE clause as unstructured as that. If you really need it to execute in way less time, you're probably going to have to rethink your data representation to make the query simpler.

			regards, tom lane
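P.S. If the DISTINCT is only there to collapse duplicate rows produced by the joins, rewriting the join as an EXISTS test sometimes removes the need for it. A made-up two-table sketch, not your schema:

    -- Joining emits one output row per matching award, forcing a DISTINCT:
    SELECT DISTINCT p.person_id, p.name
    FROM person p, award a
    WHERE a.person_id = p.person_id;

    -- An EXISTS test returns each person row at most once, so no DISTINCT:
    SELECT p.person_id, p.name
    FROM person p
    WHERE EXISTS (SELECT 1 FROM award a WHERE a.person_id = p.person_id);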
On Apr 4, 2005, at 12:54 AM, Tom Lane wrote:

> mark.lubratt@indeq.com writes:
>> I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below)
>> shows that some hash join row estimates are wrong by a factor of 2-3,
>> and upwards of 7-8.
>
> I doubt that improving those estimates would lead to markedly better
> results. You need to think about improving the view design instead.
> What context is this view used in --- do you just do "select * from
> view_get_all_user_award2", or are there conditions added to it, or
> perhaps it gets joined with other things?

Yes. I forgot to show how the query is executed...

select * from view_get_all_user_award2 where person_id = 1;

> Do you really need the DISTINCT constraint?

Yes.

> Do you really need the ORDER BY?

The customer wants an initial ordering in the displayed data.

> Can you simplify the WHERE clause at all?

I originally had a bunch of LEFT JOINs. After reading Tow's "SQL Tuning", I was hoping to steer the planner into a more "optimal" plan by using a large WHERE clause instead and doing the joins there (I think they're called implicit joins). I was able to shave a couple of hundred milliseconds off the execution time by doing this (see the sketch below).

> Half a second sounds pretty decent to me for a ten-way join with a WHERE
> clause as unstructured as that. If you really need it to execute in way
> less time, you're probably going to have to rethink your data
> representation to make the query simpler.

Unfortunately, I'm not sure I can restructure the data. I did consider materialized views. However, they couldn't be refreshed lazily, and that seemed like a lot of extra work for the backend for very little improvement.

If this sounds like decent performance to you, I guess I can just tell the complainers that it's as good as it's going to get (barring a major hardware upgrade...).

Thanks!
Mark
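P.S. For the archives, the kind of rewrite I mean. This is a simplified two-table sketch with made-up names, not the actual ten-table view:

    -- Before: explicit (outer) join syntax.
    SELECT p.name, a.title
    FROM person p
    LEFT JOIN award a ON a.person_id = p.person_id;

    -- After: implicit join, with the join condition moved into WHERE,
    -- leaving the join order up to the planner. Note this is an inner
    -- join, so it only gives the same answer if the LEFT JOIN never
    -- actually produced NULL-extended rows.
    SELECT p.name, a.title
    FROM person p, award a
    WHERE a.person_id = p.person_id;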