Re: [SQL] ORDER BY Optimization - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: [SQL] ORDER BY Optimization |
Date | |
Msg-id | 6227.1115506545@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [SQL] ORDER BY Optimization (Derek Buttineau|Compu-SOLVE <derek@csolve.net>) |
Responses |
Re: ORDER BY Optimization
|
List | pgsql-performance |
[ cc list limited to -performance ] Derek Buttineau|Compu-SOLVE <derek@csolve.net> writes: >> It seems to me a merge join might be more appropriate here than a >> nestloop. After some experimentation, I don't seem to be able to get the planner to generate a mergejoin based on a backwards index scan. I suspect it's not considering the idea of a merge using descending order at all. Might be a good enhancement, although we'd need to figure out how to keep this from just uselessly doubling the number of mergejoin paths considered :-( In the meantime, the nestloop is the only hope for avoiding a full-scan-and-sort. > I'm not entirely sure what's up with the row-count estimates, the tables > are updated quite frequently (and VACUUM is also run quite frequently), They're probably not as bad as they look. The estimates for the lower nodes are made without regard to the LIMIT, but the actuals of course reflect the fact that the LIMIT stopped execution of the plan early. The problem with this query is that the "fast" plan depends on the assumption that as we scan in backwards m.msg_date order, it won't take very long to find 10 rows that join to mr rows with mr.subscription=89. If that's true then the plan wins, if it's not true then the plan can lose big. That requires a pretty good density of rows with subscription=89, and in particular a good density near the end of the msg_date order. The planner effectively assumes that the proportion of rows with subscription=89 isn't changing over time, but perhaps it is --- if there's been a lot recently that could explain why the "fast" plan is fast. In any case I suppose that the reason the larger server doesn't want to try that plan is that its statistics show a much lower density of rows with subscription=89, and so the plan doesn't look promising compared to something that wins if there are few rows with subscription=89 ... which the other plan does. You could probably get your larger server to try the no-sort plan if you said "set enable_sort = 0" first. It would be interesting to compare the EXPLAIN ANALYZE results for that case with the other server. The contents of the pg_stats row for mr.subscription in each server would be informative, too. One rowcount estimate that does look wrong is -> Index Scan using maillog_received_subscription_idx on maillog_received mr (cost=0.00..17789.73 rows=4479 width=43) (actual time=0.030..33554.061 rows=65508 loops=1) Index Cond: (subscription = 89) so the stats row is suggesting there are only 4479 rows with subscription = 89 when really there are 65508. (The preceding discussion hopefully makes it clear why this is a potentially critical mistake.) This suggests that you may need to raise your statistics targets. regards, tom lane
pgsql-performance by date: