Thread: Yet another question on LIMIT performance :/
Though I've read the recent threads, I'm not sure any of them matches my case.

We have 2 tables: revisions and revisions_active. revisions contains 117707 rows, revisions_active 17827 rows.

DDL: http://hannes.imos.net/ddl.sql.txt

Joining the 2 tables without an additional condition seems OK to me (given our outdated hardware): http://hannes.imos.net/query_1.sql.txt

What worries me is the performance when limiting the recordset: http://hannes.imos.net/query_2.sql.txt

Though it should only have to join a few rows, it seems to scan all rows. From experience I thought that adding an ORDER BY on the index columns should speed it up, but it has no effect: http://hannes.imos.net/query_3.sql.txt

I'm on 8.1.5, statistics (ANALYZE) are up to date, the tables have each been CLUSTERed by PK, and the statistics target for the join columns has been set to 100 (without any effect).

Thanks in advance!

--
Regards,
Hannes Dorbath
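[Editor's note: the linked query files are not reproduced in the archive. A minimal sketch of the shape of query being discussed, assuming hypothetical column names (revisions.id, revisions_active.revision_id) that are not confirmed by the thread:]

```sql
-- Hypothetical reconstruction; actual tables/columns are in ddl.sql.txt.
EXPLAIN ANALYZE
SELECT *
FROM revisions r
JOIN revisions_active ra ON ra.revision_id = r.id
LIMIT 10;
```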
Hannes Dorbath wrote:
> Though it should only have to join a few rows it seems to scan all rows.

What makes you think that's the case?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On 06.11.2006 15:13, Heikki Linnakangas wrote:
> Hannes Dorbath wrote:
>> Though it should only have to join a few rows it seems to scan all rows.
>
> What makes you think that's the case?

Sorry, not all rows, but 80753. It's not clear to me why this number is so high with LIMIT 10.

--
Regards,
Hannes Dorbath
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Hannes Dorbath wrote:
>> Though it should only have to join a few rows it seems to scan all rows.
>
> What makes you think that's the case?

What it looks like to me is that the range of keys present in pk_revisions_active corresponds to just the upper end of the range of keys present in pk_revisions (somehow not too surprising). So the mergejoin isn't the most effective plan possible for this case --- it has to scan through much of pk_revisions before it starts getting matches. The planner doesn't have any model for that though, and is costing the plan on the assumption of uniformly-distributed matches.

A nestloop plan would be faster for this specific case, but much slower if a large number of rows were requested.

			regards, tom lane
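[Editor's note: one way to check this hypothesis, not suggested in the thread itself, is to temporarily disable the merge join so the planner falls back to a nestloop, and compare the EXPLAIN ANALYZE output. The query shape and column names below are assumptions; enable_mergejoin is a standard PostgreSQL planner setting:]

```sql
BEGIN;
-- Discourage the merge join for this session/transaction only,
-- so the planner tries the nestloop alternative.
SET LOCAL enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT *
FROM revisions r
JOIN revisions_active ra ON ra.revision_id = r.id
LIMIT 10;

ROLLBACK;  -- the SET LOCAL is discarded with the transaction
```

If Tom's reading is right, the nestloop plan should show far fewer rows touched for LIMIT 10, while becoming more expensive as the LIMIT grows.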