Thread: Unnecessary function calls
Hi, when using LIMIT, how do I tell the planner to only call a function for rows it returns? An example: I want to fetch the top five categories. A function get_category_text_path(cat_id int) returns the textual representation of the category. For that I do something like: SELECT id, get_category_text_path(id) FROM category ORDER BY rank LIMIT 5 Unfortunately this takes very long because it calls get_category_text_path() for all of the 450'000 categories in the table. But I only need the full text path of the top five rows. It does not matter if I declare the function to be IMMUTABLE, STABLE or VOLATILE - it gets called for every row in category (which normally is what you want I guess). How can I rewrite the query to call get_category_text_path() only for the top five rows? Thanks for hints. Markus
Markus Schiltknecht wrote: > Hi, > > when using LIMIT, how do I tell the planner to only call a function for > rows it returns? > > An example: I want to fetch the top five categories. A function > get_category_text_path(cat_id int) returns the textual representation of > the category. For that I do something like: > > SELECT id, get_category_text_path(id) > FROM category > ORDER BY rank > LIMIT 5 > > Unfortunately this takes very long because it calls > get_category_text_path() for all of the 450'000 categories in the table. > But I only need the full text path of the top five rows. SELECT id, get_category_text_path(id) FROM ( SELECT id FROM category ORDER BY rank LIMIT 5 ) AS foo HTH -- Richard Huxton Archonet Ltd
On Tue, May 02, 2006 at 01:37:54PM +0200, Markus Schiltknecht wrote: > Hi, > > when using LIMIT, how do I tell the planner to only call a function for > rows it returns? > > An example: I want to fetch the top five categories. A function > get_category_text_path(cat_id int) returns the textual representation of > the category. For that I do something like: > > SELECT id, get_category_text_path(id) > FROM category > ORDER BY rank > LIMIT 5 How about: SELECT id, get_category_text_path(id) FROM (SELECT id FROM category ORDER BY rank LIMIT 5) as x; Evidently you don't have an index on rank, otherwise it would've used the index to cut down on the number of rows that needed to be examined. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Hello Terry, Thanks a lot. That's so simple I didn't see it. (The original query is much more complex.) The only problem is, rank is not a column of category itself, but a joined row. With this solution, the join will have to be performed twice. But since this doesn't cost that much and because the second join is only done for 5 rows at the max this does not hurt. The more complete query now looks a little ugly: SELECT id, get_category_text_path(id), r.rank FROM category JOIN rank_lookup AS r ON cat_id = id WHERE id IN ( SELECT c.id FROM category AS c JOIN rank_lookup AS rr ON rr.cat_id = c.id ORDER BY rr.rank LIMIT 5 ) It's not possible to optimize out that second join, is it? Regards Markus On Tue, 2006-05-02 at 07:39 -0400, Terry Fielder wrote: > SELECT id, get_category_text_path(id) > FROM category > WHERE id IN ( > SELECT c.id > FROM category AS c > ORDER BY c.rank > LIMIT 5 > )
SELECT id, get_category_text_path(id) FROM category WHERE id IN ( SELECT c.id FROM category AS c ORDER BY c.rank LIMIT 5 ) Terry Fielder terry@greatgulfhomes.com Associate Director Software Development and Deployment Great Gulf Homes / Ashton Woods Homes Fax: (416) 441-9085 Markus Schiltknecht wrote: > Hi, > > when using LIMIT, how do I tell the planner to only call a function for > rows it returns? > > An example: I want to fetch the top five categories. A function > get_category_text_path(cat_id int) returns the textual representation of > the category. For that I do something like: > > SELECT id, get_category_text_path(id) > FROM category > ORDER BY rank > LIMIT 5 > > Unfortunately this takes very long because it calls > get_category_text_path() for all of the 450'000 categories in the table. > But I only need the full text path of the top five rows. > > It does not matter if I declare the function to be IMMUTABLE, STABLE or > VOLATILE - it gets called for every row in category (which normally is > what you want I guess). > > How can I rewrite the query to call get_category_text_path() only for > the top five rows? > > Thanks for hints. > > Markus > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > >
On Tue, 2006-05-02 at 14:02 +0200, Martijn van Oosterhout wrote: > How about: > > SELECT id, get_category_text_path(id) > FROM (SELECT id FROM category > ORDER BY rank > LIMIT 5) as x; Oh that works? Great! Let me see, with 'rank' from a joined table that looks like: SELECT id, get_category_text_path(id), rank FROM ( SELECT c.id FROM category AS c JOIN rank_lookup AS r ON r.cat_id = c.id ORDER BY r.rank LIMIT 5 ) as x; That works perfectly. It prevents a second join and thus solves the question in my previous mail. > Evidently you don't have an index on rank, otherwise it would've used > the index to cut down on the number of rows that needed to be examined. No, there is no index. I need to think about creating one... Thank you very much. Markus
1) If the join to rank_lookup is done across Pkey and/or unique indexed fields and a foreign key is defined for said join, I don't know how the subquery could possibly be more optimized then it is (the reasoning being beyond the scope of this discussion and possibly even beyond the scope of area in which I can safely comment :)
2) It is my understanding and experience (I could be unaware of a trick or special case specifics, however) that using an IN clause is LESS efficient then joining to the table. The only reason I used the in clause is because, as you indicated, you were only ask for the top five, which is a very small set (you probably would not want to do that if the set was large).
Indeed, the IN clause is a de-optimization, it only HAPPENS to make the query run faster because it allows you to avoid calling the select function for all but the selected 5 rows (which was the goal you requested)
3) In SQL there is almost always more then 1 way of doing something, you have now seen 2. There may be more, possibly even better ways. Experts care to comment? :)
Markus Schiltknecht wrote:
2) It is my understanding and experience (I could be unaware of a trick or special case specifics, however) that using an IN clause is LESS efficient then joining to the table. The only reason I used the in clause is because, as you indicated, you were only ask for the top five, which is a very small set (you probably would not want to do that if the set was large).
Indeed, the IN clause is a de-optimization, it only HAPPENS to make the query run faster because it allows you to avoid calling the select function for all but the selected 5 rows (which was the goal you requested)
3) In SQL there is almost always more then 1 way of doing something, you have now seen 2. There may be more, possibly even better ways. Experts care to comment? :)
Terry Fielder terry@greatgulfhomes.com Associate Director Software Development and Deployment Great Gulf Homes / Ashton Woods Homes Fax: (416) 441-9085
Markus Schiltknecht wrote:
Hello Terry, Thanks a lot. That's so simple I didn't see it. (The original query is much more complex.) The only problem is, rank is not a column of category itself, but a joined row. With this solution, the join will have to be performed twice. But since this doesn't cost that much and because the second join is only done for 5 rows at the max this does not hurt. The more complete query now looks a little ugly: SELECT id, get_category_text_path(id), r.rank FROM category JOIN rank_lookup AS r ON cat_id = id WHERE id IN ( SELECT c.id FROM category AS c JOIN rank_lookup AS rr ON rr.cat_id = c.id ORDER BY rr.rank LIMIT 5 ) It's not possible to optimize out that second join, is it? Regards Markus On Tue, 2006-05-02 at 07:39 -0400, Terry Fielder wrote:SELECT id, get_category_text_path(id) FROM category WHERE id IN ( SELECT c.id FROM category AS c ORDER BY c.rank LIMIT 5 )---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match