Thread: Unnecessary function calls

Unnecessary function calls

From
Markus Schiltknecht
Date:
Hi,

when using LIMIT, how do I tell the planner to only call a function for
rows it returns?

An example: I want to fetch the top five categories. A function
get_category_text_path(cat_id int) returns the textual representation of
the category. For that I do something like:

SELECT id, get_category_text_path(id)
   FROM category
   ORDER BY rank
   LIMIT 5

Unfortunately this takes very long because it calls
get_category_text_path() for all of the 450'000 categories in the table.
But I only need the full text path of the top five rows.

It does not matter if I declare the function to be IMMUTABLE, STABLE or
VOLATILE - it gets called for every row in category (which normally is
what you want I guess).

How can I rewrite the query to call get_category_text_path() only for
the top five rows?

Thanks for hints.

Markus



Re: Unnecessary function calls

From
Richard Huxton
Date:
Markus Schiltknecht wrote:
> Hi,
>
> when using LIMIT, how do I tell the planner to only call a function for
> rows it returns?
>
> An example: I want to fetch the top five categories. A function
> get_category_text_path(cat_id int) returns the textual representation of
> the category. For that I do something like:
>
> SELECT id, get_category_text_path(id)
>    FROM category
>    ORDER BY rank
>    LIMIT 5
>
> Unfortunately this takes very long because it calls
> get_category_text_path() for all of the 450'000 categories in the table.
> But I only need the full text path of the top five rows.

SELECT id, get_category_text_path(id)
FROM (
     SELECT id FROM category ORDER BY rank LIMIT 5
) AS foo

HTH
--
   Richard Huxton
   Archonet Ltd

Re: Unnecessary function calls

From
Martijn van Oosterhout
Date:
On Tue, May 02, 2006 at 01:37:54PM +0200, Markus Schiltknecht wrote:
> Hi,
>
> when using LIMIT, how do I tell the planner to only call a function for
> rows it returns?
>
> An example: I want to fetch the top five categories. A function
> get_category_text_path(cat_id int) returns the textual representation of
> the category. For that I do something like:
>
> SELECT id, get_category_text_path(id)
>    FROM category
>    ORDER BY rank
>    LIMIT 5

How about:

SELECT id, get_category_text_path(id)
FROM (SELECT id FROM category
    ORDER BY rank
    LIMIT 5) as x;

Evidently you don't have an index on rank, otherwise it would've used
the index to cut down on the number of rows that needed to be examined.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Unnecessary function calls

From
Markus Schiltknecht
Date:
Hello Terry,

Thanks a lot. That's so simple I didn't see it. (The original query is
much more complex.)

The only problem is, rank is not a column of category itself, but a
joined row. With this solution, the join will have to be performed
twice. But since this doesn't cost that much and because the second join
is only done for 5 rows at the max this does not hurt.

The more complete query now looks a little ugly:

SELECT id, get_category_text_path(id), r.rank
FROM category
JOIN rank_lookup AS r ON cat_id = id
WHERE id IN (
    SELECT c.id
    FROM category AS c
    JOIN rank_lookup AS rr ON rr.cat_id = c.id
    ORDER BY rr.rank
    LIMIT 5
)

It's not possible to optimize out that second join, is it?

Regards

Markus

On Tue, 2006-05-02 at 07:39 -0400, Terry Fielder wrote:
> SELECT id, get_category_text_path(id)
> FROM category
> WHERE id IN (
>    SELECT c.id
>    FROM category AS c
>    ORDER BY c.rank
>    LIMIT 5
> )



Re: Unnecessary function calls

From
Terry Fielder
Date:
SELECT id, get_category_text_path(id)
FROM category
WHERE id IN (
   SELECT c.id
   FROM category AS c
   ORDER BY c.rank
   LIMIT 5
)

Terry Fielder
terry@greatgulfhomes.com
Associate Director Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
Fax: (416) 441-9085



Markus Schiltknecht wrote:
> Hi,
>
> when using LIMIT, how do I tell the planner to only call a function for
> rows it returns?
>
> An example: I want to fetch the top five categories. A function
> get_category_text_path(cat_id int) returns the textual representation of
> the category. For that I do something like:
>
> SELECT id, get_category_text_path(id)
>    FROM category
>    ORDER BY rank
>    LIMIT 5
>
> Unfortunately this takes very long because it calls
> get_category_text_path() for all of the 450'000 categories in the table.
> But I only need the full text path of the top five rows.
>
> It does not matter if I declare the function to be IMMUTABLE, STABLE or
> VOLATILE - it gets called for every row in category (which normally is
> what you want I guess).
>
> How can I rewrite the query to call get_category_text_path() only for
> the top five rows?
>
> Thanks for hints.
>
> Markus
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>

Re: Unnecessary function calls

From
Markus Schiltknecht
Date:
On Tue, 2006-05-02 at 14:02 +0200, Martijn van Oosterhout wrote:
> How about:
>
> SELECT id, get_category_text_path(id)
> FROM (SELECT id FROM category
>     ORDER BY rank
>     LIMIT 5) as x;

Oh that works? Great!

Let me see, with 'rank' from a joined table that looks like:

SELECT id, get_category_text_path(id), rank
FROM (
    SELECT c.id
    FROM category AS c
    JOIN rank_lookup AS r ON r.cat_id = c.id
    ORDER BY r.rank
    LIMIT 5
) as x;

That works perfectly. It prevents a second join and thus solves the
question in my previous mail.

> Evidently you don't have an index on rank, otherwise it would've used
> the index to cut down on the number of rows that needed to be examined.

No, there is no index. I need to think about creating one...

Thank you very much.

Markus



Re: Unnecessary function calls

From
Terry Fielder
Date:
1) If the join to rank_lookup is done across Pkey and/or unique indexed fields and a foreign key is defined for said join, I don't know how the subquery could possibly be more optimized then it is (the reasoning being beyond the scope of this discussion and possibly even beyond the scope of area in which I can safely comment :)

2) It is my understanding and experience (I could be unaware of a trick or special case specifics, however) that using an IN clause is LESS efficient then joining to the table.  The only reason I used the in clause is because, as you indicated, you were only ask for the top five, which is a very small set (you probably would not want to do that if the set was large).
Indeed, the IN clause is a de-optimization, it only HAPPENS to make the query run faster because it allows you to avoid calling the select function for all but the selected 5 rows (which was the goal you requested)

3) In SQL there is almost always more then 1 way of doing something, you have now seen 2.  There may be more, possibly even better ways.  Experts care to comment?  :)

Terry Fielder
terry@greatgulfhomes.com
Associate Director Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
Fax: (416) 441-9085


Markus Schiltknecht wrote:
Hello Terry,

Thanks a lot. That's so simple I didn't see it. (The original query is
much more complex.)

The only problem is, rank is not a column of category itself, but a
joined row. With this solution, the join will have to be performed
twice. But since this doesn't cost that much and because the second join
is only done for 5 rows at the max this does not hurt.

The more complete query now looks a little ugly:

SELECT id, get_category_text_path(id), r.rank
FROM category
JOIN rank_lookup AS r ON cat_id = id
WHERE id IN (   SELECT c.id   FROM category AS c   JOIN rank_lookup AS rr ON rr.cat_id = c.id   ORDER BY rr.rank   LIMIT 5
)

It's not possible to optimize out that second join, is it?

Regards

Markus

On Tue, 2006-05-02 at 07:39 -0400, Terry Fielder wrote: 
SELECT id, get_category_text_path(id)
FROM category
WHERE id IN (  SELECT c.id  FROM category AS c  ORDER BY c.rank  LIMIT 5
)   


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's datatypes do not      match