Thread: Important speed difference between a query and a function with the same query

Important speed difference between a query and a function with the same query

From
Frederic Jolliton
Date:
Hi,

(PostgreSQL 7.3.2 on i386-portbld-freebsd4.7, compiled by GCC 2.95.4)

I've a curious performance problem with a function returning set of
rows.  The query alone is very fast, but not when called from the
function.

To "emulate" a parametred view¹, I created a function as follow:

CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
    AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
    LANGUAGE sql;

The table table1 have 330K rows, and table2 have 3K rows.

When I run the following query (prefixed with SELECT * to try to get
the same behavior that the second query), I obtain very good time.

database=# SELECT * FROM (
    (SELECT a.field1,a.field2,a.field3,b.field3,b.field4,a.field5
        FROM table1 AS a, table1 AS b
        WHERE a.field6=b.field4
        ORDER BY a.field6 DESC
        LIMIT 10)
    UNION
    (SELECT a.field1,a.field2,b.field3,a.field3,a.field4,b.field5
        FROM table2 AS a, table1 AS b
        WHERE a.field4=b.field6
        ORDER BY a.field4 DESC
        LIMIT 10)
    ORDER BY field4 DESC
    LIMIT 10
) AS type_get_info;

[...]
(10 rows)

Time: 185.86 ms

But, when I run the function (with 10 as parameter, but even 1 is
slow) I get poor time:

database=# SELECT * FROM get_info(10);
[...]
(10 rows)

Time: 32782.26 ms
database=#

(even after a VACUUM FULL ANALYZE, and REINDEX of indexes used in the
queries)

What is curious is that I remember that the function was fast at a
time..

What is the difference between the two case ?

[1] Is there another solution to this 'hack' ? I can't simply create a
view and use 'LIMIT 10' because intermediate SELECT have be limited
too (to avoid UNION with 300K rows where only the first 10 are of
interest to me.)

--
Frédéric Jolliton


Re: Important speed difference between a query and a

From
Frederic Jolliton
Date:
> (PostgreSQL 7.3.2 on i386-portbld-freebsd4.7, compiled by GCC 2.95.4)
>
> I've a curious performance problem with a function returning set of
> rows.  The query alone is very fast, but not when called from the
> function.
>
> To "emulate" a parametred view, I created a function as follow:
>
> CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
>     AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
>     LANGUAGE sql;

Setting enable_seqscan to off give same result speed between the query
and the function !

So, the query in the function is not using index but the exact same
query alone does !

Is there an explanation ?

--
Frédéric Jolliton


Re: Important speed difference between a query and a

From
Tom Lane
Date:
Frederic Jolliton <fred-pg@jolliton.com> writes:
>> To "emulate" a parametred view, I created a function as follow:
>>
>> CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
>> AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
>> LANGUAGE sql;

> So, the query in the function is not using index but the exact same
> query alone does !

But it's not the same query, is it?  With "LIMIT $1" the planner can't
know what the limit value is exactly, so it has to generate a plan that
won't be too unreasonable for either a small or a large limit.

            regards, tom lane


Re: Important speed difference between a query and a

From
Stephan Szabo
Date:
On Thu, 24 Apr 2003, Frederic Jolliton wrote:

> > (PostgreSQL 7.3.2 on i386-portbld-freebsd4.7, compiled by GCC 2.95.4)
> >
> > I've a curious performance problem with a function returning set of
> > rows.  The query alone is very fast, but not when called from the
> > function.
> >
> > To "emulate" a parametred view, I created a function as follow:
> >
> > CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
> >     AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
> >     LANGUAGE sql;
>
> Setting enable_seqscan to off give same result speed between the query
> and the function !
>
> So, the query in the function is not using index but the exact same
> query alone does !
>
> Is there an explanation ?

My guess is that limit $1 is assuming a larger number of rows when
planning the queries, large enough that it expects seqscan to be better
(assuming the limit is what it expects).  It's probably not going to plan
that query each time the function is called so it's not going to know
whether you're calling with a small number (index scan may be better) or a
large number (seq scan may be better).  For example, if you sent 100000,
the index scan might be a loser.

Perhaps plpgsql with EXECUTE would work better for that, although it's
likely to have some general overhead.


Re: Important speed difference between a query and a

From
Frederic Jolliton
Date:
> Frederic Jolliton <fred-pg@jolliton.com> writes:
>>> To "emulate" a parametred view, I created a function as follow:
>>>
>>> CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
>>> AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
>>> LANGUAGE sql;
>
>> So, the query in the function is not using index but the exact same
>> query alone does !

Tom Lane <tgl@sss.pgh.pa.us> writes:
> But it's not the same query, is it?  With "LIMIT $1" the planner can't
> know what the limit value is exactly, so it has to generate a plan that
> won't be too unreasonable for either a small or a large limit.

Ok. So the query is optimized once and not each time.. I understand
now.

But, since I "know" better that PostgreSQL that query must use index
in most of case, can I force in some manner the function when
declaring it to take this in account ? I suppose (not tested) that
setting enable_seqscan just before will probably do it, but what about
dump/restore of the database when recreating the function and keep
this "fix" automatically ?


Re: Important speed difference between a query and a

From
Frederic Jolliton
Date:
> On Thu, 24 Apr 2003, Frederic Jolliton wrote:
>> > CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
>> >     AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
>> >     LANGUAGE sql;
>>
>> Setting enable_seqscan to off give same result speed between the query
>> and the function !
>>
>> So, the query in the function is not using index but the exact same
>> query alone does !
>>
>> Is there an explanation ?

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> My guess is that limit $1 is assuming a larger number of rows when
> planning the queries, large enough that it expects seqscan to be
> better (assuming the limit is what it expects).  It's probably not
> going to plan that query each time the function is called so it's
> not going to know whether you're calling with a small number (index
> scan may be better) or a large number (seq scan may be better).  For
> example, if you sent 100000, the index scan might be a loser.
>
> Perhaps plpgsql with EXECUTE would work better for that, although
> it's likely to have some general overhead.

The server is rather fast, and the query return 10 to 50 rows in most
case. So, this is probably a solution, even if it's not very
clean. (Well, I have to find an example to RETURN the result of
EXECUTE..)

But, what I don't understand is why enable_seqscan change something if
the query is already planed.

--
Frédéric Jolliton


Re: Important speed difference between a query and a

From
Stephan Szabo
Date:
On Thu, 24 Apr 2003, Frederic Jolliton wrote:

> > On Thu, 24 Apr 2003, Frederic Jolliton wrote:
> > Perhaps plpgsql with EXECUTE would work better for that, although
> > it's likely to have some general overhead.
>
> The server is rather fast, and the query return 10 to 50 rows in most
> case. So, this is probably a solution, even if it's not very
> clean. (Well, I have to find an example to RETURN the result of
> EXECUTE..)

Check out
http://techdocs.postgresql.org/guides/SetReturningFunctions

specifically the GetRows() function for an example of using for in execute
with set returning functions.


Re: Important speed difference between a query and a

From
Frederic Jolliton
Date:
> On Thu, 24 Apr 2003, Frederic Jolliton wrote:
[...]
>> The server is rather fast, and the query return 10 to 50 rows in
>> most case. So, this is probably a solution, even if it's not very
>> clean. (Well, I have to find an example to RETURN the result of
>> EXECUTE..)

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> Check out
> http://techdocs.postgresql.org/guides/SetReturningFunctions
>
> specifically the GetRows() function for an example of using for in
> execute with set returning functions.

Oh right. Thanks you for pointing out this.

--
Frédéric Jolliton


Re: Important speed difference between a query and a

From
Hannu Krosing
Date:
Frederic Jolliton kirjutas N, 24.04.2003 kell 19:33:
> > Frederic Jolliton <fred-pg@jolliton.com> writes:
> >>> To "emulate" a parametred view, I created a function as follow:
> >>>
> >>> CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
> >>> AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
> >>> LANGUAGE sql;
> >
> >> So, the query in the function is not using index but the exact same
> >> query alone does !
>
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > But it's not the same query, is it?  With "LIMIT $1" the planner can't
> > know what the limit value is exactly, so it has to generate a plan that
> > won't be too unreasonable for either a small or a large limit.
>
> Ok. So the query is optimized once and not each time.. I understand
> now.
>
> But, since I "know" better that PostgreSQL that query must use index
> in most of case, can I force in some manner the function when
> declaring it to take this in account ?

You could define two functions - one for small sets with constant LIMITs
(maybe 50) in UNION parts, and another with $1. Then use accordingly.

> I suppose (not tested) that
> setting enable_seqscan just before will probably do it, but what about
> dump/restore of the database when recreating the function and keep
> this "fix" automatically ?

-------------
Hannu


On Wed, Apr 23, 2003 at 07:53:55PM +0200, Frederic Jolliton wrote:
> CREATE FUNCTION get_info (integer) RETURNS SETOF type_get_info
>     AS '...' <- here the query show below, where 'LIMIT $1' is used instead of 'LIMIT 10'
>     LANGUAGE sql;

You should probably define the function to be STABLE.

     LANGUAGE sql STABLE;

See
http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=sql-createfunction.html
for more info.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"