Thread: ORDER BY user defined function performance issues

ORDER BY user defined function performance issues

From
Nick Trainor
Date:
I have an application which logs interactions on a regular basis.  The interaction details (their types, etc) are held
inone table (tblitem) and the 'hits' are held in tbltotal. 

I have written a function to get the total 'hits' during a period and need to collect together the information from
tblitemwith it. 

The query works OK returning results in under a second:

EXPLAIN ANALYSE SELECT
t1.value1,t1.value2,getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0')FROM tblitem
t1WHERE  t1.type_id=23::int2  and (t1.id >= 1::int8 and t1.id<=9223372036854775807::int8) 
OFFSET 0 LIMIT 20;
tracker-#                                                                QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.70 rows=20 width=56) (actual time=19.50..846.89 rows=20 loops=1)
   ->  Index Scan using tblitemtype_id on tblitem t1  (cost=0.00..230.10 rows=598 width=56) (actual time=19.49..846.81
rows=21loops=1) 
         Index Cond: (type_id = 23::smallint)
         Filter: ((id >= 1::bigint) AND (id <= 9223372036854775807::bigint))
 Total runtime: 847.04 msec

----
I realised that Postgresql did not like passing t1.id to the function without some form of constraints - hence the
(t1.id>= 1::int8 and t1.id<=9223372036854775807::int8) dummy constraints. 
----


However, when I seek to ORDER the results, then it takes 'forever':

EXPLAIN ANALYSE SELECT t1.value1,t1.value2,
getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0') FROM tblitem t1 WHERE
t1.type_id=23::int2 and (t1.id >= 1::int8 and t1.id<=9223372036854775807::int8) 
ORDER BY getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0') DESC
OFFSET 0 LIMIT 20;


tracker-# tracker-#
                                                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=257.66..257.71 rows=20 width=56) (actual time=25930.90..25930.95 rows=20 loops=1)
   ->  Sort  (cost=257.66..259.15 rows=598 width=56) (actual time=25930.90..25930.91 rows=21 loops=1)
         Sort Key: getday_total(1::smallint, 23::smallint, (id)::integer, 31::smallint, 59::smallint, 2::smallint,
2004::smallint,182::smallint, 153::smallint, 6::smallint, 2004::smallint, 0) 
         ->  Index Scan using tblitemtype_id on tblitem t1  (cost=0.00..230.10 rows=598 width=56) (actual
time=19.60..25927.68rows=693 loops=1) 
               Index Cond: (type_id = 23::smallint)
               Filter: ((id >= 1::bigint) AND (id <= 9223372036854775807::bigint))
 Total runtime: 25931.15 msec


And this is a database of only a few thousand rows, we are anticipating that this database is going to get huge.

What am I missing here?  How can I get it to order by the total of interactions without hitting the performance
problem?

Any help would be much appreciated.

Nick

nick A-T trainorthornton d-o-t co d-o-t uk



Version:
PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)

Re: ORDER BY user defined function performance issues

From
Paul Thomas
Date:
On 11/06/2004 12:14 Nick Trainor wrote:
> [snip]
> However, when I seek to ORDER the results, then it takes 'forever':
>
> EXPLAIN ANALYSE SELECT t1.value1,t1.value2,
> getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0')
> FROM tblitem t1 WHERE  t1.type_id=23::int2  and (t1.id >= 1::int8 and
> t1.id<=9223372036854775807::int8)
> ORDER BY
> getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0')
> DESC
> OFFSET 0 LIMIT 20;

I expect that pg is having to evaluate your function every time it does a
compare within its sort. Something like
SELECT t1.value1,t1.value2,
     getday_total(..) AS foo
FROM tblitem t1 WHERE  t1.type_id=23::int2  and (t1.id >= 1::int8 and
t1.id<=9223372036854775807::int8)
ORDER BY foo

might work. Otherwise try selecting into a temp table then doing the order
by on that table.

HTH

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for
Business             |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: ORDER BY user defined function performance issues

From
Tom Lane
Date:
Nick Trainor <nick.trainor@trainorthornton.co.uk> writes:
> What am I missing here?

The ORDER BY query has to evaluate the function at *every* row of the
table before it can sort.  The other query was only evaluating the
function at twenty rows.

            regards, tom lane