Thread: optimizing pl/pgsql function

optimizing pl/pgsql function

From
"Uqtous"
Date:
I have a function that determines the distance between zip codes using
latitude and longitude values. This function is called by a query that
searches for members with x miles of the requesting member. The code works,
but is very slow! Any optimization tips would be appreciated!

I'm running PG 7.1.3...

-- Function: zipdist(int4, int4)
CREATE FUNCTION "zipdist"("int4", "int4") RETURNS "numeric" AS 'DECLARE

from_lat NUMERIC;
from_long NUMERIC;
to_lat NUMERIC;
to_long NUMERIC;
xcoord NUMERIC;
ycoord NUMERIC;

BEGIN

SELECT INTO from_lat tlkpZip.zipLatitude::numeric FROM tlkpZip WHERE
tlkpZip.zipCode=$1;
SELECT INTO from_long tlkpZip.zipLongitude::numeric FROM tlkpZip WHERE
tlkpZip.zipCode=$1;
SELECT INTO to_lat tlkpZip.zipLatitude::numeric FROM tlkpZip WHERE
tlkpZip.zipCode=$2;
SELECT INTO to_long tlkpZip.zipLongitude::numeric FROM tlkpZip WHERE
tlkpZip.zipCode=$2;

xcoord:=(69.1::numeric*(to_lat::numeric-from_lat::numeric)::numeric);
ycoord:=(69.1::numeric*((to_long::numeric-from_long::numeric)::numeric*cos(f
rom_lat/57.3::numeric)::numeric));

RETURN
sqrt((xcoord::numeric*xcoord::numeric)::numeric+(ycoord::numeric*ycoord::num
eric)::numeric)::numeric;

END;' LANGUAGE 'plpgsql';

And the query that calls the function; "12345" is the requesting member's
zip, "5" is the preferred distance, and "1" is the requesting member's ID:

SELECT tblmbr.*
FROM tblmbr
WHERE zipdist(12345,tblmbr.mbrzipcode)<=5
AND tblmbr.mbrid <> 1
ORDER BY tblmbr.mbrusername;






Re: optimizing pl/pgsql function

From
Masaru Sugawara
Date:
On Tue, 25 Jun 2002 17:46:51 -0400
"Uqtous" <Ubqtous@hotmail.com> wrote:

> I have a function that determines the distance between zip codes using
> latitude and longitude values. This function is called by a query that
> searches for members with x miles of the requesting member. The code works,
> but is very slow! Any optimization tips would be appreciated!

 ...

> SELECT INTO from_lat tlkpZip.zipLatitude::numeric FROM tlkpZip WHERE
> tlkpZip.zipCode=$1;
> SELECT INTO from_long tlkpZip.zipLongitude::numeric FROM tlkpZip WHERE
> tlkpZip.zipCode=$1;
> SELECT INTO to_lat tlkpZip.zipLatitude::numeric FROM tlkpZip WHERE
> tlkpZip.zipCode=$2;
> SELECT INTO to_long tlkpZip.zipLongitude::numeric FROM tlkpZip WHERE
> tlkpZip.zipCode=$2;


These operations  seem to be a main cause of the time loss. In view of
the speed, it is better to reduce the frequencies of the table access.
As for what you would like to get there, the SELECTs in your function
can be moved into the SQL statement. But I don't have real data that
include zip codes, could you confirm the following for the performance
and appropriateness?


-- Function: zipdist(int4, int4, int4, int4)
CREATE FUNCTION zipdist(int4, int4, int4, int4) RETURNS numeric AS '
DECLARE
   xcoord numeric;
   ycoord numeric;
BEGIN
   xcoord := ($1 - $2)::numeric;
   ycoord := ($3 - $4)::numeric * cos($2 / 57.3::numeric)::numeric;
   RETURN sqrt(xcoord * xcoord + ycoord * ycoord) * 69.1::numeric
END;
' LANGUAGE 'plpgsql';


-- If zipCode has no index
CREATE INDEX idx_tlkpZip_zipCode ON tlkpZip(zipCode);


SELECT t.*
FROM (SELECT tblmbr.*,
             tlkpZip.zipLatitude,
             tlkpZip.zipLongitude
        FROM tblmbr LEFT OUTER JOIN tlkpZip USING(zipCode)
     ) AS t
WHERE zipdist(t.zipLatitude,
              (SELECT zipLatitude FROM tlkpZip WHERE zipCode = 12345),
              t.zipLongitude,
              (SELECT zipLongitude FROM tlkpZip WHERE zipCode = 12345)
             ) <= 5
      AND t.mbrid <> 1
ORDER BY t.mbrusername;


-- example of the output of EXPLAIN in 7.2.1
Sort  (cost=303.52..303.52 rows=1667 width=52)
  InitPlan
    ->  Index Scan using idx_tlkpzip_zipcode on tlkpzip
                           (cost=0.00..17.07 rows=5 width=4)
    ->  Index Scan using idx_tlkpzip_zipcode on tlkpzip
                            (cost=0.00..17.07 rows=5 width=4)
  ->  Merge Join  (cost=69.83..214.33 rows=1667 width=52)
        ->  Index Scan using idx_tlkpzip_zipcode on tlkpzip
                               (cost=0.00..52.00 rows=1000 width=12)
        ->  Sort  (cost=69.83..69.83 rows=1000 width=40)
              ->  Seq Scan on tblmbr
                                  (cost=0.00..20.00 rows=1000 width=40)



Regards,
Masaru Sugawara