I sent this message to pgsql-performance yesterday, but as of now (about 30
hours later) I haven't yet received a response. So I figured I'd see what
you guys think. Please help if you can. Thanks!
==========
I am having some problems with user-defined functions in version 7.3. The
planner seems to refuse to use an index that I have created when I define an
SQL function that should take advantage of it. The thing that is driving me
nuts is that if I take the SQL from the function definition and run it
exactly as it is, replacing the parameters with real values, then it does
use the index and performs beautifully. I never saw this problem until I
upgraded from 7.2.3 to 7.3.
At the bottom of this email, I have included a psql test input file and the
results. I have an index on zip_locs(dist1,dist2,dist3,dist4). I'm joining
a table of about 350,000 rows (mytable) against another table of about
42,000 rows (zip_locs) on a ZIP code. The ZIP fields in both tables are
indexed as well. The functions zip_dist[1234](varchar) return the
respective dist[1234] value for the given ZIP code. The zip_lat(varchar)
and zip_lng(varchar) functions return the latitude and longitude for the
given ZIP code, respectively. All these functions are immutable so they
have virtually no effect on the speed of the query. The point of the query
is to get a count of records in mytable that are within a certain distance
of a given ZIP code.
When I do the explicit SELECT, it uses the aforementioned index and then
filters on the result of the earth_distance(real,real,real,real) function.
When I run the radiuscount(varchar,real) function, it apparently does a
sequential scan instead of using the index.
I have tried rewriting this query every way I know how, but nothing seems to
work. Can anybody help me with this?
Here is the psql input file I'm using to demonstrate:
*******************
CREATE OR REPLACE FUNCTION radiuscount(varchar, real) RETURNS bigint AS
'
SELECT COUNT(*)
FROM mytable JOIN zip_locs ON zip = zip_code
WHERE dist1 BETWEEN zip_dist1($1) - $2::real AND zip_dist1($1) +
$2::real AND dist2 BETWEEN zip_dist2($1) - $2::real AND zip_dist2($1) +
$2::real AND dist3 BETWEEN zip_dist3($1) - $2::real AND zip_dist3($1) +
$2::real AND dist4 BETWEEN zip_dist4($1) - $2::real AND zip_dist4($1) +
$2::real AND earth_distance(zip_lat($1), zip_lng($1), lat, lng) < $2::real
' LANGUAGE 'SQL'
STABLE
RETURNS NULL ON NULL INPUT
;
\timing
\a
\t
\echo
\echo 'NOT using the function'
SELECT COUNT(*) AS radiuscount
FROM mytable JOIN zip_locs ON zip = zip_code
WHERE dist1 BETWEEN zip_dist1('30096') - 20::real AND
zip_dist1('30096') + 20::real AND dist2 BETWEEN zip_dist2('30096') - 20::real AND
zip_dist2('30096') + 20::real AND dist3 BETWEEN zip_dist3('30096') - 20::real AND
zip_dist3('30096') + 20::real AND dist4 BETWEEN zip_dist4('30096') - 20::real AND
zip_dist4('30096') + 20::real AND earth_distance(zip_lat('30096'), zip_lng('30096'), lat, lng) <
20::real
;
\echo
\echo 'Using the function'
select radiuscount('30096',20);
*******************
And here is the output:
*******************
CREATE FUNCTION
Timing is on.
Output format is unaligned.
Showing only tuples.
NOT using the function
2775
Time: 584.02 ms
Using the function
2775
Time: 11693.56 ms
*******************