On 2021-02-16 19:30:23 +0100, Paul van der Linden wrote:
> I have 2 functions:
>
> CREATE FUNCTION foo(a text, b text, c text) RETURNS text AS
> $func$
> DECLARE
> retVal text;
> BEGIN
> SELECT
> CASE
> ... snip long list containing various tests on a,b and c
> END INTO retval;
> RETURN retVal;
> END
> $func$ LANGUAGE PLPGSQL IMMUTABLE;
>
> CREATE FUNCTION bar(r text, geom geometry) RETURNS int AS
> $func$
> DECLARE
> retVal int;
> BEGIN
> SELECT
> CASE
> WHEN r='r1' AND st_area(geom)>100 THEN 1
> ... snip long list containing various tests on r and st_area(geom)
> END INTO retval;
> RETURN retVal;
> END
> $func$ LANGUAGE PLPGSQL IMMUTABLE;
>
> and a large table t (100M+ records) with columns a, b, c and geom running on PG
> 11, on spinning disks with 64GB memory and 28 cores.
>
> When I create a simple geom index with CREATE INDEX ON t USING gist(geom) it
> finishes in about an hour, but when I create a partial index using these 2
> functions
> CREATE INDEX ON t USING gist(geom) WHERE bar(foo(a,b,c),geom)<12 it takes over
> 20 hours...
It has to call these functions for each of those 100M+ lines. So that's
about 0.7 ms per line. Is that to be expected for what those functions
do? I don't know.
As depesz wrote, use smaller test case, like 1000 or 10000 rows. You can
test the speed of those functions in isolation. Compare
select a, b, c from t;
select foo(a, b, c) from t;
select bar(foo(a, b, c), geom) from t;
That tells you how much time is spent in foo and bar. If the sum is
close to those 0.7 ms, you know that you need to make those functions
faster.
I think SQL doesn't short-circuit, so in your function bar, that
st_area(geom) in the first WHEN clause will always be called, regardless
of the value of r. You can either call that once (as depesz suggested)
or you can avoid calling it by nesting the cases:
case
when r = 'r1' then
case
when st_area(geom) > 100 then 1
else 2
end
when r = 'r2' then
....
end
You can then go one step furthe:
case
when r like 'r_' then
-- handle r1 ... r9
when r like 'r1_' then
-- handle r10 ... r19
when r like 'r2_' then
-- handle r20 ... r29
end
That reduces the average number of comparisons from 50+ to 8.
Depesz measured an overhead of 0.016ms per log message. That's low
enough that you can probably afford a few messages, even if each
function invocation only takes a few milliseconds. So definitely try
that if you need to know where your functions spend their time.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"