Re: immutable functions vs. join for lookups ? - Mailing list pgsql-performance

From Dawid Kuroczko
Subject Re: immutable functions vs. join for lookups ?
Date
Msg-id 758d5e7f05041802004ee56be4@mail.gmail.com
Whole thread Raw
In response to immutable functions vs. join for lookups ?  (Enrico Weigelt <weigelt@metux.de>)
List pgsql-performance
On 4/15/05, Enrico Weigelt <weigelt@metux.de> wrote:
> a) SELECT items.a, items.b, ..., users.username FROM items, users
>         WHERE items.uid = users.uid;
>
> c) CREATE FUNCTION id2username(oid) RETURNS text
>     LANGUAGE 'SQL' IMMUTABLE AS '
>         SELECT username AS RESULT FROM users WHERE uid = $1';

You will be told that this function is not immutable but stable, and this
is quite right.  But consider such a function:

CREATE OR REPLACE FUNCTION id2username (oid int) RETURNS TEXT AS $$
  BEGIN
    IF oid = 0 THEN RETURN 'foo';
    ELSIF oid = 1 THEN RETURN 'bar';
    END IF;
  END;
$$ LANGUAGE plpgsql IMMUTABLE;

versus a lookup table with similar data.  Logic suggests it should be faster
than a table...  It got me worried when I added: "RAISE WARNING 'Called'"
after begin and I got lots of "Called" warnings when using this IMMUTABLE
function in select...  And the timings for ~6000 values in aaa table
(and two values in lookup table) are:

There is a query, output of the EXPLAIN ANALYZE, Time of EXPLAIN
ANALYZE and "Real time" of SELECT (without EXPLAIN ANALYZE):

a) simple select from temp table, and a lookup cost:
  EXPLAIN ANALYZE SELECT n FROM aaa;
 Seq Scan on aaa  (cost=0.00..87.92 rows=5992 width=4) (actual
time=0.011..24.849 rows=6144 loops=1)
 Total runtime: 51.881 ms
(2 rows)
Time: 52,882 ms
Real time: 16,261 ms

  EXPLAIN ANALYZE SELECT id2username(n) FROM aaa limit 2;
Limit  (cost=0.00..0.03 rows=2 width=4) (actual time=0.111..0.150
rows=2 loops=1)
   ->  Seq Scan on aaa  (cost=0.00..104.80 rows=6144 width=4) (actual
time=0.102..0.129 rows=2 loops=1)
 Total runtime: 0.224 ms
(3 rows)
Time: 1,308 ms
Real time: 1,380 ms

b) natural join with lookup table:
  EXPLAIN ANALYZE SELECT username FROM aaa NATURAL JOIN lookup;
 Hash Join  (cost=2.45..155.09 rows=3476 width=32) (actual
time=0.306..83.677 rows=6144 loops=1)
   Hash Cond: ("outer".n = "inner".n)
   ->  Seq Scan on aaa  (cost=0.00..87.92 rows=5992 width=4) (actual
time=0.006..25.517 rows=6144 loops=1)
   ->  Hash  (cost=2.16..2.16 rows=116 width=36) (actual
time=0.237..0.237 rows=0 loops=1)
         ->  Seq Scan on lookup  (cost=0.00..2.16 rows=116 width=36)
(actual time=0.016..0.034 rows=2 loops=1)
 Total runtime: 107.378 ms
(6 rows)
Time: 109,040 ms
Real time: 25,364 ms

c) IMMUTABLE "static" lookup function:
  EXPLAIN ANALYZE SELECT id2username(n) FROM aaa;
Seq Scan on aaa  (cost=0.00..104.80 rows=6144 width=4) (actual
time=0.089..116.397 rows=6144 loops=1)
 Total runtime: 143.800 ms
(2 rows)
Time: 144,869 ms
Real time: 102,428 ms

d) self-join with a function ;)
  EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
aaa USING (n);
 Hash Left Join  (cost=506.82..688.42 rows=6144 width=36) (actual
time=102.382..182.661 rows=6144 loops=1)
   Hash Cond: ("outer".n = "inner".n)
   ->  Seq Scan on aaa  (cost=0.00..89.44 rows=6144 width=4) (actual
time=0.012..24.360 rows=6144 loops=1)
   ->  Hash  (cost=506.82..506.82 rows=2 width=36) (actual
time=102.217..102.217 rows=0 loops=1)
         ->  Subquery Scan v_lookup  (cost=476.05..506.82 rows=2
width=36) (actual time=53.626..102.057 rows=2 loops=1)
               ->  Subquery Scan "values"  (cost=476.05..506.80 rows=2
width=4) (actual time=53.613..102.023 rows=2 loops=1)
                     ->  Unique  (cost=476.05..506.77 rows=2 width=4)
(actual time=53.456..101.772 rows=2 loops=1)
                           ->  Sort  (cost=476.05..491.41 rows=6144
width=4) (actual time=53.440..76.710 rows=6144 loops=1)
                                 Sort Key: n
                                 ->  Seq Scan on aaa
(cost=0.00..89.44 rows=6144 width=4) (actual time=0.013..26.626
rows=6144 loops=1)
 Total runtime: 209.378 ms
(11 rows)
Time: 211,460 ms
Real time:  46,682 ms

...so this IMMUTABLE is twice as slow (~100 ms) as the query joining
itself with a SELECT DISTINCT on an IMMUTABLE function (~50 ms),
which is twice as slow as JOIN against lookup table (~25 ms), and I feel
this IMMUTABLE function could be around ~20 ms (~16 ms plus
calling the function two times plus giving the values).

Ah, and this is PostgreSQL 8.0.1 running under FreeBSD on a
CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU).

   Regards,
      Dawid

PS: I have a feeling that IMMUTABLE functions worked better in 7.4,
yet I am unable to confirm this.

pgsql-performance by date:

Previous
From: William Yu
Date:
Subject: Re: How to improve db performance with $7K?
Next
From: "Merlin Moncure"
Date:
Subject: Re: immutable functions vs. join for lookups ?