Thread: Performance problem (outer join + view + non-strict functions)‏

Performance problem (outer join + view + non-strict functions)‏

From
Dean Rasheed
Date:
Hi,

I am having performance problems running a number of queries
involving views based on non-strict functions. I have reproduced the
problem with the simple test-case below which shows how the query plan
is different depending on whether the view uses strict or non-strict
functions (even though those columns do not appear in the WHERE
clause).

CREATE OR REPLACE FUNCTION times_ten_strict(int) RETURNS int
  AS 'SELECT $1*10' LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION times_ten_nonstrict(int) RETURNS int
  AS 'SELECT COALESCE($1*10, 0)' LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION setup()
RETURNS void AS
$$
DECLARE
  val int;
BEGIN
  DROP TABLE IF EXISTS t1 CASCADE;
  DROP TABLE IF EXISTS t2 CASCADE;

  CREATE TABLE t1
  (
    a1 int PRIMARY KEY,
    b1 int
  );

  val := 0;
  WHILE val < 10000 LOOP
    INSERT INTO t1 VALUES(val, val);
    val := val+1;
  END LOOP;

  CREATE TABLE t2
  (
    a2 int PRIMARY KEY
  );

  val := 0;
  WHILE val < 10000 LOOP
    INSERT INTO t2 VALUES(val);
    val := val+1;
  END LOOP;

  CREATE VIEW v2_strict AS SELECT a2, times_ten_strict(a2) AS b2 FROM t2;
  CREATE VIEW v2_nonstrict AS SELECT a2, times_ten_nonstrict(a2) AS b2 FROM t2;
END;
$$ LANGUAGE plpgsql;

SELECT setup();
ANALYZE t1;
ANALYZE t2;

EXPLAIN ANALYZE SELECT * FROM t1 LEFT OUTER JOIN v2_strict v2 ON v2.a2=t1.b1 WHERE t1.a1=3;
EXPLAIN ANALYZE SELECT * FROM t1 LEFT OUTER JOIN v2_nonstrict v2 ON v2.a2=t1.b1 WHERE t1.a1=3;

(I know that I don't really need a left outer join in this example,
but my real data does, and it suffers from the same performance
problem, but worse because there is more data and the joins are more
complex.)

The first query, from the view using a strict function has the
expected plan:

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..16.55 rows=1 width=12) (actual time=0.044..0.055 rows=1 loops=1)
   ->  Index Scan using t1_pkey on t1  (cost=0.00..8.27 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=1)
         Index Cond: (a1 = 3)
   ->  Index Scan using t2_pkey on t2  (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.016 rows=1 loops=1)
         Index Cond: (t2.a2 = t1.b1)
 Total runtime: 0.182 ms


However, the second query, which is almost identical except that one
of the columns being returned uses a non-strict function, has the
following plan:

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..413.27 rows=1 width=16) (actual time=0.057..47.511 rows=1 loops=1)
   Join Filter: (v2.a2 = t1.b1)
   ->  Index Scan using t1_pkey on t1  (cost=0.00..8.27 rows=1 width=8) (actual time=0.012..0.019 rows=1 loops=1)
         Index Cond: (a1 = 3)
   ->  Seq Scan on t2  (cost=0.00..180.00 rows=10000 width=4) (actual time=0.016..26.217 rows=10000 loops=1)
 Total runtime: 47.644 ms


Rather than using the primary key on t2, it does a full table
scan. With multiple joins, this starts doing nested full table scans
and becomes very inefficient, especially when the tables are much
bigger than this.

Both functions have a volatility of IMMUTABLE, and I don't understand
why the strictness of the function should affect the query plan.

Any ideas what is going on?

Thanks,

Dean.

_________________________________________________________________
100’s of Music vouchers to be won with MSN Music
https://www.musicmashup.co.uk

Re: Performance problem (outer join + view + non-strict functions)‏

From
Tom Lane
Date:
Dean Rasheed <dean_rasheed@hotmail.com> writes:
> I am having performance problems running a number of queries
> involving views based on non-strict functions. I have reproduced the
> problem with the simple test-case below which shows how the query plan
> is different depending on whether the view uses strict or non-strict
> functions (even though those columns do not appear in the WHERE
> clause).

Subqueries that produce non-nullable output columns can't be pulled up
underneath the nullable side of an outer join, because their output
values wouldn't go to NULL properly when expanding an unmatched row
from the other side of the join (see has_nullable_targetlist in
prepjointree.c).  In this context that means that we can't recognize
the option of using a inner indexscan for the table within the subquery.

I have some vague ideas about how to eliminate that restriction,
but don't hold your breath.  At the earliest it might happen in 8.4.

            regards, tom lane

RE: Performance problem (outer join + view + non-strict functions)‏

From
Dean Rasheed
Date:
Ah yes, I see the problem. I see that it is also going to be a problem where I have used CASE..WHEN in the select list
ofviews :-( 

Naively, couldn't the subquery be pulled up if any non-nullable columns from the right table t2 were automatically
wrappedin a simple function which returned NULL when the table row isn't matched (eg. when t2.ctid is NULL)? I'm a
completenewbie to Postgres, so I have no idea if this is really possible or how hard it would be to implement in
practice.

Dean.


>> I am having performance problems running a number of queries
>> involving views based on non-strict functions. I have reproduced the
>> problem with the simple test-case below which shows how the query plan
>> is different depending on whether the view uses strict or non-strict
>> functions (even though those columns do not appear in the WHERE
>> clause).
>
> Subqueries that produce non-nullable output columns can't be pulled up
> underneath the nullable side of an outer join, because their output
> values wouldn't go to NULL properly when expanding an unmatched row
> from the other side of the join (see has_nullable_targetlist in
> prepjointree.c). In this context that means that we can't recognize
> the option of using a inner indexscan for the table within the subquery.
>
> I have some vague ideas about how to eliminate that restriction,
> but don't hold your breath. At the earliest it might happen in 8.4.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

_________________________________________________________________
Feel like a local wherever you go.
http://www.backofmyhand.com