Thread: Performance problem with UNION ALL view and domains

Performance problem with UNION ALL view and domains

From

Dean Rasheed

Date:

23 November 2007, 09:29:50

Hi,

I am having a performance problem trying to query a view which is a
UNION ALL of 2 tables. I have narrowed the problem down to my use of
DOMAINS in the underlying table. So in the test-case below, when the
column "a" is of domain type foo_text, the query runs slowly using
the following plan:

 Subquery Scan foo_v  (cost=0.00..798.00 rows=100 width=64) (actual time=0.049..24.763 rows=2 loops=1)
   Filter: (a = (('foo34'::text)::foo_text)::text)
   ->  Append  (cost=0.00..548.00 rows=20000 width=20) (actual time=0.007..20.338 rows=20000 loops=1)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..274.00 rows=10000 width=20) (actual time=0.006..7.341 rows=10000
loops=1)
               ->  Seq Scan on foo  (cost=0.00..174.00 rows=10000 width=20) (actual time=0.004..2.366 rows=10000
loops=1)
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..274.00 rows=10000 width=10) (actual time=0.009..6.536 rows=10000
loops=1)
               ->  Seq Scan on foo  (cost=0.00..174.00 rows=10000 width=10) (actual time=0.007..2.746 rows=10000
loops=1)
 Total runtime: 24.811 ms

However, when the column type is text, the query runs fast as I
would expect, using the PK index:

 Result  (cost=0.00..16.55 rows=2 width=64) (actual time=0.015..0.025 rows=2 loops=1)
   ->  Append  (cost=0.00..16.55 rows=2 width=64) (actual time=0.014..0.023 rows=2 loops=1)
         ->  Index Scan using foo_pkey on foo  (cost=0.00..8.27 rows=1 width=20) (actual time=0.014..0.014 rows=1
loops=1)
               Index Cond: (a = (('foo34'::text)::foo_text)::text)
         ->  Index Scan using foo_pkey on foo  (cost=0.00..8.27 rows=1 width=10) (actual time=0.007..0.008 rows=1
loops=1)
               Index Cond: (a = (('foo34'::text)::foo_text)::text)
 Total runtime: 0.065 ms

(PostgreSQL 8.2.5)

Any ideas?

Thanks, Dean



CREATE OR REPLACE FUNCTION setup()
RETURNS void AS
$$
DECLARE
  val int;
BEGIN
  DROP TABLE IF EXISTS foo CASCADE;
  DROP DOMAIN IF EXISTS foo_text;

  CREATE DOMAIN foo_text text;-- CONSTRAINT tt_check CHECK (VALUE LIKE 'foo%');

  CREATE TABLE foo
  (
    a foo_text PRIMARY KEY,
    b text
  );

  val := 0;
  WHILE val < 10000 LOOP
    INSERT INTO foo VALUES('foo'||val, 'bar'||val);
    val := val+1;
  END LOOP;

  CREATE VIEW foo_v AS
    (SELECT a,b from foo) UNION ALL (SELECT a,NULL::text AS b FROM foo);
END;
$$ LANGUAGE plpgsql;

SELECT setup();
ANALYZE foo;

EXPLAIN ANALYZE SELECT * FROM foo_v WHERE a='foo34'::foo_text;

_________________________________________________________________
Feel like a local wherever you go.
http://www.backofmyhand.com

Re: Performance problem with UNION ALL view and domains

From

"Jeff Larsen"

Date:

23 November 2007, 12:29:15

On Nov 23, 2007 7:29 AM, Dean Rasheed <dean_rasheed@hotmail.com> wrote:
> I am having a performance problem trying to query a view which is a
> UNION ALL of 2 tables. I have narrowed the problem down to my use of
> DOMAINS in the underlying table. So in the test-case below, when the
> column "a" is of domain type foo_text, the query runs slowly using
> the following plan:

I don't know much about DOMAINS, but I did learn somethings about
views, unions and where conditions when I posted a similar performance
question. The best answer was, of course, from Tom Lane here:

http://archives.postgresql.org/pgsql-performance/2007-11/msg00041.php

In my case, the data types in each segment of the union were not
originally identical, preventing the planner from efficiently pushing
the qualifications down to the individual segments prior to the union.

In your case the use of a DOMAIN type may be one of those 'special
cases' forcing the planner to perform the union first, then apply the
conditions.

Jeff

Re: Performance problem with UNION ALL view and domains

From

Tom Lane

Date:

23 November 2007, 13:42:05

"Jeff Larsen" <jlar310@gmail.com> writes:
> On Nov 23, 2007 7:29 AM, Dean Rasheed <dean_rasheed@hotmail.com> wrote:
>> I am having a performance problem trying to query a view which is a
>> UNION ALL of 2 tables. I have narrowed the problem down to my use of
>> DOMAINS in the underlying table.

> In my case, the data types in each segment of the union were not
> originally identical, preventing the planner from efficiently pushing
> the qualifications down to the individual segments prior to the union.

> In your case the use of a DOMAIN type may be one of those 'special
> cases' forcing the planner to perform the union first, then apply the
> conditions.

It looks like the problem is that the UNION is taken as producing plain
text output, as you can see with \d:

regression=# \d foo
      Table "public.foo"
 Column |   Type   | Modifiers
--------+----------+-----------
 a      | foo_text | not null
 b      | text     |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (a)

regression=# \d foo_v
    View "public.foo_v"
 Column | Type | Modifiers
--------+------+-----------
 a      | text |
 b      | text |
View definition:
 SELECT foo.a, foo.b
   FROM foo
UNION ALL
 SELECT foo.a, NULL::text AS b
   FROM foo;

Tracing through the code, I see that this happens because
select_common_type() smashes all domains to base types before doing
anything else.  So even though all the inputs are in fact the same
domain type, you end up with the base type as the UNION result type.

Possibly that could be improved sometime, but we certainly wouldn't try
to change it in an existing release branch...

            regards, tom lane

Re: Performance problem with UNION ALL view and domains

From

Dean Rasheed

Date:

24 November 2007, 08:06:13

> It looks like the problem is that the UNION is taken as producing plain
> text output, as you can see with \d:
>
> regression=# \d foo
> Table "public.foo"
> Column | Type | Modifiers
> --------+----------+-----------
> a | foo_text | not null
> b | text |
> Indexes:
> "foo_pkey" PRIMARY KEY, btree (a)
>
> regression=# \d foo_v
> View "public.foo_v"
> Column | Type | Modifiers
> --------+------+-----------
> a | text |
> b | text |
> View definition:
> SELECT foo.a, foo.b
> FROM foo
> UNION ALL
> SELECT foo.a, NULL::text AS b
> FROM foo;
>
> Tracing through the code, I see that this happens because
> select_common_type() smashes all domains to base types before doing
> anything else. So even though all the inputs are in fact the same
> domain type, you end up with the base type as the UNION result type.
>
> Possibly that could be improved sometime, but we certainly wouldn't try
> to change it in an existing release branch...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq


Thanks for your replies. It looks like I can cure the performance problem by casting to the
base type in the view definition:

CREATE VIEW foo_v AS SELECT a::text,b from foo UNION ALL SELECT a::text,NULL:\
:text AS b FROM foo;

Interestingly though, if I cast back to the domain type after taking the union, then the
view has the correct type, but the performance problem comes back in a different way:

CREATE VIEW foo_v AS SELECT foo_u.a::foo_text, foo_u.b FROM
(SELECT a::text,b from foo UNION ALL SELECT a::text,NULL::text AS b FROM foo) as foo_u;

lookup=> \d foo_v
      View "public.foo_v"
 Column |   Type   | Modifiers
--------+----------+-----------
 a      | foo_text |
 b      | text     |
View definition:
 SELECT foo_u.a::foo_text AS a, foo_u.b
   FROM ( SELECT foo.a::text AS a, foo.b
           FROM foo
UNION ALL
         SELECT foo.a::text AS a, NULL::text AS b
           FROM foo) foo_u;

 Result  (cost=0.00..399.00 rows=100 width=64) (actual time=0.023..6.777 rows=2 loops=1)
   ->  Append  (cost=0.00..399.00 rows=100 width=64) (actual time=0.022..6.775 rows=2 loops=1)
         ->  Seq Scan on foo  (cost=0.00..199.00 rows=50 width=20) (actual time=0.022..3.409 rows=1 loops=1)
               Filter: ((((a)::text)::foo_text)::text = (('foo34'::text)::foo_text)::text)
         ->  Seq Scan on foo  (cost=0.00..199.00 rows=50 width=10) (actual time=0.016..3.364 rows=1 loops=1)
               Filter: ((((a)::text)::foo_text)::text = (('foo34'::text)::foo_text)::text)
 Total runtime: 6.849 ms

So the planner has been able to push the condition down into the bottom tables, but it
can't use the PK index. Is this because of all the casts?

Dean.

_________________________________________________________________
100’s of Music vouchers to be won with MSN Music
https://www.musicmashup.co.uk