Processing of subqueries in union - Mailing list pgsql-performance

From Markus Hervén
Subject Processing of subqueries in union
Date
Msg-id 51304B61.5030103@gmail.com
Whole thread Raw
List pgsql-performance
I am currently looking at a performance issue where a subquery is
executed even though I cannot see the result ever being asked for or
used. Not even google has helped me find any information about this one.
I have created a simple test case to demonstrate this.

CREATE TABLE test1 (t1 INT);
INSERT INTO test1 VALUES(1);
CREATE TABLE test2 (t2 INT);
INSERT INTO test2 VALUES(1);
CREATE TABLE test3 (t1 INT, cnt INT);
INSERT INTO test3 VALUES(3, 3);
CREATE VIEW tv AS SELECT t1, (SELECT COUNT(*) FROM test2 WHERE t1=t2) AS
CNT FROM test1;
CREATE VIEW tv2 AS SELECT t1, (SELECT COUNT(*) FROM test2 WHERE t1=t2)
AS CNT FROM test1 UNION ALL SELECT t1, cnt FROM test3;
EXPLAIN ANALYZE SELECT t1 FROM tv;
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
  Seq Scan on test1  (cost=0.00..34.00 rows=2400 width=4) (actual
time=0.005..0.006 rows=1 loops=1)
  Total runtime: 0.033 ms
(2 rows)
EXPLAIN ANALYZE SELECT t1 FROM tv2;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
  Subquery Scan on tv2  (cost=0.00..96252.20 rows=4540 width=4) (actual
time=0.026..0.042 rows=2 loops=1)
    ->  Append  (cost=0.00..96206.80 rows=4540 width=6) (actual
time=0.024..0.036 rows=2 loops=1)
          ->  Seq Scan on test1  (cost=0.00..96130.00 rows=2400 width=4)
(actual time=0.023..0.024 rows=1 loops=1)
                SubPlan 1
                  ->  Aggregate  (cost=40.03..40.04 rows=1 width=0)
(actual time=0.013..0.014 rows=1 loops=1)
                        ->  Seq Scan on test2  (cost=0.00..40.00 rows=12
width=0) (actual time=0.005..0.006 rows=1 loops=1)
                              Filter: (test1.t1 = t2)
          ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..52.80 rows=2140
width=8) (actual time=0.005..0.007 rows=1 loops=1)
                ->  Seq Scan on test3  (cost=0.00..31.40 rows=2140
width=8) (actual time=0.002..0.004 rows=1 loops=1)
  Total runtime: 0.089 ms
(10 rows)

As can be seen in the last explain it does a sequential scan of table
test2 even though this is not needed. It will perform the scan once for
each row in table test1. No scan of test2 is done if there is not a
union in the view. I cannot see any reason for this happening and am
guessing that the query planner does not know that it can get rid of the
subquery. Is there anything I can do to rid of the subquery myself?

The postgresql version I am using is:
opsspace=# select version();
version
-------------------------------------------------------------------------------------------------------------
  PostgreSQL 9.2.3 on x86_64-redhat-linux-gnu, compiled by gcc (GCC)
4.7.2 20121109 (Red Hat 4.7.2-8), 64-bit
(1 row)


Many thanks
Markus


pgsql-performance by date:

Previous
From: Ao Jianwang
Date:
Subject: Re: SELECT is slow on smaller table?
Next
From: Steven Crandell
Date:
Subject: hardware upgrade, performance degrade?