Re: 7.3 vs 7.2 - different query plan, bad performance - Mailing list pgsql-performance

From Manfred Koizar
Subject Re: 7.3 vs 7.2 - different query plan, bad performance
Date
Msg-id 373rev0fotgpq2osvlldf41b8flambulk9@4ax.com
Whole thread Raw
In response to Re: 7.3 vs 7.2 - different query plan, bad performance  (siaco@allegro.pl)
Responses Re: 7.3 vs 7.2 - different query plan, bad performance
List pgsql-performance
On Mon, 16 Jun 2003 08:38:50 +0200, siaco@allegro.pl wrote:
>[After VACUUM ANALYSE ...] I don't see a big difference:
>
>siaco=# explain analyze  select count(*) from v_c;
>                                                        QUERY PLAN

>---------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=210.83..210.83 rows=1 width=8) (actual time=5418.09..5418.09 rows=1 loops=1)
>   ->  Subquery Scan v_c  (cost=28.40..197.63 rows=5281 width=8) (actual time=4.59..5414.13 rows=5281 loops=1)
>         ->  Hash Join  (cost=28.40..197.63 rows=5281 width=8) (actual time=4.58..5407.73 rows=5281 loops=1)
>               Hash Cond: ("outer".id = "inner".id)
>               ->  Seq Scan on b t1  (cost=0.00..76.81 rows=5281 width=4) (actual time=0.01..9.68 rows=5281 loops=1)
>               ->  Hash  (cost=24.32..24.32 rows=1632 width=4) (actual time=3.29..3.29 rows=0 loops=1)
>                     ->  Seq Scan on a t2  (cost=0.00..24.32 rows=1632 width=4) (actual time=0.01..1.88 rows=1632
loops=1)
>               SubPlan
>                 ->  Aggregate  (cost=28.41..28.41 rows=1 width=0) (actual time=1.02..1.02 rows=1 loops=5281)
>                       ->  Seq Scan on a t3  (cost=0.00..28.40 rows=3 width=0) (actual time=0.76..1.01 rows=1
loops=5281)
>                             Filter: (parent_id = $0)
> Total runtime: 5433.65 msec

Ok, now we have something to work on.

.)  I guess you are not really interested in

    SELECT count(*) FROM v_c;

If you were, you would simply

    SELECT count(*) from b;

Try

    EXPLAIN ANALYSE SELECT * FROM v_c;

and you will see that 7.2 produces a plan that is almost equal to that
produced by 7.3.

.)  Without any index a seq scan is the best you can get.  A scan of a
takes only 1 ms, but doing it 5000 times gives 5 seconds.  Try

    CREATE INDEX a_parent ON a(parent_id);

.)  Wouldn't

CREATE VIEW v_c AS
SELECT t1.id, count(t3.id) AS children_count
  FROM (b t1 LEFT JOIN a t2 ON (t1.id = t2.id))
       LEFT JOIN a t3 ON (t3.parent_id = t2.id)
 GROUP BY t1.id;

give the same results as your view definition with the subselect?  And
under some assumptions about your data even

CREATE VIEW v_c AS
SELECT b.id, count(a.id) AS children_count
  FROM b
       LEFT JOIN a ON (a.parent_id = b.id)
 GROUP BY b.id;

might work.  But I think I don't understand your requirements.  Why
are you not interested in the children_count for an id that doesn't
have a parent itself?

.)  To answer your original question:  The difference seems to be that
7.2 does not evaluate the subselect in the SELECT list, when you are
only asking for count(*).

Servus
 Manfred

pgsql-performance by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: left join performance problem
Next
From: Ryszard Lach
Date:
Subject: Re: 7.3 vs 7.2 - different query plan, bad performance