Re: osdl-dbt3 run results - puzzled by the execution plans - Mailing list pgsql-performance

From Manfred Koizar
Subject Re: osdl-dbt3 run results - puzzled by the execution plans
Date
Msg-id 3c6mmvcmvlr59kng6tnee2a0q4vf4spctt@email.aon.at
Whole thread Raw
In response to osdl-dbt3 run results - puzzled by the execution plans  (Jenny Zhang <jenny@osdl.org>)
Responses Re: osdl-dbt3 run results - puzzled by the execution
List pgsql-performance
On Thu, 18 Sep 2003 15:36:50 -0700, Jenny Zhang <jenny@osdl.org>
wrote:
>We thought the large effective_cache_size should lead us to better
>plans. But we found the opposite.

The common structure of your query plans is:

 Sort
   Sort Key: sum((partsupp.ps_supplycost * partsupp.ps_availqty))
   InitPlan
     ->  Aggregate
           ->  SubPlan
   ->  Aggregate
         Filter: (sum((ps_supplycost * ps_availqty)) > $0)
         ->  Group
               ->  Sort
                     Sort Key: partsupp.ps_partkey
                     ->  SubPlan (same as above)

where the SubPlan is

 ->  Merge Join  (cost=519.60..99880.05 rows=32068 width=65)
                 (actual time=114.78..17435.28 rows=30400 loops=1)
                 ctr=5.73
       Merge Cond: ("outer".ps_suppkey = "inner".s_suppkey)
       ->  Index Scan using i_ps_suppkey on partsupp
                 (cost=0.00..96953.31 rows=801712 width=34)
                 (actual time=0.42..14008.92 rows=799361 loops=1)
                 ctr=6.92
       ->  Sort  (cost=519.60..520.60 rows=400 width=31)
                 (actual time=106.88..143.49 rows=30321 loops=1)
                 ctr=3.63
             Sort Key: supplier.s_suppkey
             ->  SubSubPlan

for large effective_cache_size and

 ->  Nested Loop  (cost=0.00..130168.30 rows=32068 width=65)
                  (actual time=0.56..1374.41 rows=30400 loops=1)
                  ctr=94.71
       ->  SubSubPlan
       ->  Index Scan using i_ps_suppkey on partsupp
                 (cost=0.00..323.16 rows=80 width=34)
                 (actual time=0.16..2.98 rows=80 loops=380)
                 ctr=108.44
             Index Cond: (partsupp.ps_suppkey = "outer".s_suppkey)

for small effective_cache_size.  Both subplans have an almost
identical subsubplan:

->  Nested Loop  (cost=0.00..502.31 rows=400 width=31)
                 (actual time=0.23..110.51 rows=380 loops=1)
                 ctr=4.55
      Join Filter: ("inner".s_nationkey = "outer".n_nationkey)
      ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=10)
                              (actual time=0.08..0.14 rows=1 loops=1)
                              ctr=9.36
            Filter: (n_name = 'ETHIOPIA'::bpchar)
      ->  Seq Scan on supplier (cost=0.00..376.00 rows=10000 width=21)
                          (actual time=0.10..70.72 rows=10000 loops=1)
                               ctr=5.32

I have added the ctr (cost:time ratio) for each plan node.  These
values are mostly between 5 and 10 with two notable exceptions:

1)     ->  Sort  (cost=519.60..520.60 rows=400 width=31)
                 (actual time=106.88..143.49 rows=30321 loops=1)
                 ctr=3.63

It has already been noticed by Matt Clark that this is the only plan
node where the row count estimation looks wrong.  However, I don't
believe that this has great influence on the total cost of the plan,
because the ctr is not far from the usual range and if it were a bit
higher, it would only add a few hundred cost units to a branch costing
almost 100000 units.  BTW I vaguely remember that there is something
strange with the way actual rows are counted inside a merge join.
Look at the branch below this plan node:  It shows an actual row count
of 380.

2)     ->  Index Scan using i_ps_suppkey on partsupp
                 (cost=0.00..323.16 rows=80 width=34)
                 (actual time=0.16..2.98 rows=80 loops=380)
                 ctr=108.44

Here we have the only plan node where loops > 1, and it is the only
one where the ctr is far off.  The planner computes the cost for one
loop and multiplies it by the number of loops (which it estimates
quite accurately to be 400), thus getting a total cost of ca. 130000.
We have no reason to believe that the single loop cost is very far
from reality (for a *single* index scan), but the planner does not
account for additional index scans hitting pages in the cache that
have been brought in by preceding scans.  This is a known problem, Tom
has mentioned it several times, IIRC.

Now I'm very interested in getting a better understanding of this
problem, so could you please report the results of

. \d i_ps_suppkey

. VACUUM VERBOSE ANALYSE partsupp;
  VACUUM VERBOSE ANALYSE supplier;

. SELECT attname, null_frac, avg_witdh, n_distinct, correlation
    FROM pg_stats
   WHERE tablename = 'partsupp' AND attname IN ('ps_suppkey', ...);

  Please insert other interesting column names for ..., especially
  those contained in i_ps_suppkey, if any.

. SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname IN ('partsupp', 'supplier', ...);
                                             ^^^
                    Add relevant index names here.

. EXPLAIN ANALYSE
  SELECT ps_partkey, ps_supplycost, ps_availqty
    FROM partsupp, supplier
   WHERE ps_suppkey = s_suppkey AND s_nationkey = '<youknowit>';

  The idea is to eliminate parts of the plan that are always the same.
  Omitting nation is possibly to much a simplification.  In this case
  please re-add it.
  Do this test for small and large effective_cache_size.
  Force the use of other join methods by setting enable_<joinmethod>
  to off.  Post all results.


Jenny, I understand that this long message contains more questions
than answers and is not of much help for you.  OTOH your tests might
be very helpful for Postgres development ...

Servus
 Manfred

pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: osdl-dbt3 run results - puzzled by the execution plans
Next
From: Jenny Zhang
Date:
Subject: Re: osdl-dbt3 run results - puzzled by the execution