Re: VACUUM ANALYZE slows down query - Mailing list pgsql-performance

From Tom Lane
Subject Re: VACUUM ANALYZE slows down query
Date
Msg-id 13064.1109113701@sss.pgh.pa.us
Whole thread Raw
In response to Re: VACUUM ANALYZE slows down query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I wrote:
> Well, with the increased (and much more accurate) rowcount estimate,
> the estimated cost of the nestloop naturally went up a lot: it's
> proportional to the number of rows involved.  It appears that the
> estimated cost of the mergejoin actually went *down* quite a bit
> (else it'd have been selected the first time too).  That seems odd to
> me.

Nah, I just can't count :-(.  What I forgot about was the sub-select in
the output list:

>> select ToolRepairRequest.RequestID, (Select
>> count(ToolHistory.HistoryID) from ToolHistory where
>> ToolRepairRequest.RepairID=ToolHistory.RepairID) as
>> CountOfTH

which shows up in the (un-analyzed) EXPLAIN output here:

   SubPlan
     ->  Aggregate  (cost=524.17..524.17 rows=1 width=4) (actual time=0.032..0.035 rows=1 loops=1518)
           ->  Index Scan using th_repair_key on toolhistory  (cost=0.00..523.82 rows=140 width=4) (actual
time=0.013..0.018rows=1 loops=1518) 
                 Index Cond: ($0 = repairid)

Now in this case the planner is estimating 79 rows out, so the estimated
cost of the nestloop plan includes a charge of 79*524.17 for evaluating
the subplan.  If we discount that then the estimated cost of the
nestloop plan is 3974.74..6645.99 (48055.42-79*524.17).

In the ANALYZEd case the subplan is estimated to be a lot cheaper:

   SubPlan
     ->  Aggregate  (cost=6.98..6.98 rows=1 width=4) (actual time=0.038..0.042 rows=1 loops=1518)
           ->  Index Scan using th_repair_key on toolhistory  (cost=0.00..6.97 rows=2 width=4) (actual
time=0.016..0.021rows=1 loops=1518) 
                 Index Cond: ($0 = repairid)

It's estimated to be needed 1533 times, but that still adds up to less
of a charge than before.  Discounting that, the mergejoin plan was
estimated at 18310.59..18462.10 (29162.44 - 1533*6.98).  So it's not
true that the estimated cost of the join went down in the ANALYZEd case.

Werner sent me a data dump off-list, and trawling through the planner I
got these numbers for the estimated costs without the output subquery:

without any statistics:
    mergejoin cost    9436.42 .. 9571.81
    nestloop cost    3977.74 .. 6700.71

with statistics:
    mergejoin cost    18213.04 .. 18369.73
    nestloop cost     4054.93 .. 24042.85

(these are a bit different from his results because of different ANALYZE
samples etc, but close enough)

So the planner isn't going crazy: in each case it chose what seemed the
cheapest total-cost plan.

            regards, tom lane

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: is pg_autovacuum so effective ?
Next
From: Gaetano Mendola
Date:
Subject: Re: is pg_autovacuum so effective ?