Thread: Bad Row Count Estimate on View with 8.2

Bad Row Count Estimate on View with 8.2

From
"Dave Dutcher"
Date:
Hello,

I discovered a query which is taking 70 seconds on 8.2.1 which used to take
under a second on 8.1.2.  I was digging into what was causing it and I
believe the problem is a view which the planner estimates will return 1 row
when it actually returns 3500.  When I join with the view, the planner ends
up using a nested loop because it thinks the right branch will run once
instead of 3500 times.  I've analyzed all the tables and played around with
the default_statistics_target, but still the planner estimates 1 row.  I was
wondering if anybody else has any other ideas?

Here is the query the view is defined as:

SELECT foo.fund_id, foo.owner_trader_id, foo.strategy_id, foo.cf_account_id,
foo.instrument_id, sum(foo.pos) AS pos, sum(foo.cost) AS cost
FROM
(
    SELECT om_position.fund_id, om_position.owner_trader_id,
om_position.strategy_id, om_position.cf_account_id,
om_position.instrument_id, om_position.pos, om_position.cost
    FROM om_position
    WHERE om_position.as_of_date = date(now())
    UNION ALL
    SELECT om_trade.fund_id, om_trade.owner_trader_id,
om_trade.strategy_id, om_trade.cf_account_id, om_trade.instrument_id,
om_trade.qty::numeric(22,9) AS pos, om_trade.cost
    FROM om_trade
    WHERE om_trade.process_state = 0 OR om_trade.process_state = 2
) foo
GROUP BY foo.fund_id, foo.owner_trader_id, foo.strategy_id,
foo.cf_account_id, foo.instrument_id;



Here is explain analyze from both 8.1.2 and 8.2.1 with
default_statistics_target=10 and tables freshly analyzed:




8.1.2
HashAggregate  (cost=4760.33..4764.95 rows=308 width=168) (actual
time=56.873..71.293 rows=3569 loops=1)
  ->  Append  (cost=0.00..4675.85 rows=3072 width=54) (actual
time=0.037..38.261 rows=3715 loops=1)
        ->  Index Scan using as_of_date_om_position_index on om_position
(cost=0.00..4637.10 rows=3071 width=54) (actual time=0.031..14.722 rows=3559
loops=1)
              Index Cond: (as_of_date = date(now()))
        ->  Bitmap Heap Scan on om_trade  (cost=4.01..8.03 rows=1 width=48)
(actual time=0.118..0.917 rows=156 loops=1)
              Recheck Cond: ((process_state = 0) OR (process_state = 2))
              ->  BitmapOr  (cost=4.01..4.01 rows=1 width=0) (actual
time=0.079..0.079 rows=0 loops=1)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..2.00 rows=1 width=0)
(actual time=0.060..0.060 rows=156 loops=1)
                          Index Cond: (process_state = 0)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..2.00 rows=1 width=0)
(actual time=0.008..0.008 rows=0 loops=1)
                          Index Cond: (process_state = 2)
Total runtime: 82.398 ms

8.2.1
HashAggregate  (cost=6912.51..6912.53 rows=1 width=200) (actual
time=19.005..24.137 rows=3569 loops=1)
  ->  Append  (cost=0.00..6406.73 rows=28902 width=200) (actual
time=0.037..11.569 rows=3715 loops=1)
        ->  Index Scan using as_of_date_om_position_index on om_position
(cost=0.00..4333.82 rows=2964 width=53) (actual time=0.035..4.884 rows=3559
loops=1)
              Index Cond: (as_of_date = date(now()))
        ->  Bitmap Heap Scan on om_trade  (cost=464.40..1783.89 rows=25938
width=49) (actual time=0.060..0.380 rows=156 loops=1)
              Recheck Cond: ((process_state = 0) OR (process_state = 2))
              ->  BitmapOr  (cost=464.40..464.40 rows=308 width=0) (actual
time=0.041..0.041 rows=0 loops=1)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..225.72 rows=154 width=0)
(actual time=0.032..0.032 rows=156 loops=1)
                          Index Cond: (process_state = 0)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..225.72 rows=154 width=0)
(actual time=0.003..0.003 rows=0 loops=1)
                          Index Cond: (process_state = 2)
Total runtime: 27.193 ms





Here is explain analyze from 8.2.1 with default_statistics_target=1000 and
tables freshly analyzed:




HashAggregate  (cost=5344.36..5344.37 rows=1 width=200) (actual
time=18.826..23.950 rows=3569 loops=1)
  ->  Append  (cost=0.00..5280.01 rows=3677 width=200) (actual
time=0.031..11.606 rows=3715 loops=1)
        ->  Index Scan using as_of_date_om_position_index on om_position
(cost=0.00..5224.44 rows=3502 width=54) (actual time=0.029..4.903 rows=3559
loops=1)
              Index Cond: (as_of_date = date(now()))
        ->  Bitmap Heap Scan on om_trade  (cost=9.91..18.79 rows=175
width=49) (actual time=0.069..0.394 rows=156 loops=1)
              Recheck Cond: ((process_state = 0) OR (process_state = 2))
              ->  BitmapOr  (cost=9.91..9.91 rows=2 width=0) (actual
time=0.050..0.050 rows=0 loops=1)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..5.57 rows=2 width=0)
(actual time=0.039..0.039 rows=156 loops=1)
                          Index Cond: (process_state = 0)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..4.26 rows=1 width=0)
(actual time=0.004..0.004 rows=0 loops=1)
                          Index Cond: (process_state = 2)
Total runtime: 27.055 ms



Thanks,


Dave Dutcher
Telluride Asset Management
952.653.6411



Re: Bad Row Count Estimate on View with 8.2

From
Tom Lane
Date:
"Dave Dutcher" <dave@tridecap.com> writes:
> I discovered a query which is taking 70 seconds on 8.2.1 which used to take
> under a second on 8.1.2.  I was digging into what was causing it and I
> believe the problem is a view which the planner estimates will return 1 row
> when it actually returns 3500.

This is evidently a failure of estimate_num_groups().  However, I do not
see any difference in that code between 8.1 and 8.2 branch tips.  I do
notice a possibly-relevant change that was applied in 8.1.4:

2006-05-02 00:34  tgl

    * src/backend/: optimizer/path/allpaths.c, utils/adt/selfuncs.c
    (REL8_1_STABLE): Avoid assuming that statistics for a parent
    relation reflect the properties of the union of its child relations
    as well.  This might have been a good idea when it was originally
    coded, but it's a fatally bad idea when inheritance is being used
    for partitioning.  It's better to have no stats at all than
    completely misleading stats.  Per report from Mark Liberman.

    The bug arguably exists all the way back, but I've only patched
    HEAD and 8.1 because we weren't particularly trying to support
    partitioning before 8.1.

    Eventually we ought to look at deriving union statistics instead of
    just punting, but for now the drop kick looks good.

I think this was only meant to apply to table inheritance situations,
but on reflection it might affect UNION queries too.  The question is
whether the numbers it was using before really mean anything --- they
seem to have been better-than-nothing in your particular case, but I'm
not sure that translates to a conclusion that we should depend on 'em.

In fact, since there isn't any "parent relation" in a UNION, I'm not
sure that this patch actually changed your results ... but I'm not
seeing what else would've ...

            regards, tom lane

Re: Bad Row Count Estimate on View with 8.2

From
"Dave Dutcher"
Date:
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane
>
>
> In fact, since there isn't any "parent relation" in a UNION, I'm not
> sure that this patch actually changed your results ... but I'm not
> seeing what else would've ...
>

Thanks for looking into it.  I thought I might actually test if it was the
patch you mentioned which changed my results, but I haven't had time.
Because you mentioned it was grouping on the results of a UNION ALL which
was throwing off the row estimate I changed my query from a UNION ALL/GROUP
BY to a GROUP BY/FULL OUTER JOIN.  The view runs a hair slower by itself,
but the better estimate of rows makes it work much better for joining with.
If anybody is curious, this is what I changed too:

SELECT
coalesce(pos_set.fund_id, trade_set.fund_id) as fund_id,
coalesce(pos_set.owner_trader_id, trade_set.owner_trader_id) as
owner_trader_id,
coalesce(pos_set.strategy_id,  trade_set.strategy_id) as strategy_id,
coalesce(pos_set.cf_account_id,  trade_set.cf_account_id) as cf_account_id,
coalesce(pos_set.instrument_id, trade_set.instrument_id) as instrument_id,
coalesce(pos_set.pos, 0) + coalesce(trade_set.pos, 0) as pos,
coalesce(pos_set.cost, 0) + coalesce(trade_set.cost, 0) as cost
FROM
(
SELECT om_position.fund_id, om_position.owner_trader_id,
om_position.strategy_id, om_position.cf_account_id,
om_position.instrument_id, om_position.pos, om_position.cost
FROM om_position
WHERE om_position.as_of_date = date(now())
) as pos_set
full outer join
(
SELECT om_trade.fund_id, om_trade.owner_trader_id, om_trade.strategy_id,
om_trade.cf_account_id, om_trade.instrument_id,
sum(om_trade.qty::numeric(22,9)) AS pos, sum(om_trade.cost) as cost
FROM om_trade
WHERE om_trade.process_state = 0 OR om_trade.process_state = 2
GROUP BY om_trade.fund_id, om_trade.owner_trader_id, om_trade.strategy_id,
om_trade.cf_account_id, om_trade.instrument_id
) as trade_set
ON
pos_set.fund_id = trade_set.fund_id and pos_set.owner_trader_id =
trade_set.owner_trader_id and
pos_set.strategy_id = trade_set.strategy_id and pos_set.cf_account_id =
trade_set.cf_account_id and
pos_set.instrument_id = trade_set.instrument_id;




Re: Bad Row Count Estimate on View with 8.2

From
Tom Lane
Date:
"Dave Dutcher" <dave@tridecap.com> writes:
> Thanks for looking into it.  I thought I might actually test if it was the
> patch you mentioned which changed my results, but I haven't had time.
> Because you mentioned it was grouping on the results of a UNION ALL which
> was throwing off the row estimate I changed my query from a UNION ALL/GROUP
> BY to a GROUP BY/FULL OUTER JOIN.  The view runs a hair slower by itself,
> but the better estimate of rows makes it work much better for joining with.

I took another look and think I found the problem: 8.2's new code for
flattening UNION ALL subqueries into "append relations" is failing to
initialize all the fields of the appendrel, which confuses
estimate_num_groups (and perhaps other places).  I think this will fix
it for you.

            regards, tom lane

Index: allpaths.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v
retrieving revision 1.154
diff -c -r1.154 allpaths.c
*** allpaths.c    4 Oct 2006 00:29:53 -0000    1.154
--- allpaths.c    28 Jan 2007 18:44:01 -0000
***************
*** 384,389 ****
--- 384,395 ----
      }

      /*
+      * Set "raw tuples" count equal to "rows" for the appendrel; needed
+      * because some places assume rel->tuples is valid for any baserel.
+      */
+     rel->tuples = rel->rows;
+
+     /*
       * Finally, build Append path and install it as the only access path for
       * the parent rel.    (Note: this is correct even if we have zero or one
       * live subpath due to constraint exclusion.)

Re: Bad Row Count Estimate on View with 8.2

From
"Dave Dutcher"
Date:
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane
>
> I took another look and think I found the problem: 8.2's new code for
> flattening UNION ALL subqueries into "append relations" is failing to
> initialize all the fields of the appendrel, which confuses
> estimate_num_groups (and perhaps other places).  I think this will fix
> it for you.
>

I gave this a try on our test machine yesterday and it worked.  The planner
was estimating that the group by on the union would return about 300 rows
which is very similar to what 8.1.2 thought.  Actually it returned about
3000 rows, but still it is a good enough estimate to pick a plan which takes
100ms instead of a plan which takes 100 seconds.

Thanks,

Dave