
From: Josh Berkus
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Msg-id: 53288CC2.8090803@agliodbs.com
In response to: Planner hints in Postgresql (Rajmohan C <csrajmohan@gmail.com>)
Responses: Re: Risk Estimation WAS: Planner hints in Postgresql (Atri Sharma <atri.jiit@gmail.com>)
List: pgsql-hackers
> On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah.  I would like to see the planner's cost estimates extended to
>> include some sort of uncertainty estimate, whereupon risk-averse people
>> could ask it to prefer low-uncertainty plans over high-uncertainty ones
>> (the plans we typically choose for ORDER BY ... LIMIT queries being great
>> examples of the latter).  But it's a long way from wishing that to making
>> it so.  Right now it's not even clear (to me anyway) how we'd measure or
>> model such uncertainty.

This is not a model, but here are some starting thoughts:

A "high risk" plan has three components:

a) our statistical data is out-of-date or inadequate

b) the potential execution time is high if our estimates of
selectivity are wrong

c) the cost ratio of certain operations is wrong.
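
To make this concrete, here's a very rough sketch (in C, since that's
what the planner speaks) of what a cost-plus-uncertainty estimate
might carry.  None of these fields exist today and every name is made
up; Cost is just the planner's existing typedef for double:

/*
 * Hypothetical sketch only: pair each cost estimate with an
 * uncertainty estimate.  No field here exists today.
 */
typedef double Cost;            /* as in the existing planner code */

typedef struct RiskedCost
{
    Cost        startup_cost;   /* existing estimate */
    Cost        total_cost;     /* existing estimate */
    double      risk_factor;    /* >= 1.0: how far off we believe
                                 * total_cost could plausibly be;
                                 * driven by factor (a) */
    Cost        failure_cost;   /* what we'd pay if our estimates
                                 * are badly wrong; factor (b) */
} RiskedCost;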

Factor (a) can be modeled two ways:

1. If last_analyze is a long time ago, we have increased the risk.
(Ideally, we'd have some idea of the change rate on the table vs. the
last analyze time; right now we don't have those stats.)

2. Certain patterns, such as multi-column selectivity and GIN/GiST
selectivity, are known to have poor estimates and thus to be higher
risk.  Certainly, selectivity functions which have been programmed
with a flat coefficient (like the default 0.05 selectivity for
gist_ops) could also return a fairly high risk factor.
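
For (1), a back-of-the-envelope version might look like the sketch
below.  stats_staleness_risk() doesn't exist anywhere; the one-week
knee and the 2.0 ceiling are numbers I just made up:

#include <time.h>

/*
 * Hypothetical sketch for (a)(1): risk grows with time since the
 * last ANALYZE.  Returns 1.0 for fresh stats and asymptotically
 * approaches 2.0.
 */
static double
stats_staleness_risk(time_t last_analyze, time_t now)
{
    double      age_days = difftime(now, last_analyze) / 86400.0;

    if (age_days <= 0.0)
        return 1.0;             /* fresh stats: baseline risk */

    /* 1.5 at one week, ~1.9 at ten weeks */
    return 1.0 + age_days / (age_days + 7.0);
}

(2) could then multiply in a per-estimator coefficient, so that e.g.
a flat-coefficient selectivity function contributes a high constant
risk.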

Factor (b) can be modeled simply by estimating the cost of a plan
where all row estimates are changed by 10X, or better yet by scaling
them according to the risk factor calculated in (a).  This would then
give us the "failure cost" of the bad plan.  Note that we need to
estimate in both directions, for both higher and lower estimates;
"abort early" plans fail because the rows returned are lower than
expected, for example.

(b) estimation would be expensive if we recosted every combination of
wrong estimates across the entire plan, so I'm wondering whether it
would be adequate to just estimate the effect of an off selectivity
on a per-node basis.
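
Something like the following, where recost_node_with_rows() is an
invented hook that re-runs a node's cost function with a forced row
count (Plan, plan_rows, and the Max macro are the existing ones):

/*
 * Hypothetical sketch for factor (b), per node: recost with the row
 * estimate off by 10X in each direction and take the worse case, per
 * the both-directions note above.
 */
static Cost
node_failure_cost(Plan *node)
{
    Cost        high = recost_node_with_rows(node, node->plan_rows * 10.0);
    Cost        low = recost_node_with_rows(node, node->plan_rows / 10.0);

    return Max(high, low);
}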

(c) we can't realistically estimate at all (i.e. if we knew the cost
factor was wrong, we'd fix it), so I suggest ignoring it for risk
estimation.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


