
From: Josh Berkus
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Msg-id: 53288CC2.8090803@agliodbs.com
In response to: Planner hints in Postgresql (Rajmohan C <csrajmohan@gmail.com>)
Responses: Re: Risk Estimation WAS: Planner hints in Postgresql (Atri Sharma <atri.jiit@gmail.com>)
List: pgsql-hackers
> On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah.  I would like to see the planner's cost estimates extended to
>> include some sort of uncertainty estimate, whereupon risk-averse people
>> could ask it to prefer low-uncertainty plans over high-uncertainty ones
>> (the plans we typically choose for ORDER BY ... LIMIT queries being great
>> examples of the latter).  But it's a long way from wishing that to making
>> it so.  Right now it's not even clear (to me anyway) how we'd measure or
>> model such uncertainty.

This is not a model, but here are some starting thoughts:

A "high risk" plan has three components:

a) our statistical data is out-of-date or inadequate

b) the potential execution time is high if our estimates of
selectivity are wrong

c) the cost ratio of certain operations is wrong.
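
To make this concrete, here's a very rough sketch (in C, since that's
what the planner speaks) of what a cost-plus-uncertainty estimate
might carry.  None of these fields exist today and every name is made
up; Cost is just the planner's existing typedef for double:

/*
 * Hypothetical sketch only: pair each cost estimate with an
 * uncertainty estimate.  No field here exists today.
 */
typedef double Cost;            /* as in the existing planner code */

typedef struct RiskedCost
{
    Cost        startup_cost;   /* existing estimate */
    Cost        total_cost;     /* existing estimate */
    double      risk_factor;    /* >= 1.0: how far off we believe
                                 * total_cost could plausibly be;
                                 * driven by factor (a) */
    Cost        failure_cost;   /* what we'd pay if our estimates
                                 * are badly wrong; factor (b) */
} RiskedCost;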

Factor (a) can be modeled two ways:

1. If last_analyze is a long time ago, we have increased the risk.
(Ideally, we'd have some idea of the change rate on the table vs. the
last analyze time; right now we don't have those stats.)

2. Certain patterns, such as multi-column selectivity and GIN/GiST
selectivity, are known to have poor estimates and thus to be higher
risk.  Certainly, selectivity functions which have been programmed
with a flat coefficient (like the default 0.05 selectivity for
gist_ops) could also return a fairly high risk factor.
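
For (1), a back-of-the-envelope version might look like the sketch
below.  stats_staleness_risk() doesn't exist anywhere; the one-week
knee and the 2.0 ceiling are numbers I just made up:

#include <time.h>

/*
 * Hypothetical sketch for (a)(1): risk grows with time since the
 * last ANALYZE.  Returns 1.0 for fresh stats and asymptotically
 * approaches 2.0.
 */
static double
stats_staleness_risk(time_t last_analyze, time_t now)
{
    double      age_days = difftime(now, last_analyze) / 86400.0;

    if (age_days <= 0.0)
        return 1.0;             /* fresh stats: baseline risk */

    /* 1.5 at one week, ~1.9 at ten weeks */
    return 1.0 + age_days / (age_days + 7.0);
}

(2) could then multiply in a per-estimator coefficient, so that e.g.
a flat-coefficient selectivity function contributes a high constant
risk.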

Factor (b) can be modeled simply by estimating the cost of a plan
where all row estimates are changed by 10X, or better yet by scaling
them according to the risk factor calculated in (a).  This would then
give us the "failure cost" of the bad plan.  Note that we need to
estimate in both directions, for both higher and lower estimates;
"abort early" plans fail because the rows returned are lower than
expected, for example.

(b) estimation would be expensive if we recosted every combination of
wrong estimates across the entire plan, so I'm wondering whether it
would be adequate to just estimate the effect of an off selectivity
on a per-node basis.
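
Something like the following, where recost_node_with_rows() is an
invented hook that re-runs a node's cost function with a forced row
count (Plan, plan_rows, and the Max macro are the existing ones):

/*
 * Hypothetical sketch for factor (b), per node: recost with the row
 * estimate off by 10X in each direction and take the worse case, per
 * the both-directions note above.
 */
static Cost
node_failure_cost(Plan *node)
{
    Cost        high = recost_node_with_rows(node, node->plan_rows * 10.0);
    Cost        low = recost_node_with_rows(node, node->plan_rows / 10.0);

    return Max(high, low);
}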

(c) we can't realistically estimate at all (i.e. if we knew the cost
factor was wrong, we'd fix it), so I suggest ignoring it for risk
estimation.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


