Re: Adaptive query optimization - Mailing list pgsql-hackers

From Kuntal Ghosh
Subject Re: Adaptive query optimization
Date
Msg-id CAGz5QCLiH1VPNSF+F=2wKvr+XiCzQfEsi5DrwNA16o0KUat0TQ@mail.gmail.com
In response to Re: Adaptive query optimization  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
Hello,

On Wed, Jun 12, 2019 at 5:06 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> On 12.06.2019 0:43, Tomas Vondra wrote:
> I don't think "learning phase" is an issue, in fact I think that's
> something we need to do - it ensures we have enough data to make good
> decisions.
>
> What is wrong with the learning phase is that it requires some DBA assistance: somebody should determine when to
> start learning, provide a relevant workload, and determine when learning is finished.
> One of the most recent trends in DBMSes is autonomous databases with zero administration effort.
> It is especially important for clouds. And one of the main advantages of AQO is that it allows queries to be
> optimized without human interaction.
>
> But unfortunately I really do not know how to avoid the learning phase, especially if we want to run queries on a replica.
>
Avoiding a learning phase in an AQO implementation sounds like an oxymoron. :-)
Perhaps you meant how we can minimize the effort in the learning phase. A
learning phase has its own complications - like
a. deciding the number of iterations needed to achieve a certain
level of confidence
b. which parameters to tune (are the existing parameters enough?)
c. deciding the cost model
Coming up with answers to these things is pretty hard.

>
> I think that depends - some machine learning approaches are not that
> bad. But I think there's a more serious issue - explainability. We need
> a solution where we can explain/justify why it makes some decisions. I
> really don't want a black box that produces numbers that you just need
> to take at face value.
>
> The good thing is that the simpler the method, the less expensive and
> more explainable it is.
+1

>
> I tried to create a much simpler version of AQO based on the auto_explain extension.
> This extension provides all the necessary infrastructure to analyze statements with long execution time.
> I have added two new modes to auto_explain:
> 1. Auto generation of multicolumn statistics for variables used in clauses with large estimation error.
>
>
> Interesting! I probably wouldn't consider this part of adaptive query
> optimization, but it probably makes sense to make it part of this. I
> wonder if we might improve this to also suggest "missing" indexes?
>
I like this part of the implementation. I also agree that this can be
used to come up with good hypothetical index suggestions. But it
needs some additional algorithms. For example, after analysing a set
of queries, we can come up with a minimal set of indexes that need to
be created to minimize the total cost. I've not checked the internal
implementation of hypopg. Probably, I should do that.
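Just to make the idea concrete, here is a rough sketch of what such a
suggestion could boil down to in SQL (the table and column names are
made up, and I'm going from hypopg's documented interface, so treat
this as an assumption rather than a tested recipe):

    -- Extended statistics capture the cross-column dependency the
    -- planner otherwise misses:
    CREATE STATISTICS t1_a_b_stats (dependencies) ON a, b FROM t1;
    ANALYZE t1;

    -- With hypopg, a suggested index can be costed without building it:
    CREATE EXTENSION IF NOT EXISTS hypopg;
    SELECT * FROM hypopg_create_index('CREATE INDEX ON t1 (a, b)');

    -- Plain EXPLAIN (without ANALYZE) then considers the hypothetical index:
    EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2;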

>
> I think that it should be the next step of adaptive query optimization:
> - autogeneration of indexes
> - auto adjustment of optimizer cost parameters (cpu cost, random/sequential page access cost,...)
AFAIK, the need for adjusting cost parameters is largely dominated
by the problem of selectivity estimation errors. But of course, you can
argue with that.
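For context, the parameters in question are ordinary GUCs, so any
auto-adjustment would effectively be a search over settings like the
following (the values here are illustrative, not recommendations):

    SET random_page_cost = 1.1;  -- closer to seq_page_cost on SSD-like storage
    SET cpu_tuple_cost = 0.01;   -- the default
    -- or persistently:
    ALTER SYSTEM SET random_page_cost = 1.1;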

>
> Right. But I think I might have an idea how to address (some of) this.
>
> As I already mentioned, I was experimenting with something similar,
> maybe two or three years ago (I remember chatting about it with Teodor
> at pgcon last week). I was facing the same issues, and my approach was
> based on hooks too.
>
> But my idea was not to track stats for a plan as a whole, but instead
> decompose it into individual nodes, categorized into three basic groups -
> scans, joins and aggregations. And then apply this extracted information
> to other plans, with "matching" nodes.
>
> For example, let's consider a simple "single-scan" query
>
>    SELECT * FROM t1 WHERE a = ? AND b = ? AND c < ?;
>
> Now, if you execute this enough times (say, 100x or 1000x), tracking
> the estimates and actual row counts, you may then compute the average
> misestimate (maybe a geometric mean would be more appropriate here?):
>
>    AVG(actual/estimate)
>
>
> Certainly stats should be collected for each plan node, not for the whole plan.
> And it is done now in Oleg's and my implementation.
> Oleg is using a gradient descent method. I first tried to calculate an average, but then found out that it is better
> to build something like a "histogram", where the bin is determined as log10 of the estimated number of rows.
>
I think maintaining a "histogram" sounds good. I've read a paper
called "Self-tuning Histograms: Building Histograms Without
Looking at Data", which tries to do something similar [1].
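To illustrate what the collected data could look like, here is a sketch
(plan_node_stats is a hypothetical table, assumed to be filled by an
executor hook with per-node estimated and actual row counts):

    -- Geometric mean of actual/estimate, as suggested above; a value
    -- far from 1.0 indicates a systemic misestimate for the node:
    SELECT node_id,
           EXP(AVG(LN(GREATEST(actual_rows, 1)::float8
                      / GREATEST(estimated_rows, 1)))) AS correction
    FROM plan_node_stats
    GROUP BY node_id;

    -- The "histogram" variant: keep one correction per order of
    -- magnitude of the estimated row count:
    SELECT node_id,
           FLOOR(LOG(GREATEST(estimated_rows, 1)::numeric))::int AS bin,
           EXP(AVG(LN(GREATEST(actual_rows, 1)::float8
                      / GREATEST(estimated_rows, 1)))) AS correction
    FROM plan_node_stats
    GROUP BY node_id, bin;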

>
> and if this is significantly different from 1.0, then we can say there's
> a systemic misestimate, and we can use this as a correction coefficient
> when computing the scan estimate. (And we need to be careful about
> collecting new data, because the estimates will include this correction.
> But that can be done by tracking "epoch" of the plan.)
>
> Now, if someone uses this same scan in a join, like for example
>
>    SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
>     WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
>       AND (t2.x = ? AND t2.y = ?)
>
> then we can still apply the same correction to the t1 scan (I think).
> But then we can also collect data for the t1-t2 join, and compute a
> correction coefficient in a similar way. It requires a bit of care
> because we need to compensate for misestimates of inputs, but I think
> that's doable.
>
That'll be interesting work. For the above query, we can definitely
calculate the correction coefficient of the t1-t2 join given that (t1.a = ? AND
t1.b = ? AND t1.c < ?) and (t2.x = ? AND t2.y = ?) hold. But I'm not sure how
we can extrapolate that value for the t1-t2 join in general.
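If I understand the compensation idea correctly, it would look something
like this (again just a sketch; join_node_stats and the per-scan
coefficients are hypothetical, computed as in the earlier example):

    -- Factor the inputs' corrections out first, so that join_coeff
    -- measures only the join's own misestimate:
    SELECT actual_rows::float8
           / (estimated_rows * scan_coeff_t1 * scan_coeff_t2) AS join_coeff
    FROM join_node_stats;
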
>
> As far as I know, Oleg's AQO is now used by Amazon.
> So it is something more than just a PoC. But certainly there are still many problems,
> and my experiments with the JOB benchmark showed that there are still a lot of things to improve.
>
Nice.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.921&rep=rep1&type=pdf
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


