Re: Adaptive query optimization - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Adaptive query optimization
Date: 2019-06-13 13:35:42
Msg-id: 20190613133542.bnqcu5hyw2rkhdm7@development
In response to: Re: Adaptive query optimization (Rafia Sabih <rafia.pghackers@gmail.com>)
List: pgsql-hackers
On Thu, Jun 13, 2019 at 03:17:07PM +0200, Rafia Sabih wrote:
>On Thu, 13 Jun 2019 at 06:07, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>>
>> On Thu, Jun 13, 2019 at 5:49 AM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>> >
>> > >> ...
>> > >>
>> > >That'll be interesting work. For the above query, we can definitely
>> > >calculate the correction coefficient of the t1-t2 join given that
>> > >(t1.a = ? AND t1.b = ? AND t1.c < ?) and (t2.x = ? AND t2.y = ?) are
>> > >true. But I'm not sure how we can extrapolate that value for the
>> > >t1-t2 join.
>> >
>> > I'm not sure I see the problem? Essentially, we need to know the sizes
>> > of the join inputs, i.e.
>> >
>> >     t1 WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
>> >
>> >     t2 WHERE (t2.x = ? AND t2.y = ?)
>> >
>> > (which we know, and we know how to correct the estimate), and then the
>> > selectivity of the join condition. Which we also know.
>> >
>> > Obviously, there's a chance those parts (clauses at the scan / join
>> > level) are correlated, which could make this less accurate.
>> This is exactly my concern. The base predicate selectivities of t1
>> and t2 should have an impact on the calculation of the correction
>> coefficient. If those selectivities are low, the misestimation (which
>> is actual/estimate) should not affect the t1-t2 join correction
>> coefficient much.
>>
>Interesting discussion. Talking of query optimization techniques and
>challenges, isn't the biggest challenge there selectivity estimation?

Yes, selectivity estimation is the major challenge. It's not the only one,
but we rely on the estimates quite a bit - it's probably the main factor
affecting cost estimates.
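
To make the correction idea from upthread a bit more concrete, here is
a minimal sketch (plain Python, every name invented - this is not
actual PostgreSQL code): learn actual/estimate ratios for the
scan-level clauses from a completed execution, then use them to scale
the planner's estimates for the next query with the same clause shapes.

    def learn_coefficient(estimated_rows, actual_rows):
        # correction coefficient as defined upthread: actual / estimate
        return actual_rows / estimated_rows

    def corrected_join_size(t1_est, c1, t2_est, c2, join_sel):
        # corrected input sizes times the join-condition selectivity;
        # assumes the scan-level clauses and the join clause are
        # independent, which is exactly the correlation caveat above
        return (c1 * t1_est) * (c2 * t2_est) * join_sel

    # coefficients learned from a previous execution of the same shape
    c1 = learn_coefficient(1000, 8000)   # planner was 8x too low on t1
    c2 = learn_coefficient(2000, 500)    # planner was 4x too high on t2

    # new execution: fresh planner estimates, reused coefficients
    print(corrected_join_size(1200, c1, 1800, c2, 0.001))   # -> 4320.0

If the scan clauses are in fact correlated with the join clause, this
simple multiplication can of course still be quite wrong.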

> Then, instead of working on optimizing a process that has been
>discussed for a long time, how about skipping it altogether? This
>reminds me of some work I came across a while back[1]. Basically, the
>idea is to not spend any energy on estimating the selectivities, but
>rather to get on with the execution. Precisely, a set of plans is kept
>a priori for different selectivities, and at execution time the plans
>are tried one by one, starting from the lowest-selectivity one, until
>the query execution completes. It might sound like too much work, but
>it isn't; there are theoretical guarantees bounding the worst-case
>execution time. The trick is in choosing the plan set and switching at
>execution time. Another good point about this approach is that it
>works smoothly for join predicates as well.
>
>Since we are talking about this problem here, I thought it might be a
>good idea to shed some light on such an approach and see if there is
>some interesting trick we might use.
>
>[1] https://dsl.cds.iisc.ac.in/publications/conference/bouquet.pdf
>

AFAIK adaptive execution (switching from one plan to another
mid-execution) is actually quite difficult to implement in practice,
especially when some of the rows might already have been sent to the
user, etc. That's why databases (outside of academia) use this only in
very limited/specific situations.
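
For illustration, a rough sketch of the bouquet-style driver described
in the paper (Python again, everything invented and heavily simplified
to one plan per cost contour): plans picked for geometrically spaced
selectivities run in order, each under a cost budget, and a plan that
exceeds its budget is discarded and the next one starts from scratch
with a doubled budget.

    def run_bouquet(plans, start_budget, execute_with_budget):
        # plans are ordered from lowest to highest target selectivity;
        # execute_with_budget returns the result, or None if the plan
        # was aborted after exceeding its cost budget
        budget = start_budget
        for plan in plans:
            result = execute_with_budget(plan, budget)
            if result is not None:
                return result          # finished within the budget
            budget *= 2                # doubling bounds the wasted work
        raise RuntimeError("the final plan is expected to complete")

    # toy harness: a plan "completes" iff its true cost fits the budget
    plans = [("P1", 30), ("P2", 35), ("P3", 38)]
    def execute_with_budget(plan, budget):
        name, cost = plan
        return "done via " + name if cost <= budget else None

    print(run_bouquet(plans, 10, execute_with_budget))   # done via P3

Note that each discarded plan is simply thrown away and the next one
restarts from scratch - nothing is returned to the client until some
plan completes. That's precisely how the scheme sidesteps the
mid-execution switching problem, at the price of repeating work.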

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
