Thread: TwoPO: experimental join order algorithm

TwoPO: experimental join order algorithm

From
Adriano Lange
Date:
Hi,

I'd like to release the last version of my experimental join order
algorithm (TwoPO - Two Phase Optimization [1]):

http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary

This algorithm is not production-ready, but an experimental set of
ideas, which need to be refined and evaluated. As the join order
optimization is a hard problem, the evaluation of a search strategy is
also a hard task. Therefore, I think the most important TODO item
related to replacement of GEQO algorithm is to define a set of
evaluation criteria considered as relevant.

TwoPO is encapsulated in a plug-in called LJQO (Large Join Query
Optimization [2]). This plug-in has two main GUC variables:

ljqo_threshold = N (like geqo_threshold)
ljqo_algorithm = {twopo|geqo}

As its name means, TwoPO has internally two search strategies that
constitute its two phases of optimization:
* Iterative Improvement (II) – Biased Sampling + Local Search* Simulated Annealing (SA)

This algorithm also works in two search spaces:
* deep-trees (subset of bushy-trees)   - list of baserels   - initial states: biased sampling using         Prim
algorithmover join graph (new, very efficient)   - moves: swap, 3cycle [2]* bushy-trees   - binary join tree
representation  - initial states: biased sampling using         Kruskal's algorithm over join graph [3,4].   - moves:
associative

You can modify the functionality of TwoPO through the following parameters:

twopo_bushy_space = {true|false} - set it to false if you want only deep-trees   default=true
twopo_heuristic_states = {true|false} - enables heuristic to initial states   default=true
twopo_ii_stop = Int - number of initial states   default=10
twopo_ii_improve_states = {true|false} - find local-minimum of each initial state   default=true
twopo_sa_phase = {true|false} - enables Simulated Annealing (SA) phase   default=true
twopo_sa_initial_temperature = Float - initial temperature for SA phase   default=0.1
twopo_sa_temperature_reduction = Float - temperature reduction   default=0.95
twopo_sa_equilibrium = Int - number of states generated for each temperature   (Int * State Size)   default=16


References:

[1] Yannis E. Ioannidis e Younkyung Kang. Randomized algorithms for
optimizing large join queries. SIGMOD Rec., 19(2):312-321, 1990.

[2] Arun Swami e Anoop Gupta. Optimization of large join queries. SIGMOD
'88: Proceedings of the 1988 ACM SIGMOD international conference on
Management of data, pages 8-17, New York, NY, USA, 1988. ACM.

[3] P.B. Guttoski, M. S. Sunye, e F. Silva. Kruskal's algorithm for
query tree optimization. IDEAS '07: Proceedings of the 11th
International Database Engineering and Applications Symposium, pages
296-302, Washington, DC, USA, 2007. IEEE Computer Society.

[4] Florian Waas e Arjan Pellenkoft. Join order selection - good enough
is easy. BNCOD, pages 51-67, 2000.


Att,
Adriano Lange



Re: TwoPO: experimental join order algorithm

From
Robert Haas
Date:
On Sat, Jul 24, 2010 at 9:20 AM, Adriano Lange <alange0001@gmail.com> wrote:
> I'd like to release the last version of my experimental join order
> algorithm (TwoPO - Two Phase Optimization [1]):
>
> http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary
>
> This algorithm is not production-ready, but an experimental set of
> ideas, which need to be refined and evaluated. As the join order
> optimization is a hard problem, the evaluation of a search strategy is
> also a hard task. Therefore, I think the most important TODO item
> related to replacement of GEQO algorithm is to define a set of
> evaluation criteria considered as relevant.

As you may know, we're in the middle of a CommitFest right now; I'd
suggest adding this patch to the next one.

https://commitfest.postgresql.org/action/commitfest_view/open

Someone might have time to look at it sooner, but at least if you add
it here we'll not lose track of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: TwoPO: experimental join order algorithm

From
Adriano Lange
Date:
Em 25-07-2010 17:44, Robert Haas escreveu:
> On Sat, Jul 24, 2010 at 9:20 AM, Adriano Lange <alange0001@gmail.com> wrote:
>> I'd like to release the last version of my experimental join order
>> algorithm (TwoPO - Two Phase Optimization [1]):
>>
>> http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary
>>
>> This algorithm is not production-ready, but an experimental set of
>> ideas, which need to be refined and evaluated. As the join order
>> optimization is a hard problem, the evaluation of a search strategy is
>> also a hard task. Therefore, I think the most important TODO item
>> related to replacement of GEQO algorithm is to define a set of
>> evaluation criteria considered as relevant.
> 
> As you may know, we're in the middle of a CommitFest right now; I'd
> suggest adding this patch to the next one.
> 
> https://commitfest.postgresql.org/action/commitfest_view/open
> 
> Someone might have time to look at it sooner, but at least if you add
> it here we'll not lose track of it.
> 

Yes, I know. This is only a notice, not a patch.
As I said, this algorithm is experimental, which do not match with the
CommitFest life cycle.

--
Adriano Lange


Re: TwoPO: experimental join order algorithm

From
Robert Haas
Date:
On Sun, Jul 25, 2010 at 6:45 PM, Adriano Lange <alange0001@gmail.com> wrote:
> Yes, I know. This is only a notice, not a patch.
> As I said, this algorithm is experimental, which do not match with the
> CommitFest life cycle.

It matches just fine - you just want a review and some good feedback,
rather than an actual commit.  It's just... I don't have time to do
that tonight.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: TwoPO: experimental join order algorithm

From
Adriano Lange
Date:
Em 25-07-2010 19:17, Robert Haas escreveu:
> On Sun, Jul 25, 2010 at 6:45 PM, Adriano Lange <alange0001@gmail.com> wrote:
>> Yes, I know. This is only a notice, not a patch.
>> As I said, this algorithm is experimental, which do not match with the
>> CommitFest life cycle.
> 
> It matches just fine - you just want a review and some good feedback,
> rather than an actual commit.  It's just... I don't have time to do
> that tonight.  :-)
> 

:-)
Just a notice of an experimental algorithm for now.
I think there are no clear criteria for review this kind of algorithm yet.

--
Adriano Lange


Re: TwoPO: experimental join order algorithm

From
Jan Urbański
Date:
On 24/07/10 15:20, Adriano Lange wrote:
> Hi,

Hi!

>
> I'd like to release the last version of my experimental join order
> algorithm (TwoPO - Two Phase Optimization [1]):
>
> http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary
>
> This algorithm is not production-ready, but an experimental set of
> ideas, which need to be refined and evaluated. As the join order
> optimization is a hard problem, the evaluation of a search strategy is
> also a hard task. Therefore, I think the most important TODO item
> related to replacement of GEQO algorithm is to define a set of
> evaluation criteria considered as relevant.

Good to hear from you --  I don't know if you are aware about a 
simulated annealing join search module that I'm working on. When I first 
started, I planned to base is on the patch that you sent to -hackers 
some time ago (and actually, if you look at the repo for my module, 
twopo.c is still in there) but ended up doing it from scratch. However, 
reading your code was a big help in the beginning.

I gave a talk at this year's PGCon 
(http://www.pgcon.org/2010/schedule/events/211.en.html) and you will 
find your name in the acknowledgments section of the presentation :)

I'll make sure to read your new code and compare the approaches, my 
results so far are not perfect, but also not very pessimistic. I think 
there is actually a chance to replace GEQO with SA in the future, or at 
least to ship more join search modules with the standard distribution 
and gather field reports.

Cheers,
Jan


Re: TwoPO: experimental join order algorithm

From
Adriano Lange
Date:
On Fri, Jul 30, 2010 at 7:02 AM, Jan Urbański <wulczer@wulczer.org> wrote:
> On 24/07/10 15:20, Adriano Lange wrote:
>>
>> Hi,
>
> Hi!
>
>>
>> I'd like to release the last version of my experimental join order
>> algorithm (TwoPO - Two Phase Optimization [1]):
>>
>> http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary
>>
>> This algorithm is not production-ready, but an experimental set of
>> ideas, which need to be refined and evaluated. As the join order
>> optimization is a hard problem, the evaluation of a search strategy is
>> also a hard task. Therefore, I think the most important TODO item
>> related to replacement of GEQO algorithm is to define a set of
>> evaluation criteria considered as relevant.
>
> Good to hear from you --  I don't know if you are aware about a simulated
> annealing join search module that I'm working on. When I first started, I
> planned to base is on the patch that you sent to -hackers some time ago (and
> actually, if you look at the repo for my module, twopo.c is still in there)
> but ended up doing it from scratch. However, reading your code was a big
> help in the beginning.
>
> I gave a talk at this year's PGCon
> (http://www.pgcon.org/2010/schedule/events/211.en.html) and you will find
> your name in the acknowledgments section of the presentation :)
>
> I'll make sure to read your new code and compare the approaches, my results
> so far are not perfect, but also not very pessimistic. I think there is
> actually a chance to replace GEQO with SA in the future, or at least to ship
> more join search modules with the standard distribution and gather field
> reports.
>
> Cheers,
> Jan
>

Hi Jan,

It's good to know that you are also interested about this issue!

I saw the slides of your presentation at PGCon and I noted excellent
ideas. I think that as more as implementations and ideas will be
raised, more easily will be to converge them in a reliable and
competitive solution.

The current version of TwoPO is suitable for the classic
select-project-join problem. Therefore, there are some issues not
covered yet, as I have observed in a test schema presented by Andres
Freund (http://archives.postgresql.org/pgsql-hackers/2009-07/msg00546.php).
I think we will need to build a robust benchmark to deal with it.

At this moment I'm really busy with a "homework", but I hope to
dedicate more time to this issue next month.

--
Adriano Lange