Thread: TwoPO: experimental join order algorithm
Hi, I'd like to release the last version of my experimental join order algorithm (TwoPO - Two Phase Optimization [1]): http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary This algorithm is not production-ready, but an experimental set of ideas, which need to be refined and evaluated. As the join order optimization is a hard problem, the evaluation of a search strategy is also a hard task. Therefore, I think the most important TODO item related to replacement of GEQO algorithm is to define a set of evaluation criteria considered as relevant. TwoPO is encapsulated in a plug-in called LJQO (Large Join Query Optimization [2]). This plug-in has two main GUC variables: ljqo_threshold = N (like geqo_threshold) ljqo_algorithm = {twopo|geqo} As its name means, TwoPO has internally two search strategies that constitute its two phases of optimization: * Iterative Improvement (II) – Biased Sampling + Local Search* Simulated Annealing (SA) This algorithm also works in two search spaces: * deep-trees (subset of bushy-trees) - list of baserels - initial states: biased sampling using Prim algorithmover join graph (new, very efficient) - moves: swap, 3cycle [2]* bushy-trees - binary join tree representation - initial states: biased sampling using Kruskal's algorithm over join graph [3,4]. - moves: associative You can modify the functionality of TwoPO through the following parameters: twopo_bushy_space = {true|false} - set it to false if you want only deep-trees default=true twopo_heuristic_states = {true|false} - enables heuristic to initial states default=true twopo_ii_stop = Int - number of initial states default=10 twopo_ii_improve_states = {true|false} - find local-minimum of each initial state default=true twopo_sa_phase = {true|false} - enables Simulated Annealing (SA) phase default=true twopo_sa_initial_temperature = Float - initial temperature for SA phase default=0.1 twopo_sa_temperature_reduction = Float - temperature reduction default=0.95 twopo_sa_equilibrium = Int - number of states generated for each temperature (Int * State Size) default=16 References: [1] Yannis E. Ioannidis e Younkyung Kang. Randomized algorithms for optimizing large join queries. SIGMOD Rec., 19(2):312-321, 1990. [2] Arun Swami e Anoop Gupta. Optimization of large join queries. SIGMOD '88: Proceedings of the 1988 ACM SIGMOD international conference on Management of data, pages 8-17, New York, NY, USA, 1988. ACM. [3] P.B. Guttoski, M. S. Sunye, e F. Silva. Kruskal's algorithm for query tree optimization. IDEAS '07: Proceedings of the 11th International Database Engineering and Applications Symposium, pages 296-302, Washington, DC, USA, 2007. IEEE Computer Society. [4] Florian Waas e Arjan Pellenkoft. Join order selection - good enough is easy. BNCOD, pages 51-67, 2000. Att, Adriano Lange
On Sat, Jul 24, 2010 at 9:20 AM, Adriano Lange <alange0001@gmail.com> wrote: > I'd like to release the last version of my experimental join order > algorithm (TwoPO - Two Phase Optimization [1]): > > http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary > > This algorithm is not production-ready, but an experimental set of > ideas, which need to be refined and evaluated. As the join order > optimization is a hard problem, the evaluation of a search strategy is > also a hard task. Therefore, I think the most important TODO item > related to replacement of GEQO algorithm is to define a set of > evaluation criteria considered as relevant. As you may know, we're in the middle of a CommitFest right now; I'd suggest adding this patch to the next one. https://commitfest.postgresql.org/action/commitfest_view/open Someone might have time to look at it sooner, but at least if you add it here we'll not lose track of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Em 25-07-2010 17:44, Robert Haas escreveu: > On Sat, Jul 24, 2010 at 9:20 AM, Adriano Lange <alange0001@gmail.com> wrote: >> I'd like to release the last version of my experimental join order >> algorithm (TwoPO - Two Phase Optimization [1]): >> >> http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary >> >> This algorithm is not production-ready, but an experimental set of >> ideas, which need to be refined and evaluated. As the join order >> optimization is a hard problem, the evaluation of a search strategy is >> also a hard task. Therefore, I think the most important TODO item >> related to replacement of GEQO algorithm is to define a set of >> evaluation criteria considered as relevant. > > As you may know, we're in the middle of a CommitFest right now; I'd > suggest adding this patch to the next one. > > https://commitfest.postgresql.org/action/commitfest_view/open > > Someone might have time to look at it sooner, but at least if you add > it here we'll not lose track of it. > Yes, I know. This is only a notice, not a patch. As I said, this algorithm is experimental, which do not match with the CommitFest life cycle. -- Adriano Lange
On Sun, Jul 25, 2010 at 6:45 PM, Adriano Lange <alange0001@gmail.com> wrote: > Yes, I know. This is only a notice, not a patch. > As I said, this algorithm is experimental, which do not match with the > CommitFest life cycle. It matches just fine - you just want a review and some good feedback, rather than an actual commit. It's just... I don't have time to do that tonight. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Em 25-07-2010 19:17, Robert Haas escreveu: > On Sun, Jul 25, 2010 at 6:45 PM, Adriano Lange <alange0001@gmail.com> wrote: >> Yes, I know. This is only a notice, not a patch. >> As I said, this algorithm is experimental, which do not match with the >> CommitFest life cycle. > > It matches just fine - you just want a review and some good feedback, > rather than an actual commit. It's just... I don't have time to do > that tonight. :-) > :-) Just a notice of an experimental algorithm for now. I think there are no clear criteria for review this kind of algorithm yet. -- Adriano Lange
On 24/07/10 15:20, Adriano Lange wrote: > Hi, Hi! > > I'd like to release the last version of my experimental join order > algorithm (TwoPO - Two Phase Optimization [1]): > > http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary > > This algorithm is not production-ready, but an experimental set of > ideas, which need to be refined and evaluated. As the join order > optimization is a hard problem, the evaluation of a search strategy is > also a hard task. Therefore, I think the most important TODO item > related to replacement of GEQO algorithm is to define a set of > evaluation criteria considered as relevant. Good to hear from you -- I don't know if you are aware about a simulated annealing join search module that I'm working on. When I first started, I planned to base is on the patch that you sent to -hackers some time ago (and actually, if you look at the repo for my module, twopo.c is still in there) but ended up doing it from scratch. However, reading your code was a big help in the beginning. I gave a talk at this year's PGCon (http://www.pgcon.org/2010/schedule/events/211.en.html) and you will find your name in the acknowledgments section of the presentation :) I'll make sure to read your new code and compare the approaches, my results so far are not perfect, but also not very pessimistic. I think there is actually a chance to replace GEQO with SA in the future, or at least to ship more join search modules with the standard distribution and gather field reports. Cheers, Jan
On Fri, Jul 30, 2010 at 7:02 AM, Jan Urbański <wulczer@wulczer.org> wrote: > On 24/07/10 15:20, Adriano Lange wrote: >> >> Hi, > > Hi! > >> >> I'd like to release the last version of my experimental join order >> algorithm (TwoPO - Two Phase Optimization [1]): >> >> http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary >> >> This algorithm is not production-ready, but an experimental set of >> ideas, which need to be refined and evaluated. As the join order >> optimization is a hard problem, the evaluation of a search strategy is >> also a hard task. Therefore, I think the most important TODO item >> related to replacement of GEQO algorithm is to define a set of >> evaluation criteria considered as relevant. > > Good to hear from you -- I don't know if you are aware about a simulated > annealing join search module that I'm working on. When I first started, I > planned to base is on the patch that you sent to -hackers some time ago (and > actually, if you look at the repo for my module, twopo.c is still in there) > but ended up doing it from scratch. However, reading your code was a big > help in the beginning. > > I gave a talk at this year's PGCon > (http://www.pgcon.org/2010/schedule/events/211.en.html) and you will find > your name in the acknowledgments section of the presentation :) > > I'll make sure to read your new code and compare the approaches, my results > so far are not perfect, but also not very pessimistic. I think there is > actually a chance to replace GEQO with SA in the future, or at least to ship > more join search modules with the standard distribution and gather field > reports. > > Cheers, > Jan > Hi Jan, It's good to know that you are also interested about this issue! I saw the slides of your presentation at PGCon and I noted excellent ideas. I think that as more as implementations and ideas will be raised, more easily will be to converge them in a reliable and competitive solution. The current version of TwoPO is suitable for the classic select-project-join problem. Therefore, there are some issues not covered yet, as I have observed in a test schema presented by Andres Freund (http://archives.postgresql.org/pgsql-hackers/2009-07/msg00546.php). I think we will need to build a robust benchmark to deal with it. At this moment I'm really busy with a "homework", but I hope to dedicate more time to this issue next month. -- Adriano Lange