Thread: optimizer questions
Hello everyone, I am working with the Postgres optimizer for the first time, so bear with me... I want to extend the optimizer to deal with aggregate queries a bit better. The idea is from an old paper by Chaudhuri and Shim in VLDB 94. The gist of it is that when computing aggregates over the result of joining multiple tables, under some conditions the aggregate can be pushed into the join tree to reduce the size of join operands making resulting plans cheaper. So here is my problem, due to the path/plan separation of the Postgres optimizer, this is not trivial (joins are decided in path, aggregates decided in plan). As it stands, aggregate nodes can only appear as the top node of subqueries. Before I start trying this (creating aggregate paths seems the reasonable thing to do) I would like your counsel. 1) Regardless of the optimization problem, is the executor able to execute aggregate nodes within join trees (that is, not as the result of subqueries)? 2) Has anyone tried something like this before? 3) For debugging purposes: Has anyone figured out a way to feed hand-crafted plans to the executor? Setting up some of the data structures (PlannerInfo, target lists) etc. does not look trivial. By this I mean, beyond giving explicit join clauses in queries. 4) Any other suggestions? Thank you very much, Hector
On Tue, Feb 14, 2006 at 10:35:12AM -0600, hector Corrada Bravo wrote: > Hello everyone, > > I am working with the Postgres optimizer for the first time, so bear with me... > > I want to extend the optimizer to deal with aggregate queries a bit > better. The idea is from an old paper by Chaudhuri and Shim in VLDB > 94. The gist of it is that when computing aggregates over the result > of joining multiple tables, under some conditions the aggregate can be > pushed into the join tree to reduce the size of join operands making > resulting plans cheaper. > > So here is my problem, due to the path/plan separation of the Postgres > optimizer, this is not trivial (joins are decided in path, aggregates > decided in plan). As it stands, aggregate nodes can only appear as the > top node of subqueries. Well, the basic approach in the beginning should probably be to change the planner to modify the plan so you get a new plan with the aggregation in the right place. > Before I start trying this (creating aggregate paths seems the > reasonable thing to do) I would like your counsel. > > 1) Regardless of the optimization problem, is the executor able to > execute aggregate nodes within join trees (that is, not as the result > of subqueries)? An Aggregate is just a node type that can be stacked above any other node. Ofcourse, it requires the input rows to be in the appropriate order but other than that. > 2) Has anyone tried something like this before? No idea. > 3) For debugging purposes: Has anyone figured out a way to feed > hand-crafted plans to the executor? Setting up some of the data > structures (PlannerInfo, target lists) etc. does not look trivial. By > this I mean, beyond giving explicit join clauses in queries. Nope sorry. > 4) Any other suggestions? Well, I honestly have no idea what kind of transformations you have in mind so not really. However, you should probably realise that in PostgreSQL aggregates are not special and users can make their own. Whatever you think of needs to take that into account. Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
hector Corrada Bravo <hcorrada@gmail.com> writes: > 1) Regardless of the optimization problem, is the executor able to > execute aggregate nodes within join trees (that is, not as the result > of subqueries)? Sure. > 3) For debugging purposes: Has anyone figured out a way to feed > hand-crafted plans to the executor? Setting up some of the data > structures (PlannerInfo, target lists) etc. does not look trivial. By > this I mean, beyond giving explicit join clauses in queries. It's not really very practical --- the data structures are too complex to create by hand in any reasonable way. You can probably test by entering modified queries that do the aggregations in sub-selects, though. The most practical way to implement something like this is probably to restructure the query during the "prep" stage, pushing the aggregates down into sub-queries automatically. The 8.1 min/max optimization code does something related, although I'll freely admit that's a bit of a hack. regards, tom lane
--On Dienstag, Februar 14, 2006 10:35:12 -0600 hector Corrada Bravo <hcorrada@gmail.com> wrote: > Before I start trying this (creating aggregate paths seems the > reasonable thing to do) I would like your counsel. > > 1) Regardless of the optimization problem, is the executor able to > execute aggregate nodes within join trees (that is, not as the result > of subqueries)? > > 2) Has anyone tried something like this before? I did and failed because I did not quite understand how the Postgres internal variables should be initialized. My approach was to supply other join pathes if one of the two tables was an aggregate. Mit freundlichem Gruß Jens Schicke -- Jens Schicke j.schicke@asco.de asco GmbH http://www.asco.de Mittelweg 7 Tel 0531/3906-127 38106 Braunschweig Fax 0531/3906-400