Re: [HACKERS] Subselects are in CVS... - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] Subselects are in CVS... |
Date | |
Msg-id | 199802132058.PAA20488@candle.pha.pa.us Whole thread Raw |
In response to | Subselects are in CVS... ("Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>) |
List | pgsql-hackers |
Want to add this as a README in the optimizer/planner directory? > > This is some implementation notes and opened issues... > > First, implementation uses new type of parameters - PARAM_EXEC - to deal > with correlation Vars. When query_planner() is called, it first tries to > replace all upper queries Var referenced in current query with Param of > this type. Some global variables are used to keep mapping of Vars to > Params and Params to Vars. > > After this, all current query' SubLinks are processed: for each SubLink > found in query' qual union_planner() (old planner() function) will be > called to plan corresponding subselect (union_planner() calls > query_planner() for "simple" query and supports UNIONs). After subselect > are planned, optimizer knows about is this correlated, un-correlated or > _undirect_ correlated (references some grand-parent Vars but no parent > ones: uncorrelated from the parent' point of view) query. > > For uncorrelated and undirect correlated subqueries of EXPRession or > EXISTS type SubLinks will be replaced with "normal" clauses from > SubLink->Oper list (I changed this list to be list of EXPR nodes, > not just Oper ones). Right sides of these nodes are replaced with > PARAM_EXEC parameters. This is second use of new parameter type. > At run-time these parameters get value from result of subquery > evaluation (i.e. - from target list of subquery). Execution plan of > subquery itself becomes init plan of parent query. InitPlan knows > what parameters are to get values from subquery' results and will be > executed "on-demand" (for query select * from table where x > 0 and > y > (select max(a) from table_a) subquery will not be executed at all > if there are no tuples with x > 0 _and_ y is not used in index scan). > > SubLinks for subqueries of all other types are transformed into > new type of Expr node - SUBPLAN_EXPR. Expr->args are just correlation > variables from _parent_ query. Expr->oper is new SubPlan node. > > This node is used for InitPlan too. It keeps subquery range table, > indices of Params which are to get value from _parent_ query Vars > (i.e. - from Expr->args), indices of Params into which subquery' > results are to be substituted (this is for InitPlans), SubLink > and subquery' execution plan. > > Plan node was changed to know about dependencies on Params from > parent queries and InitPlans, to keep list of changed Params > (from the above) and so be re-scanned if this list is not NULL. > Also, added list of InitPlans (actually, all of them for current > query are in topmost plan node now) and other SubPlans (from > plan->qual) - to initialize them and let them know about changed > Params (from the list of their "interests"). > > After all SubLinks are processed, query_planner() calls qual' > canonificator and does "normal" work. By using Params optimizer > is mostly unchanged. > > Well, Executor. To get subplans re-evaluated without ExecutorStart() > and ExecutorEnd() (without opening and closing relations and indices > and without many palloc() and pfree() - this is what SQL-funcs does > on each call) ExecReScan() now supports most of Plan types... > > Explanation of EXPLAIN. > > vac=> explain select * from tmp where x >= (select max(x2) from test2 > where y2 = y and exists (select * from tempx where tx = x)); > NOTICE: QUERY PLAN: > > Seq Scan on tmp (cost=40.03 size=101 width=8) > SubPlan > ^^^^^^^ subquery is in Seq Scan' qual, its plan is below > -> Aggregate (cost=2.05 size=0 width=0) > InitPlan > ^^^^^^^^ EXISTS subsubquery is InitPlan of subquery > -> Seq Scan on tempx (cost=4.33 size=1 width=4) > -> Result (cost=2.05 size=0 width=0) > ^^^^^^ EXISTS subsubquery was transformed into Param > and so we have Result node here > -> Index Scan on test2 (cost=2.05 size=1 width=4) > > > Opened issues. > > 1. No read permissions checking (easy, just not done yet). > 2. readfuncs.c can't read subplan-s (easy, not critical, because of > we currently nowhere use ascii representation of execution plans). > 3. ExecReScan() doesn't support all plan types. At least support for > MergeJoin has to be implemented. > 4. Memory leaks in ExecReScan(). > 5. I need in advice: if subquery introduced with NOT IN doesn't return > any tuples then qualification is failed, yes ? > 6. Regression tests !!!!!!!!!!!!!!!!!!!! > (Could we use data/queries from MySQL' crash.me ? > Copyright-ed ? Could they give us rights ?) > 7. Performance. > - Should be good when subquery is transformed into InitPlan. > - Something should be done for uncorrelated subqueries introduced > with ANY/ALL - keep thinking. Currently, subplan will be re-scanned > for each parent tuple - very slow... > > Results of some test. TMP is table with x,y (int4-s), x in 0-9, > y = 100 - x, 1000 tuples (10 duplicates of each tuple). TEST2 is table > with x2, y2 (int4-s), x2 in 1-99, y2 = 100 -x2, 10000 tuples (100 dups). > > Trying > > select * from tmp where x >= (select max(x2) from test2 where y2 = y); > > and > > begin; > select y as ty, max(x2) as mx into table tsub from test2, tmp > where y2 = y group by ty; > vacuum tsub; > select x, y from tmp, tsub where x >= mx and y = ty; > drop table tsub; > end; > > Without index on test2(y2): > > SubSelect -> 320 sec > Using temp table -> 32 sec > > Having index > > SubSelect -> 17 sec (2M of memory) > Using temp table -> 32 sec (12M of memory: -S 8192) > > Vadim > > -- Bruce Momjian maillist@candle.pha.pa.us
pgsql-hackers by date: