Re: Columns correlation and adaptive query optimization - Mailing list pgsql-hackers
From | Yugo NAGATA |
---|---|
Subject | Re: Columns correlation and adaptive query optimization |
Date | |
Msg-id | 20210121213048.b7f958ad5c4beb941a48a7dc@sraoss.co.jp Whole thread Raw |
In response to | Re: Columns correlation and adaptive query optimization (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: Columns correlation and adaptive query optimization
|
List | pgsql-hackers |
Hello, On Thu, 26 Mar 2020 18:49:51 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Attached please find new version of the patch with more comments and > descriptions added. Adaptive query optimization is very interesting feature for me, so I looked into this patch. Here are some comments and questions. (1) This patch needs rebase because clauselist_selectivity was modified to improve estimation of OR clauses. (2) If I understand correctly, your proposal consists of the following two features. 1. Add a feature to auto_explain that creates an extended statistic automatically if an error on estimated rows number is large. 2. Improve rows number estimation of join results by considering functional dependencies between vars in the join qual if the qual has more than one clauses, and also functional dependencies between a var in the join qual and vars in quals of the inner/outer relation. As you said, these two parts are independent each other, so one feature will work even if we don't assume the other. I wonder it would be better to split the patch again, and register them to commitfest separately. (3) + DefineCustomBoolVariable("auto_explain.suggest_only", + "Do not create statistic but just record in WAL suggested create statistics statement.", + NULL, + &auto_explain_suggest_on To imply that this parameter is involving to add_statistics_threshold, it seems better for me to use more related name like add_statistics_suggest_only. Also, additional documentations for new parameters are required. (4) + /* + * Prevent concurrent access to extended statistic table + */ + stat_rel = table_open(StatisticExtRelationId, AccessExclusiveLock); + slot = table_slot_create(stat_rel, NULL); + scan = table_beginscan_catalog(stat_rel, 2, entry); (snip) + table_close(stat_rel, AccessExclusiveLock); + } When I tested the auto_explain part, I got the following WARNING. WARNING: buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, flags=0x83000000, refcount=1 2) WARNING: buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, flags=0x83000000, refcount=1 1) WARNING: relcache reference leak: relation "pg_statistic_ext" not closed WARNING: TupleDesc reference leak: TupleDesc 0x7fa439266338 (12029,-1) still referenced WARNING: Snapshot reference leak: Snapshot 0x55c332c10418 still referenced To suppress this, I think we need table_endscan(scan) and ExecDropSingleTupleTableSlot(slot) before finishing this function. (6) + elog(NOTICE, "Auto_explain suggestion: CREATE STATISTICS %s %s FROM %s", stat_name, create_stat_stmt,rel_name); We should use ereport instead of elog for log messages. (7) + double dep = find_var_dependency(root, innerRelid, var, clauses_attnums); + if (dep != 0.0) + { + s1 *= dep + (1 - dep) * s2; + continue; + } I found the following comment of clauselist_apply_dependencies(): * we actually combine selectivities using the formula * * P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b) so, is it not necessary using the same formula in this patch? That is, s1 *= dep + (1-dep) * s2 (if s1 <= s2) s1 *= dep * (s2/s1) + (1-dep) * s2 (otherwise) . (8) +/* + * Try to find dependency between variables. + * var: varaibles which dependencies are considered + * join_vars: list of variables used in other clauses + * This functions return strongest dependency and some subset of variables from the same relation + * or 0.0 if no dependency was found. + */ +static double +var_depends_on(PlannerInfo *root, Var* var, List* clause_vars) +{ The comment mentions join_vars but the actual argument name is clauses_vars, so it needs unification. (9) Currently, it only consider functional dependencies statistics. Can we also consider multivariate MCV list, and is it useful? (10) To achieve adaptive query optimization (AQO) in PostgreSQL, this patch proposes to use auto_explain for getting feedback from actual results. So, could auto_explain be a infrastructure of AQO in future? Or, do you have any plan or idea to make built-in infrastructure for AQO? Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
pgsql-hackers by date: