Thread: [HACKERS] Discussion on missing optimizations
Hopefully it's alright for me to post this here, please let me know if not.
I ran across an article on blog.jooq.org comparing all the major RDBMSes on their ability to optimize away unnecessary work in less-than-optimal queries. I saw some discussion on Hacker News and Reddit, but I hadn't seen any discussion here.
The article in question is here: https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
Commercial databases seem to have a serious leg up in this area, and there are even some optimizations that MySQL has and Postgres doesn't.
I was wondering which of these are planned, which have had discussion before and decided not to support, and which just haven't been thought of?
I thought I'd bring it up here in the hope that others who are more knowledgeable can chime in.
Hi,

On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> Hopefully it's alright for me to post this here, please let me know if not.
> I ran across an article on blog.jooq.org comparing all the major RDBMS'
> with their ability to optimize away unnecessary work with queries which are
> less than optimal, and saw some discussion on hackernews and reddit, but I
> hadn't seen any discussion here.
>
> The article in question is here:
> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/

That's interesting.

> Commercial databases seem to have a serious leg up in this area, and there
> are even some that MySQL has that Postgres doesn't.
>
> I was wondering which of these are planned, which have had discussion
> before and decided not to support, and which just haven't been thought of?

Hm, going through the ones we don't [fully] support:

3. JOIN Elimination

There's been a lot of discussion and several patches. There are a bunch of
problems here, one being that there are cases (during trigger firing,
before the constraint checks) where foreign keys don't hold true, so we
can't make these optimizations fully generally. Another concern is whether
the added plan time is actually worthwhile.

4. Removing "Silly" Predicates (i.e. stuff like column = column)

This has been deemed not to be a valuable use of plan time. I'm doubtful
about that argument, but IIRC that was the consensus the last time this
came up.

6. Predicate Merging
(i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
with OR)

I can't remember proposals about adding this, but it seems worthwhile to
consider. I think we should be able to check for this without a lot of
planner overhead.

7. Provably Empty Sets
8. CHECK Constraints

I think some of this should actually work with constraint exclusion
turned on.

9. Unneeded Self JOIN

Can't remember discussions of this.
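[An editorial aside on item 3: the condition Andres alludes to for dropping an inner join can be sketched abstractly. This is a toy model in Python with all names invented for illustration; the real planner logic lives in the join-analysis code and is considerably more involved.]

```python
# Toy model of FK-based inner-join elimination: a join to a table on its
# primary key can be dropped when no other column of that table is used
# and a validated foreign key guarantees every outer row matches exactly
# one inner row.  All names here are invented for illustration.

def can_eliminate_join(query_cols_used, joined_table, join_col, fk_constraints):
    # (a) nothing but the join key is referenced from the joined table
    uses_other_cols = any(t == joined_table and c != join_col
                          for t, c in query_cols_used)
    # (b) the join is backed by a foreign key onto that key column
    fk_backed = (joined_table, join_col) in fk_constraints
    return not uses_other_cols and fk_backed

# SELECT o.id FROM orders o JOIN customers c ON o.cust_id = c.id
used = {("orders", "id")}
fks = {("customers", "id")}     # orders.cust_id REFERENCES customers(id)
assert can_eliminate_join(used, "customers", "id", fks)

# ...but not if a customers column is actually needed:
assert not can_eliminate_join(used | {("customers", "name")},
                              "customers", "id", fks)
```

As Andres notes, the hard part in Postgres is that the FK premise can be transiently false (e.g. during trigger firing), which this sketch ignores.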
Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
>> The article in question is here:
>> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/

> That's interesting.

The impression I have in a quick scan is that probably hardly any of these
are cases that any of the DB designers think are important in themselves.
Rather, they fall out of more general optimization attempts, or not,
depending on the optimization mechanisms in use in a particular DB.
For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
comes out of a constant-subexpression-precalculation mechanism for us,
whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
would be really dumb to expend planner cycles looking specifically for
that case, so I guess that DB2 et al are finding it as a side-effect of
some more general optimization ... I wonder what that is?

(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)

> (i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
> with OR)

> I can't remember proposals about adding this, but it seems worthwhile to
> consider. I think we should be able to check for this without a lot of
> planner overhead.

It would be enormously expensive to check that in the general case with
a bunch of entries in each IN list.  Perhaps it would be OK to add on
the presumption that few queries would contain multiple IN's on the same
column in the first place, though.  Not sure.

> 9. Unneeded Self JOIN

> Can't remember discussions of this.

I can't get very excited about that one either.

In the end, what the article fails to consider is that all of these are
tradeoffs, not unalloyed goods.
If you spend planner cycles on every query to look for cases that only
the most unabashedly brain-dead ORMs ever generate, you're not really
doing your users a favor on balance.

			regards, tom lane
On 2017-10-06 22:19:54 -0400, Tom Lane wrote:
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in
> themselves.

> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

Partially agreed.  A comment to the article also mentions that some other
database performs more optimizations depending on the cost of the plan.
That's not easy to do in our current plan structure, but I think it's
quite a worthwhile concept.

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?
>
> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

Yea, I think this should be inferrable from information we essentially
already compute for equivclasses.

> > (i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
> > with OR)
>
> > I can't remember proposals about adding this, but it seems worthwhile to
> > consider. I think we should be able to check for this without a lot of
> > planner overhead.
>
> It would be enormously expensive to check that in the general case with
> a bunch of entries in each IN list.
> Perhaps it would be OK to add on
> the presumption that few queries would contain multiple IN's on the same
> column in the first place, though.  Not sure.

Merging of ORs should be near free if you leave duplicates in there, and
the duplicates aren't a huge concern if your alternative is to have them,
but also have two clauses to evaluate.

I think the IN AND IN case should usually end up a clear win as well.
Both lists are going to be evaluated for each row anyway - you don't need
a lot of rows where clauses are evaluated to make it worth the
O(n*log(n)) of sorting the lists.  Sorting them would be beneficial for
other reasons as well, e.g. it improves access patterns for SAO index
scans.

Greetings,

Andres Freund
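[For illustration, the IN-list merging under discussion (item 6) amounts to set intersection for AND and union for OR, with an up-front sort as Andres suggests. A minimal sketch, not the planner's actual data structures:]

```python
# Toy sketch of predicate merging: two IN lists on the same column
# combine by intersection under AND, and by union under OR.  Sorting
# and deduplicating first keeps the merge cheap and, as noted in the
# thread, also helps index-scan access patterns.

def merge_in_lists(a, b, conjunction=True):
    sa, sb = sorted(set(a)), sorted(set(b))
    if conjunction:                       # col IN a AND col IN b
        sb_set = set(sb)
        return [v for v in sa if v in sb_set]
    return sorted(set(sa) | set(sb))      # col IN a OR col IN b

# a IN ('a','b','c') AND a IN ('c','d','f')  =>  a IN ('c')
assert merge_in_lists(['a', 'b', 'c'], ['c', 'd', 'f']) == ['c']
# a IN (1,2) OR a IN (2,3)  =>  a IN (1,2,3)
assert merge_in_lists([1, 2], [2, 3], conjunction=False) == [1, 2, 3]
```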
On 07/10/17 04:19, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
>>> The article in question is here:
>>> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
>
>> That's interesting.
>
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.
> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?
>
> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

What it actually does is to specifically skip the processing for X=X
(the const expression will be simplified by
estimate_expression_value/eval_const_expressions separately).  There is
a comment there that specifically states that it's not worth it to
process this, as it's a rare clause which is equivalent to X IS NOT
NULL.

I don't actually agree with the argument of the comment there, since in
practice, if the "silly" equality is not there, we'll just waste an
equal() call, and if it is there, the optimization seems worth it as it
will lead to orders of magnitude better estimation in many cases.

So I wrote a prototype of achieving this optimization and it seems to be
really quite simple code-wise (see attached).
I did only limited manual testing of this, but I don't see any negative
impact on planning time.

Thoughts?

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 7 October 2017 at 15:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 9. Unneeded Self JOIN
>
>> Can't remember discussions of this.
>
> I can't get very excited about that one either.
>
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I think that final sentence lacks imagination.  I've seen plenty of
views being called where some column is unavailable, but the caller
joins the very same table again on the primary key to add the column.
There was no brain-dead ORM involved, just perhaps questionable design.
This was very common in my last job, where we had some rat's nest of
views several layers deep, the core of which often had some complex
logic that nobody dared to try to replicate.

It would be fairly cheap to check if any of the rtekind==RTE_RELATION
joinlist items contain more than one RangeTblEntry with the same relid.
The joinlist is never that big anyway; if it were, the join search would
be slow.  The more expensive part would be to build the join clauses,
check that the expressions on either side of all OpExprs match and that
nothing else will cause a non-match, then perform the uniqueness check
on those OpExpr operands.  We do have some infrastructure to do the
unique checks.  Likely the slowdown in planning would fall only on cases
with a legitimately useful self-join; I doubt checking for a duplicate
RangeTblEntry->relid would cause much of a concern.

Anyway, this is giving me some feeling of déjà vu.  Perhaps we need some
pg_stats view that shows us planning time and execution time so that we
can get a better idea of how much these things matter in the average
case.  We tend never to fare so well in these sorts of comparisons with
commercial databases.
It's not hard to imagine someone with migration plans loading some rat's
nest of views into Postgres, taking it for a spin, and finding that
performance is not quite what they need.

It's a bit sad that often the people with the loudest voices are so fast
to stomp on ideas for improvements.  It would be much nicer if you'd at
least wait for benchmarks before shooting.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
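[The cheap pre-check David describes, scanning the joinlist for two base relations with the same relid before paying for any expensive clause or uniqueness analysis, can be sketched like this. Python toy model, names invented for illustration:]

```python
# Sketch of the cheap first test for self-join elimination: only if two
# joinlist entries share a relid would the planner go on to the costly
# checks (matching join clauses, uniqueness proofs).  Illustrative only.
from collections import Counter

def has_candidate_self_join(joinlist_relids):
    counts = Counter(joinlist_relids)
    return any(n > 1 for n in counts.values())

# t JOIN s JOIN t  ->  candidate; t JOIN s  ->  not
assert has_candidate_self_join([16384, 16390, 16384])
assert not has_candidate_self_join([16384, 16390])
```

Because joinlists are short (a long one would already make join search slow), this pre-filter costs almost nothing on queries without a duplicated relation.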
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> On 07/10/17 04:19, Tom Lane wrote:
>> (edit: a few minutes later, I seem to remember that equivclass.c has
>> to do something special with the X=X case, so maybe it could do
>> something else special instead, with little new overhead.)

> So I wrote prototype of achieving this optimization and it seems to be
> really quite simple code-wise (see attached). I did only a limited
> manual testing of this but I don't see any negative impact on planning time.

No, I'm afraid you didn't read that comment closely enough.  This will
flat out fail for cases like "select ... where x=x order by x", because
there will already be a single-element EC for x and so the clause will
just disappear entirely.  If that doesn't happen, then instead you're
creating an EC with duplicate entries, which is an idea I seriously
dislike; the semantics of that don't seem clear at all to me.

What I was imagining was that having detected X=X, instead of "throwing
back" the clause as-is we could throw back an X IS NOT NULL clause,
along the lines of the attached.

This passes the smell test for me in the sense of not adding any
significant number of planner cycles except when the weird case occurs.
It leaves something on the table in that there are some situations
where X=X could be converted, but we won't do it because we don't see
the clause as being a potential EC (because it's not at top level),
as in the second new regression test case below.  I think that's
probably all right; I don't see any way to be more thorough without
adding a lot of new cycles all the time, and I don't believe this is
worth that.
			regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7997f50..a225414 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
  #include "optimizer/prep.h"
+ #include "optimizer/restrictinfo.h"
  #include "optimizer/var.h"
  #include "utils/lsyscache.h"

*************** static bool reconsider_full_join_clause(
*** 71,78 ****
   * any delay by an outer join, so its two sides can be considered equal
   * anywhere they are both computable; moreover that equality can be
   * extended transitively.  Record this knowledge in the EquivalenceClass
!  * data structure.  Returns TRUE if successful, FALSE if not (in which
!  * case caller should treat the clause as ordinary, not an equivalence).
   *
   * If below_outer_join is true, then the clause was found below the nullable
   * side of an outer join, so its sides might validly be both NULL rather than
--- 72,85 ----
   * any delay by an outer join, so its two sides can be considered equal
   * anywhere they are both computable; moreover that equality can be
   * extended transitively.  Record this knowledge in the EquivalenceClass
!  * data structure, if applicable.  Returns TRUE if successful, FALSE if not
!  * (in which case caller should treat the clause as ordinary, not an
!  * equivalence).
!  *
!  * In some cases, although we cannot convert a clause into EquivalenceClass
!  * knowledge, we can still modify it to a more useful form than the original.
!  * Then, *p_restrictinfo will be replaced by a new RestrictInfo, which is what
!  * the caller should use for further processing.
   *
   * If below_outer_join is true, then the clause was found below the nullable
   * side of an outer join, so its sides might validly be both NULL rather than
*************** static bool reconsider_full_join_clause(
*** 104,112 ****
   * memory context.
   */
  bool
! process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
  					bool below_outer_join)
  {
  	Expr	   *clause = restrictinfo->clause;
  	Oid			opno,
  				collation,
--- 111,121 ----
   * memory context.
   */
  bool
! process_equivalence(PlannerInfo *root,
! 					RestrictInfo **p_restrictinfo,
  					bool below_outer_join)
  {
+ 	RestrictInfo *restrictinfo = *p_restrictinfo;
  	Expr	   *clause = restrictinfo->clause;
  	Oid			opno,
  				collation,
*************** process_equivalence(PlannerInfo *root, R
*** 154,169 ****
  											 collation);

  	/*
! 	 * Reject clauses of the form X=X.  These are not as redundant as they
! 	 * might seem at first glance: assuming the operator is strict, this is
! 	 * really an expensive way to write X IS NOT NULL.  So we must not risk
! 	 * just losing the clause, which would be possible if there is already a
! 	 * single-element EquivalenceClass containing X.  The case is not common
! 	 * enough to be worth contorting the EC machinery for, so just reject the
! 	 * clause and let it be processed as a normal restriction clause.
  	 */
  	if (equal(item1, item2))
! 		return false;			/* X=X is not a useful equivalence */

  	/*
  	 * If below outer join, check for strictness, else reject.
--- 163,207 ----
  											 collation);

  	/*
! 	 * Clauses of the form X=X cannot be translated into EquivalenceClasses.
! 	 * We'd either end up with a single-entry EC, losing the knowledge that
! 	 * the clause was present at all, or else make an EC with duplicate
! 	 * entries, causing other issues.
  	 */
  	if (equal(item1, item2))
! 	{
! 		/*
! 		 * If the operator is strict, then the clause can be treated as just
! 		 * "X IS NOT NULL".  (Since we know we are considering a top-level
! 		 * qual, we can ignore the difference between FALSE and NULL results.)
! 		 * It's worth making the conversion because we'll typically get a much
! 		 * better selectivity estimate than we would for X=X.
! 		 *
! 		 * If the operator is not strict, we can't be sure what it will do
! 		 * with NULLs, so don't attempt to optimize it.
! 		 */
! 		set_opfuncid((OpExpr *) clause);
! 		if (func_strict(((OpExpr *) clause)->opfuncid))
! 		{
! 			NullTest   *ntest = makeNode(NullTest);
!
! 			ntest->arg = item1;
! 			ntest->nulltesttype = IS_NOT_NULL;
! 			ntest->argisrow = false;	/* correct even if composite arg */
! 			ntest->location = -1;
!
! 			*p_restrictinfo =
! 				make_restrictinfo((Expr *) ntest,
! 								  restrictinfo->is_pushed_down,
! 								  restrictinfo->outerjoin_delayed,
! 								  restrictinfo->pseudoconstant,
! 								  restrictinfo->security_level,
! 								  NULL,
! 								  restrictinfo->outer_relids,
! 								  restrictinfo->nullable_relids);
! 		}
! 		return false;
! 	}

  	/*
  	 * If below outer join, check for strictness, else reject.
*************** reconsider_outer_join_clause(PlannerInfo
*** 1741,1747 ****
  										  bms_copy(inner_relids),
  										  bms_copy(inner_nullable_relids),
  										  cur_ec->ec_min_security);
! 		if (process_equivalence(root, newrinfo, true))
  			match = true;
  	}

--- 1779,1785 ----
  										  bms_copy(inner_relids),
  										  bms_copy(inner_nullable_relids),
  										  cur_ec->ec_min_security);
! 		if (process_equivalence(root, &newrinfo, true))
  			match = true;
  	}

*************** reconsider_full_join_clause(PlannerInfo
*** 1884,1890 ****
  										  bms_copy(left_relids),
  										  bms_copy(left_nullable_relids),
  										  cur_ec->ec_min_security);
! 			if (process_equivalence(root, newrinfo, true))
  				matchleft = true;
  		}
  		eq_op = select_equality_operator(cur_ec,
--- 1922,1928 ----
  										  bms_copy(left_relids),
  										  bms_copy(left_nullable_relids),
  										  cur_ec->ec_min_security);
! 			if (process_equivalence(root, &newrinfo, true))
  				matchleft = true;
  		}
  		eq_op = select_equality_operator(cur_ec,
*************** reconsider_full_join_clause(PlannerInfo
*** 1899,1905 ****
  										  bms_copy(right_relids),
  										  bms_copy(right_nullable_relids),
  										  cur_ec->ec_min_security);
! 			if (process_equivalence(root, newrinfo, true))
  				matchright = true;
  		}
  	}
--- 1937,1943 ----
  										  bms_copy(right_relids),
  										  bms_copy(right_nullable_relids),
  										  cur_ec->ec_min_security);
! 			if (process_equivalence(root, &newrinfo, true))
  				matchright = true;
  		}
  	}
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 9931ddd..974eb58 100644
*** a/src/backend/optimizer/plan/initsplan.c
--- b/src/backend/optimizer/plan/initsplan.c
*************** distribute_qual_to_rels(PlannerInfo *roo
*** 1964,1973 ****
  	if (maybe_equivalence)
  	{
  		if (check_equivalence_delay(root, restrictinfo) &&
! 			process_equivalence(root, restrictinfo, below_outer_join))
  			return;
  		/* EC rejected it, so set left_ec/right_ec the hard way ... */
! 		initialize_mergeclause_eclasses(root, restrictinfo);
  		/* ... and fall through to distribute_restrictinfo_to_rels */
  	}
  	else if (maybe_outer_join && restrictinfo->can_join)
--- 1964,1974 ----
  	if (maybe_equivalence)
  	{
  		if (check_equivalence_delay(root, restrictinfo) &&
! 			process_equivalence(root, &restrictinfo, below_outer_join))
  			return;
  		/* EC rejected it, so set left_ec/right_ec the hard way ... */
! 		if (restrictinfo->mergeopfamilies)	/* EC might have changed this */
! 			initialize_mergeclause_eclasses(root, restrictinfo);
  		/* ... and fall through to distribute_restrictinfo_to_rels */
  	}
  	else if (maybe_outer_join && restrictinfo->can_join)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index a15eee5..ea886b6 100644
*** a/src/include/optimizer/paths.h
--- b/src/include/optimizer/paths.h
*************** typedef bool (*ec_matches_callback_type)
*** 127,133 ****
  												  EquivalenceMember *em,
  												  void *arg);

! extern bool process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
  					bool below_outer_join);
  extern Expr *canonicalize_ec_expression(Expr *expr,
  						   Oid req_type, Oid req_collation);
--- 127,134 ----
  												  EquivalenceMember *em,
  												  void *arg);

! extern bool process_equivalence(PlannerInfo *root,
! 					RestrictInfo **p_restrictinfo,
  					bool below_outer_join);
  extern Expr *canonicalize_ec_expression(Expr *expr,
  						   Oid req_type, Oid req_collation);
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index a96b2a1..c448d85 100644
*** a/src/test/regress/expected/equivclass.out
--- b/src/test/regress/expected/equivclass.out
*************** reset session authorization;
*** 421,423 ****
--- 421,441 ----
  revoke select on ec0 from regress_user_ectest;
  revoke select on ec1 from regress_user_ectest;
  drop user regress_user_ectest;
+ -- check that X=X is converted to X IS NOT NULL when appropriate
+ explain (costs off)
+   select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+                          QUERY PLAN
+ -------------------------------------------------------------
+  Seq Scan on tenk1
+    Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
+ (2 rows)
+
+ -- this could be converted, but isn't at present
+ explain (costs off)
+   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+                        QUERY PLAN
+ --------------------------------------------------------
+  Seq Scan on tenk1
+    Filter: ((unique1 = unique1) OR (unique2 = unique2))
+ (2 rows)
+
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 0e4aa0c..85aa65d 100644
*** a/src/test/regress/sql/equivclass.sql
--- b/src/test/regress/sql/equivclass.sql
*************** revoke select on ec0 from regress_user_e
*** 254,256 ****
--- 254,264 ----
  revoke select on ec1 from regress_user_ectest;
  drop user regress_user_ectest;
+
+ -- check that X=X is converted to X IS NOT NULL when appropriate
+ explain (costs off)
+   select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+
+ -- this could be converted, but isn't at present
+ explain (costs off)
+   select * from tenk1 where unique1 = unique1 or unique2 = unique2;
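[The key semantic claim in the patch comment, that for a strict operator a top-level X=X qual filters exactly the rows that X IS NOT NULL keeps, can be checked with a toy model. Python sketch with None standing in for SQL NULL; illustrative only:]

```python
# For a strict operator, X = X yields NULL exactly when X is NULL, and a
# top-level qual treats a NULL result the same as FALSE.  So the qual
# keeps precisely the rows where X IS NOT NULL, which is what justifies
# the rewrite in the patch above.

def strict_eq(x, y):
    if x is None or y is None:
        return None                  # strict: NULL input -> NULL output
    return x == y

def passes_qual(result):
    return result is True            # NULL and FALSE both reject the row

for v in [None, 0, 1, "a"]:
    assert passes_qual(strict_eq(v, v)) == (v is not None)
```

Note this equivalence holds only at the top level of WHERE; inside an OR arm (as in the second regression test) the planner never sees the clause as a potential EC, so it stays unconverted.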
David Rowley <david.rowley@2ndquadrant.com> writes:
> It would be much nicer if you'd at least wait for benchmarks before
> shooting.

Benchmarks of what?  We'd have to expend quite a bit of effort just to
get to a place where we'd have something to benchmark.  I do not think
it's unreasonable of me to express an opinion that that would likely be
wasted effort.  If you disagree, and are willing to expend such effort
speculatively, I'm not stopping you.

			regards, tom lane
On 07/10/17 18:15, Tom Lane wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> On 07/10/17 04:19, Tom Lane wrote:
>>> (edit: a few minutes later, I seem to remember that equivclass.c has
>>> to do something special with the X=X case, so maybe it could do
>>> something else special instead, with little new overhead.)
>
>> So I wrote prototype of achieving this optimization and it seems to be
>> really quite simple code-wise (see attached). I did only a limited
>> manual testing of this but I don't see any negative impact on planning time.
>
> No, I'm afraid you didn't read that comment closely enough.  This will
> flat out fail for cases like "select ... where x=x order by x", because
> there will already be a single-element EC for x and so the clause will
> just disappear entirely.  If that doesn't happen, then instead you're
> creating an EC with duplicate entries, which is an idea I seriously
> dislike; the semantics of that don't seem clear at all to me.

Hmm, it did not disappear (and worked fine in SQL-level tests).  I don't
think I fully understand the "EC with duplicate entries" part and what
the problem with it is, so I'll defer to your wisdom there.

> What I was imagining was that having detected X=X, instead of "throwing
> back" the clause as-is we could throw back an X IS NOT NULL clause,
> along the lines of the attached.

Right, I wanted to avoid messing with the incoming RestrictInfo, but if
we don't want to call the code below, or write tons of code for this, I
guess it's the best we can do.

> This passes the smell test for me in the sense of not adding any
> significant number of planner cycles except when the weird case occurs.
> It leaves something on the table in that there are some situations
> where X=X could be converted, but we won't do it because we don't see
> the clause as being a potential EC (because it's not at top level),
> as in the second new regression test case below.
> I think that's
> probably all right; I don't see any way to be more thorough without
> adding a lot of new cycles all the time, and I don't believe this is
> worth that.

My code had the same issue.  I think going deeper would require quite a
bit of new code (and cycles) for something that's even less likely to
happen than simple X=X, while the current patch is quite a cheap win.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> On 07/10/17 18:15, Tom Lane wrote:
>> No, I'm afraid you didn't read that comment closely enough.  This will
>> flat out fail for cases like "select ... where x=x order by x", because
>> there will already be a single-element EC for x and so the clause will
>> just disappear entirely.  If that doesn't happen, then instead you're
>> creating an EC with duplicate entries, which is an idea I seriously
>> dislike; the semantics of that don't seem clear at all to me.

> Hmm it did not disappear (and worked fine in SQL level tests).

I may not be correctly remembering what the query would need to look
like for there to be single-element ECs in existence at this point; but
I surely would not like this code to assume that none will exist till
later.

> I don't
> think I fully understand the "EC with duplicate entries" part and what's
> the problem with it so I'll defer to your wisdom there.

Well, as one example, assume that we use your patch and consider what
happens with

	where x = y and x = x

vs

	where x = x and x = y

In the first case we end up with an EC that's just {x,y}, because the
second process_equivalence() will find both sides in the same EC and
conclude that the second clause is redundant.  (Which it is, if the
equality operators have the same notion of what to do with nulls.)  In
the second case we end up with an EC containing {x,x,y}, which at
minimum will result in emitting redundant generated quals.  I'm not sure
if it could have any worse consequences than that, but I'm not sure it
couldn't either.  But this is bogus in any case, because those WHERE
clauses surely should produce identical results.

Looking further ahead, if ECs have to support being multisets rather
than pure sets, that would put a crimp in ever improving this logic to
use a smarter UNION-FIND algorithm.
(I've not yet seen queries where the number of EC members is large
enough to make that a serious issue, but I think it could happen,
particularly with inheritance/partitioning.)

>> This passes the smell test for me in the sense of not adding any
>> significant number of planner cycles except when the weird case occurs.
>> It leaves something on the table in that there are some situations
>> where X=X could be converted, but we won't do it because we don't see
>> the clause as being a potential EC (because it's not at top level),
>> as in the second new regression test case below.  I think that's
>> probably all right; I don't see any way to be more thorough without
>> adding a lot of new cycles all the time, and I don't believe this is
>> worth that.

> My code had same issue. I think going deeper would require quite a bit
> of new code (and cycles) for something that's even less likely to happen
> than simple X=X while the current patch is quite cheap win.

Yeah.  I'm not really convinced it's a big win, but it's so cheap we
might as well do it.  The only case where it would expend cycles and
fail to give an improvement is if we have X=X with a non-strict
operator, which I think is a case that never arises in practice at
present, because btree effectively assumes that all btree operators are
strict.  (Someday we might want to relax that, though, which is why I
don't want to let the planner just assume it.)

			regards, tom lane
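[Tom's UNION-FIND aside refers to the classic disjoint-set structure. A minimal sketch of equivalence-class merging in that style, which also shows why x = x is degenerate: merging x with itself adds no information, while x = y genuinely links two members. (equivclass.c uses lists of ECs, not this structure; this is only the abstract idea.)]

```python
# Minimal union-find with path halving, as an abstract model of merging
# equality clauses into equivalence classes.  Illustrative only.

parent = {}

def find(v):
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]   # path halving keeps trees shallow
        v = parent[v]
    return v

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:                        # x = x lands here with ra == rb:
        parent[ra] = rb                 # a no-op, nothing is learned

union('x', 'y')     # x = y: one class {x, y}
union('x', 'x')     # x = x: no-op
assert find('x') == find('y')
```

A multiset EC (the {x,x,y} case above) has no natural representation here, which is one way to read Tom's concern: sets merge cleanly, duplicates do not.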
On Fri, Oct 06, 2017 at 10:19:54PM -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> >> The article in question is here:
> >> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
>
> > That's interesting.
>
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.

That's true for some of those.  But some of them might become important
when you start pushing WHERE constraints from outside into inner table
sources and subqueries, as dumb-looking constraints can simply appear
from pushing non-dumb-looking constraints.

More than the above optimizations, these would make a big difference for
me:

- turning subqueries into joins

- turning ORs into UNIONs

  It is easy enough to work around the lack of this optimization in many
  cases, but it does make queries more verbose.

- pushing WHERE constraints from outer queries into the table source
  queries (_including_ VIEWs)

- determining that some table in a query that had WHERE constraints
  pushed into it now has a very well-filled-out lookup key, therefore
  it's the one that should be the table source to start the plan with,
  i.e., that it should be first in the outermost loop of a nested loop
  join

  For me these two would be huge wins.  I have to resort to functions
  with roughly the same body as views just so that I can have the
  optimizer pick the correct plan.  This causes a lot of code
  duplication in my schemas.

- pushing WHERE constraints from outer queries into HAVING and thence
  WHERE constraints on GROUP BY queries, where the outer constraints
  are on columns used to GROUP BY

  I find myself making two versions of views that do aggregation: one
  that does not aggregate, and one that does.  This allows me to use
  the non-aggregating view in contexts where I need this optimization,
  but then I have to re-code the aggregation at that layer.
Again, lots of duplication. These sorts of optimizations are huge.

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach. ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?

If you can reduce the number of compilations / optimization passes for statements, then spending more time in the optimizer is not a big deal. So, when invoked via PREPARE I would say spending more cycles looking for this sort of thing is OK, but in many other cases it's not.

Also, sometimes these cases crop up due to pushing constraints into VIEWs and sub-queries. In those cases constant sub-expression elimination can be a win.

> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

I'd expect that column = column is not trivial to turn into TRUE, not unless those columns are NOT NULLable.

> > 9. Unneeded Self JOIN
>
> > Can't remember discussions of this.
>
> I can't get very excited about that one either.
>
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods. If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I can't get very excited about this one either, though I do believe it can arise as the author says, "when you build complex views and JOIN them to each other".
Maybe I'm not excited about it because I've not needed it :)

Nico
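A concrete sketch of the pushdown pattern described above, with invented table and view names, purely for illustration:

```sql
-- An aggregating view:
CREATE VIEW order_totals AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- The outer constraint is on a GROUP BY column, so it could be
-- pushed through the aggregation into a WHERE on the base table:
SELECT * FROM order_totals WHERE customer_id = 42;

-- ideally planned as:
SELECT customer_id, sum(amount) AS total
FROM orders
WHERE customer_id = 42
GROUP BY customer_id;
```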
On 07/10/17 19:59, Tom Lane wrote: > Petr Jelinek <petr.jelinek@2ndquadrant.com> writes: >> On 07/10/17 18:15, Tom Lane wrote: >>> No, I'm afraid you didn't read that comment closely enough. This will >>> flat out fail for cases like "select ... where x=x order by x", because >>> there will already be a single-element EC for x and so the clause will >>> just disappear entirely. If that doesn't happen, then instead you're >>> creating an EC with duplicate entries, which is an idea I seriously >>> dislike; the semantics of that don't seem clear at all to me. > >> Hmm it did not disappear (and worked fine in SQL level tests). > > I may not be correctly remembering what the query would need to look > like for there to be single-element ECs in existence at this point; but > I surely would not like this code to assume that none will exist till > later. > >> I don't >> think I fully understand the "EC with duplicate entries" part and what's >> the problem with it so I'll defer to your wisdom there. > > Well, as one example, assume that we use your patch and consider what > happens with > where x = y and x = x > vs > where x = x and x = y > > In the first case we end up with an EC that's just {x,y}, because the > second process_equivalence() will find both sides in the same EC and > conclude that the second clause is redundant. (Which it is, if the > equality operators have the same notion of what to do with nulls.) > In the second case we end up with an EC containing {x,x,y}, which > at minimum will result in emitting redundant generated quals. I'm > not sure if it could have any worse consequences than that, but I'm > not sure it couldn't either. But this is bogus in any case, because > those WHERE clauses surely should produce identical results. > > Looking further ahead, if ECs have to support being multisets rather > than pure sets, that would put a crimp in ever improving this logic to > use a smarter UNION-FIND algorithm. 
(I've not yet seen queries where the
> number of EC members is large enough to make that a serious issue, but
> I think it could happen, particularly with inheritance/partitioning.)

Okay, that makes sense, thanks for the explanation. Your patch is the way to go then.

>>> This passes the smell test for me in the sense of not adding any
>>> significant number of planner cycles except when the weird case occurs.
>>> It leaves something on the table in that there are some situations
>>> where X=X could be converted, but we won't do it because we don't see
>>> the clause as being a potential EC (because it's not at top level),
>>> as in the second new regression test case below. I think that's
>>> probably all right; I don't see any way to be more thorough without
>>> adding a lot of new cycles all the time, and I don't believe this is
>>> worth that.

>> My code had same issue. I think going deeper would require quite a bit
>> of new code (and cycles) for something that's even less likely to happen
>> than simple X=X while the current patch is quite cheap win.

> Yeah. I'm not really convinced it's a big win, but it's so cheap we
> might as well do it. The only case where it would expend cycles and
> fail to give an improvement is if we have X=X with a non-strict operator,
> which I think is a case that never arises in practice at present,
> because btree effectively assumes that all btree operators are strict.
> (Someday we might want to relax that, though, which is why I don't
> want to let the planner just assume it.)

In real life, probably not, but it seems like these small bits can be useful marketing-wise, given articles like the one which started this. And it should not hurt anything either.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> I can't get very excited about this one either, though I do believe it
> can arise as the author says, "when you build complex views and JOIN
> them to each other". Maybe I'm not excited about it because I've not
> needed it :)

This is one that I know would help with my database. There is a ton of logic stored in views, which get joined to the original table to filter the set, rather than imposing that set of conditions in every separate query. It would be really nice if the optimizer could simplify those to eliminate the self join. The join would almost always happen on the table's primary key, and if not, it'd be a non-nullable column for sure.

On another note:

> turning ORs into UNIONs

This is another one which would be incredibly useful for me. I've had to do this manually for performance reasons far too often.

> Partially agreed. A comment to the article also mentions that some other
> database performs more optimizations depending on the cost of the
> plan. That's not easy to do in our current plan structure, but I think
> it's quite a worthwhile concept.

I would love to see this in Postgres. It would allow the planner to not waste cycles unnecessarily on queries where it's just not needed, and to potentially spend a few more cycles planning on very costly queries to save a ton while executing.
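The manual OR-to-UNION rewrite mentioned above looks roughly like this (invented names, schematic only):

```sql
-- Original: an OR across columns of two different relations
-- usually can't use an index for either branch:
SELECT o.id, o.amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.email = 'a@example.com' OR o.tracking_code = 'XYZ';

-- Manual rewrite: each branch can use its own index, and UNION
-- (not UNION ALL) removes rows matched by both branches:
SELECT o.id, o.amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.email = 'a@example.com'
UNION
SELECT o.id, o.amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.tracking_code = 'XYZ';
```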
Adam Brusselback <adambrusselback@gmail.com> writes:
> On another note:
>> turning ORs into UNIONs

> This is another one which would be incredibly useful for me. I've had
> to do this manually for performance reasons far too often.

Well, maybe you could sign up to help review the open patch for that then: https://commitfest.postgresql.org/15/1001/

The reason that's not in v10 is we haven't been able to convince ourselves whether it's 100% correct.

regards, tom lane
On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
> Adam Brusselback <adambrusselback@gmail.com> writes:
> > On another note:
> >> turning ORs into UNIONs
>
> > This is another one which would be incredibly useful for me. I've had
> > to do this manually for performance reasons far too often.
>
> Well, maybe you could sign up to help review the open patch for that then:
> https://commitfest.postgresql.org/15/1001/
>
> The reason that's not in v10 is we haven't been able to convince
> ourselves whether it's 100% correct.

Unfortunately it won't help in this specific case (no support for UNION, just UNION ALL), but I thought it might be interesting to reference https://medium.com/@uwdb/introducing-cosette-527898504bd6 here.

Greetings, Andres Freund
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> Okay, that makes sense, thanks for explanation. Your patch is the way to
> go then.

Hearing no further comment, pushed. Thanks for reviewing it.

regards, tom lane
Andres Freund <andres@anarazel.de> writes:
> On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
>> https://commitfest.postgresql.org/15/1001/
>> The reason that's not in v10 is we haven't been able to convince
>> ourselves whether it's 100% correct.

> Unfortunately it won't help in this specific case (no support for UNION,
> just UNION ALL), but I thought it might be interesting to reference
> https://medium.com/@uwdb/introducing-cosette-527898504bd6
> here.

Huh, that is an interesting project indeed. Although I'm not sure that it quite addresses the question of whether an optimization transform is valid. IIUC, it could prove that a particular query having been fed through the transform didn't change semantics, but that offers only limited insight into whether some other query fed through the transform might change.

regards, tom lane
On 2017-10-08 17:11:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
> >> https://commitfest.postgresql.org/15/1001/
> >> The reason that's not in v10 is we haven't been able to convince
> >> ourselves whether it's 100% correct.
>
> > Unfortunately it won't help in this specific case (no support for UNION,
> > just UNION ALL), but I thought it might be interesting to reference
> > https://medium.com/@uwdb/introducing-cosette-527898504bd6
> > here.
>
> Huh, that is an interesting project indeed. Although I'm not sure that
> it quite addresses the question of whether an optimization transform
> is valid. IIUC, it could prove that a particular query having been fed
> through the transform didn't change semantics, but that offers only
> limited insight into whether some other query fed through the transform
> might change.

According to the guide it offers some support for more general transformations: http://cosette.cs.washington.edu/guide#24-symbolic-predicates

That's still only going to be sufficient for some of the interesting cases, but still... I wonder about pinging them about the OR -> UNION case; they've been responsive to problems in some threads I found online.

Greetings, Andres Freund
On 7 October 2017 at 14:48, Andres Freund <andres@anarazel.de> wrote: > 3. JOIN Elimination > > There's been a lot of discussion and several patches. There's a bunch of > problems here, one being that there's cases (during trigger firing, > before the constraint checks) where foreign keys don't hold true, so we > can't quite generally make these optimization. Another concern is > whether the added plan time is actually worthwhile. I looked over this and it seems there's some pretty low hanging fruit in there that we can add with just a handful of lines of new code. This is the case for LEFT JOINs with a DISTINCT clause. Normally we can only remove an unused LEFT JOINed relation if there are some unique properties that ensure that the join does not duplicate any outer row. We don't need to worry about this when there's a DISTINCT clause, as the DISTINCT would remove any duplicates anyway. If I'm not mistaken, we also don't need to bother looking at the actual distinct clause's exprs since we'll already know that nothing is in there regarding the relation being removed. The benefit to this could be two-fold, as 1) we don't need to join to the unused relation and 2) we don't need to remove any row duplication caused by the join. While someone might argue that this is not going to be that common a case to hit, if we consider how cheap this is to implement, it does seem to be worth doing a couple of NULL checks in the planner for it. The only slight downside I can see to this is that we're letting a few more cases through rel_supports_distinctness() which is also used for unique joins, and these proofs are not valid in those. However, it may not be worth the trouble doing anything about that as relations without unique indexes are pretty uncommon (at least in my world). Thoughts? 
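As a concrete example of the case being targeted (table and column names invented for illustration):

```sql
-- With no unique index on b.a_id we normally can't remove the
-- LEFT JOIN, because it might duplicate rows of a. But the
-- DISTINCT would erase any such duplication anyway, and no
-- column of b is referenced, so the join could be dropped:
SELECT DISTINCT a.id, a.name
FROM a
LEFT JOIN b ON b.a_id = a.id;

-- which could then be planned as simply:
SELECT DISTINCT a.id, a.name
FROM a;
```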
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 October 2017 at 17:41, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Thoughts?

Actually, I was a little inconsistent with my List NULL/NIL checks in that last one. I've attached an updated patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Oct 8, 2017 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> Okay, that makes sense, thanks for explanation. Your patch is the way to
>> go then.
>
> Hearing no further comment, pushed. Thanks for reviewing it.

The tautological col = col comparison is occasionally used as a planner "hint" to correct for row count overestimates. Not a great solution, but PostgreSQL doesn't really have a better way to guide the planner. Those queries will now have to do something else, like col = col + 0, which still works.

Regards, Ants Aasma
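A sketch of the trick in question, with an invented table, for anyone following along:

```sql
-- The planner's selectivity estimate for the extra equality qual
-- scales the row estimate down without changing the result (for
-- non-NULL b), so b = b acted as a crude cardinality "hint":
SELECT * FROM t WHERE a > 100 AND b = b;

-- With X = X now simplified to X IS NOT NULL, that no longer
-- works; an expression the planner won't simplify, e.g.
-- b = b + 0 for a numeric column, still does:
SELECT * FROM t WHERE a > 100 AND b = b + 0;
```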
On Fri, Oct 6, 2017 at 10:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 9. Unneeded Self JOIN > >> Can't remember discussions of this. > > I can't get very excited about that one either. My memories of being a PostgreSQL user rather than a developer are getting a bit distant by now, but I definitely remember hitting this problem especially in cases involving DELETE FROM and UPDATE USING. You can't specify a left join between the result relation and some other table directly, so you end up having to do DELETE FROM x USING x LEFT JOIN y ... which then has lousy performance. I think you could hit it in other cases, too, e.g. a join between two views which share a common base table. Of course maybe you wouldn't have two views in the first place if join removal worked better than it currently does, but without, at least, inner join removal you can't really rely on the query planner to prune things down to what's really needed. > In the end, what the article fails to consider is that all of these are > tradeoffs, not unalloyed goods. If you spend planner cycles on every > query to look for cases that only the most unabashedly brain-dead ORMs > ever generate, you're not really doing your users a favor on balance. I think what you're failing to consider is that the author of the article is a user. When he says these optimizations are cool, he means that they would benefit his use cases (and presumably that he's willing to pay some number of additional cycles to get them). We haven't really done a very good job figuring out what to do about optimizations that some people (mostly you) think are marginal. It's obviously true that we can't just burn infinite planner cycles on things that don't usually help, but at the same time, we can't just keep ignoring requests by users for the same features and saying "you'll be sad if we give this to you". Those people don't give up on wanting the optimization; they just don't use PostgreSQL. 
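Schematically, the DELETE pattern I mean looks like this (invented names; a sketch, not taken from a real schema):

```sql
-- You can't write "DELETE FROM x LEFT JOIN y", so the result
-- relation gets joined to a second copy of itself just to get
-- the left join in (here: delete rows of x with no match in y):
DELETE FROM x
USING x AS x2
LEFT JOIN y ON y.x_id = x2.id
WHERE x.id = x2.id
  AND y.x_id IS NULL;

-- The x-to-x2 self join on the primary key is pure overhead;
-- with self-join elimination it could collapse to one scan of x.
```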
I think we need to stop just saying "no" and start thinking about what we could do that would let us say "yes".

One trick that some systems use is to avoid replanning as much as we do by, for example, saving plans in a shared cache and reusing them even in other sessions. That's hard to do in our architecture because the controlling GUCs can be different in every session and there's not even any explicit labeling of which GUCs control planner behavior. But if you had it, then extra planning cycles would be, perhaps, more tolerable.

Another trick is to plan harder when the cost or complexity of the query exceeds some threshold, but that's hard to do because we don't really know what the cost or complexity is until after we get done planning, by which time it's too late to (cheaply) go back and apply those optimizations. This problem of not really knowing whether we've got an expensive query is also a problem in terms of being smart about how many parallel workers to pick or whether to consider parallelism at all; we'd like to consider more options for expensive queries than cheap ones. But there's not really any good way to do this as things stand today.

Maybe one way to attack this problem is to make it more general: what's the next big thing for the planner? It wasn't altogether clear to me that the path->plan conversion was going to be a huge improvement, but it definitely has been. Nothing really got much faster as a direct result of that work, but it made the code simpler and more understandable and thus easier to enhance, unblocking improvements like postgres_fdw aggregate pushdown. I think we're a long way from reaching the end of the benefits of that commit - there is a lot more that can usefully be done. But, what's next? You told me a few years back when I was thinking about what to do next that there were still a lot of gains to be had from improving the planner, and I wasn't sure I quite believed it.
But it's becoming more clear to me that this is true, and that the kinds of things this article is talking about are one approach. I don't really have the answers here, but I don't think our planner is anywhere close to as good as it can be, even if in certain respects it is stuck at some kind of local maximum.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> One trick that some system use is avoid replanning as much as we do
> by, for example, saving plans in a shared cache and reusing them even
> in other sessions. That's hard to do in our architecture because the
> controlling GUCs can be different in every session and there's not
> even any explicit labeling of which GUCs control planner behavior. But
> if you had it, then extra planning cycles would be, perhaps, more
> tolerable.

From my experience with Oracle I would say that that is a can of worms. Perhaps it really brings the performance benefits they claim, but

a) there have been a number of bugs where the wrong plan got used (you have to keep several plans for the same statement around, since - as you say - different sessions have different environments)

b) it is a frequent problem that this shared memory area grows too large if the application does not use prepared statements but dynamic SQL with varying constants.

Yours, Laurenz Albe
Andres Freund wrote:
> On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
> > Adam Brusselback <adambrusselback@gmail.com> writes:
> > > On another note:
> > >> turning ORs into UNIONs
> >
> > > This is another one which would be incredibly useful for me. I've had
> > > to do this manually for performance reasons far too often.
> >
> > Well, maybe you could sign up to help review the open patch for that then:
> > https://commitfest.postgresql.org/15/1001/
> >
> > The reason that's not in v10 is we haven't been able to convince
> > ourselves whether it's 100% correct.
>
> Unfortunately it won't help in this specific case (no support for UNION,
> just UNION ALL), but I thought it might be interesting to reference
> https://medium.com/@uwdb/introducing-cosette-527898504bd6
> here.

Interesting. I thought about a completely different approach -- use a fuzzer, which runs each generated query on two servers, one patched one not, and compare the results. Would it be possible to tweak sqlsmith to do this?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> Robert Haas wrote:
>> One trick that some system use is avoid replanning as much as we do
>> by, for example, saving plans in a shared cache and reusing them even
>> in other sessions. That's hard to do in our architecture because the
>> controlling GUCs can be different in every session and there's not
>> even any explicit labeling of which GUCs control planner behavior. But
>> if you had it, then extra planning cycles would be, perhaps, more
>> tolerable.

> From my experience with Oracle I would say that that is a can of worms.

Yeah, I'm pretty suspicious of the idea too. We've had an awful lot of bad experience with local plan caching, to the point where people wonder why we don't just auto-replan every time. How would a shared cache make that better? (Even assuming it was otherwise free, which it surely won't be.)

regards, tom lane
On Thu, Oct 12, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> From my experience with Oracle I would say that that is a can of worms.
>
> Yeah, I'm pretty suspicious of the idea too. We've had an awful lot of
> bad experience with local plan caching, to the point where people wonder
> why we don't just auto-replan every time. How would a shared cache
> make that better? (Even assuming it was otherwise free, which it
> surely won't be.)

Obviously it wouldn't. But it might make other things better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 13 October 2017 at 03:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
>> Robert Haas wrote:
>>> One trick that some system use is avoid replanning as much as we do
>>> by, for example, saving plans in a shared cache and reusing them even
>>> in other sessions. That's hard to do in our architecture because the
>>> controlling GUCs can be different in every session and there's not
>>> even any explicit labeling of which GUCs control planner behavior. But
>>> if you had it, then extra planning cycles would be, perhaps, more
>>> tolerable.
>
>> From my experience with Oracle I would say that that is a can of worms.
>
> Yeah, I'm pretty suspicious of the idea too. We've had an awful lot of
> bad experience with local plan caching, to the point where people wonder
> why we don't just auto-replan every time. How would a shared cache
> make that better? (Even assuming it was otherwise free, which it
> surely won't be.)

One idea I had when working on unique joins was that we could use that machinery to determine if a plan could only return 0 or 1 rows, by additionally testing the baserestrictinfo of each rel to see if an equality OpExpr with a Const matches a unique constraint, which would serve as a guarantee that only 0 or 1 rows would match.

I thought that these single-row plans could be useful as a start for some pro-active plan cache. The plans here should be stable, as they're not prone to any data skew that could cause a plan change.

You might think very few plans would fit the bill, but you'd likely find that the majority of UPDATE and DELETE queries run in an OLTP application would be eligible, and likely some decent percentage of SELECTs too.

I had imagined this would be some backend-local cache that saved MRU plans up to some size of memory defined by a GUC, where 0 would disable the feature. I never got any further than those thoughts.
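E.g. (schema invented purely for illustration):

```sql
CREATE TABLE accounts (
  id      bigint PRIMARY KEY,
  balance numeric NOT NULL
);

-- The equality against a constant on a unique (primary key)
-- column proves this touches at most one row, independent of
-- data skew, so the plan would be safe to cache pro-actively:
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
```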
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Laurenz, * Laurenz Albe (laurenz.albe@cybertec.at) wrote: > Robert Haas wrote: > > One trick that some system use is avoid replanning as much as we do > > by, for example, saving plans in a shared cache and reusing them even > > in other sessions. That's hard to do in our architecture because the > > controlling GUCs can be different in every session and there's not > > even any explicit labeling of which GUCs control planner behavior. But > > if you had it, then extra planning cycles would be, perhaps, more > > tolerable. > > >From my experience with Oracle I would say that that is a can of worms. > > Perhaps it really brings the performance benefits they claim, but > a) there have been a number of bugs where the wrong plan got used > (you have to keep several plans for the same statement around, > since - as you say - different sessions have different environments) I'm not sure this is really a fair strike against this concept- bugs happen, even bugs in planning, and what we need is more testing, imv, to reduce the number and minimize the risk as much as we can. > b) it is a frequent problem that this shared memory area grows > too large if the application does not use prepared statements > but dynamic SQL with varying constants. This just requires that the memory area be managed somehow, not unlike how our shared buffers have to deal with evicting out old pages. There's a challenge there around making sure that it doesn't make the performance of the system be much worse than not having a cache at all, naturally, but given that a lot of people use pg_stat_statements to good effect and without much in the way of complaints, it seems like it might be possible make it work reasonably (just imagining a generic plan being attached to pg_stat_statements with some information about if the generic plan works well or not, blah blah, hand waving goes here). Thanks! Stephen
Stephen Frost wrote: > * Laurenz Albe (laurenz.albe@cybertec.at) wrote: > > Robert Haas wrote: > > > One trick that some system use is avoid replanning as much as we do > > > by, for example, saving plans in a shared cache and reusing them even > > > in other sessions. That's hard to do in our architecture because the > > > controlling GUCs can be different in every session and there's not > > > even any explicit labeling of which GUCs control planner behavior. But > > > if you had it, then extra planning cycles would be, perhaps, more > > > tolerable. > > > From my experience with Oracle I would say that that is a can of worms. > > > > Perhaps it really brings the performance benefits they claim, but > > a) there have been a number of bugs where the wrong plan got used > > (you have to keep several plans for the same statement around, > > since - as you say - different sessions have different environments) > > I'm not sure this is really a fair strike against this concept- bugs > happen, even bugs in planning, and what we need is more testing, imv, to > reduce the number and minimize the risk as much as we can. Right, I guess I didn't express my concern properly. Bugs can certainly happen, but if a certain feature is particularly rich in them, I take it as an indication that it is something difficult to get right. > > b) it is a frequent problem that this shared memory area grows > > too large if the application does not use prepared statements > > but dynamic SQL with varying constants. > > This just requires that the memory area be managed somehow, not unlike > how our shared buffers have to deal with evicting out old pages. 
> There's a challenge there around making sure that it doesn't make the
> performance of the system be much worse than not having a cache at all,
> naturally, but given that a lot of people use pg_stat_statements to good
> effect and without much in the way of complaints, it seems like it might
> be possible make it work reasonably (just imagining a generic plan being
> attached to pg_stat_statements with some information about if the
> generic plan works well or not, blah blah, hand waving goes here).

pg_stat_statements is quite different, since it ignores constants.

I don't know how often I have seen the advice to use dynamic SQL because a generic plan was not so great, and in OLAP you usually want each individual query to be planned. So I think it is important that a plan is only reused if it matches the query exactly.

Oracle has an option CURSOR_SHARING = FORCE to reuse plans even if they were planned with different constants, but their own documentation warns against using it: http://docs.oracle.com/database/122/TGSQL/improving-rwp-cursor-sharing.htm#GUID-7FF4E133-06A7-401E-9BFC-3B0B9C902346

Maybe it is an idea to only cache generic plans in a shared area?

Yours, Laurenz Albe
So from following this discussion and others focused on making the planner "smarter", there is always an argument to be had over wasting planner cycles, and it's always a hard-fought battle to get any changes made. Now, I'm speaking without any knowledge of the Postgres internals, so please bear with me.

It seems like the underlying "feature" which would allow more edge-case optimizations into the planner would be some framework for knowing when to consider using those optimizations, and when to skip expensive planning optimizations rather than waste time on them. With a system like that, maybe Postgres could still maintain (or even improve) its hard-fought quick planning time for OLTP queries, while being able to spend more time planning for OLAP queries, where spending an extra 500ms planning may save minutes of execution time. I know in my database, I have analytical queries which I wouldn't mind spending multiple seconds planning if necessary to produce an optimal plan, because execution time dominates everything.

Now, what that system looks like is something I have no real opinion or authority on. It seems like there are three options, though: automatic, manual, and automatic with manual override. Two of those, manual and automatic with manual override, seem a little too close to query hints for them to be considered, judging from all previous discussions I've heard (correct me if I'm wrong). So that leaves automatic as the only option I can see being considered viable. That raises the question of whether it's possible to automatically classify queries in such a way that we can cheaply know which optimizations we should attempt, and which should not even be considered because the planning time would dominate.

I'll leave my thoughts at that, and await comments from the smarter people in the room who know exactly why this idea wouldn't work.
The general thought behind it is to make it easier for more query
optimizations to make it into the planner, while not hurting queries
which shouldn't spend their time on them. If there are better ways of
going about it, great!

Thanks for listening to me ramble,
-Adam
On 12 October 2017 at 04:50, Robert Haas <robertmhaas@gmail.com> wrote:
> We haven't really done a very good job figuring out what to do about
> optimizations that some people (mostly you) think are marginal. It's
> obviously true that we can't just burn infinite planner cycles on
> things that don't usually help, but at the same time, we can't just
> keep ignoring requests by users for the same features and saying
> "you'll be sad if we give this to you". Those people don't give up on
> wanting the optimization; they just don't use PostgreSQL. I think we
> need to stop just saying "no" and start thinking about what we could
> do that would let us say "yes".

I'm with Robert on this. Planning time *is* important, but if we were to
do a quick tally of the patches we've seen in the last few years that
improve execution time, either by improving the executor code or by
adding some smarts to the planner to reduce executor work, versus
patches aimed at reducing planning time, I think we'd find the general
focus is on the executor.

Personally, I don't recall a single patch aimed at just speeding up the
planner. Perhaps I missed some? It looks like there's plenty we could do
in there; just nobody seems interested enough to go and do it, since
everyone who cares about performance is too busy trying to make
execution run faster.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2017-10-14 10:38:13 +1300, David Rowley wrote:
> On 12 October 2017 at 04:50, Robert Haas <robertmhaas@gmail.com> wrote:
> > We haven't really done a very good job figuring out what to do about
> > optimizations that some people (mostly you) think are marginal. It's
> > obviously true that we can't just burn infinite planner cycles on
> > things that don't usually help, but at the same time, we can't just
> > keep ignoring requests by users for the same features and saying
> > "you'll be sad if we give this to you". Those people don't give up on
> > wanting the optimization; they just don't use PostgreSQL. I think we
> > need to stop just saying "no" and start thinking about what we could
> > do that would let us say "yes".

+1 to the general sentiment.

> I'm with Robert on this. Planning time *is* important, but if we were
> to do a quick tally on the number of patches that we've seen in the
> last few years to improve execution time by either improving the
> executor code or adding some smarts to the planner to reduce executor
> work vs patches aimed to reduce planning time, I think we'd find the
> general focus is on the executor.

That's true. But I think that's partially because people benchmarking
with the goal of identifying bottlenecks just habitually use prepared
statements to remove planning overhead. That works well enough in
benchmarking scenarios, but in practice I think it's less clearly a good
tradeoff; e.g. in OLTP workloads you'll often see custom plans defeating
the intent of using prepared statements.

> Personally, I don't recall a single patch aimed to just speed up the
> planner.

There were some a couple of years back.

> It looks like there's plenty we could do in there, just nobody seems
> interested enough to go and do it, everyone who cares about
> performance is too busy trying to make execution run faster.

I think that's largely because our executor speed is quite bad. To some
degree that is because, absurdly enough, we paid much less attention to
speed degradations there for a couple of years.

Greetings,

Andres Freund
Alvaro Herrera writes:
> Andres Freund wrote:
>> Unfortunately it won't help in this specific case (no support for UNION,
>> just UNION ALL), but I thought it might be interesting to reference
>> https://medium.com/@uwdb/introducing-cosette-527898504bd6
>> here.
>
> Interesting. I thought about a completely different approach -- use a
> fuzzer, which runs each generated query on two servers, one patched one
> not, and compare the results. Would it be possible to tweak sqlsmith to
> do this?

I think the tweaking needed would be:

1. Log successful queries as well, along with a result description.
   Maybe logging the returned/affected rows is sufficient.

2. Make it avoid nondeterministic things such as joining
   pg_stat_activity or calling non-immutable functions.

The second one is a bit harder, and I can't think of a more elegant
solution than adding a blacklisting/whitelisting feature and letting the
user do the hard work…

If these are solved, though, one could make multiple runs with the same
random seed and query the logging database for differences in the result
descriptions.

regards,
Andreas
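[Editor's note: the differential-testing idea discussed here, running the same deterministic query set against a patched and an unpatched server and diffing compact result descriptions, can be sketched roughly as below. `run_patched`/`run_unpatched` are hypothetical stand-ins for real driver calls; sorting the rows makes the description order-insensitive, in the spirit of item 1's suggestion that logging returned rows may be sufficient.]

```python
# Rough sketch of the differential-testing loop, with invented helper names.
# A real harness would execute each query via a database driver against two
# servers; here the runners are plain callables returning row lists.
import hashlib

def result_description(rows) -> tuple:
    """Row count plus an order-insensitive digest of the rows, as a compact
    stand-in for logging the full returned/affected rows."""
    digest = hashlib.sha256()
    for row in sorted(map(repr, rows)):
        digest.update(row.encode())
    return (len(rows), digest.hexdigest()[:12])

def compare_runs(queries, run_patched, run_unpatched) -> list:
    """Return the queries whose result descriptions differ between servers."""
    return [q for q in queries
            if result_description(run_patched(q))
               != result_description(run_unpatched(q))]

# With identical servers the diff is empty; a planner change that alters
# results for some query would surface as a mismatch for that query.
same = lambda q: [(1, "a"), (2, "b")]
assert compare_runs(["SELECT 1"], same, same) == []
assert compare_runs(["SELECT 1"], same, lambda q: []) == ["SELECT 1"]
```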