Thread: nested loop semijoin estimates
Hi,

while looking at this post from pgsql-performance about plan changes

    http://www.postgresql.org/message-id/flat/20150529095117.GB15813@hjp.at

I noticed that initial_cost_nestloop() does this (9.1, used in the
pgsql-performance post, relies on the same logic):

    if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
    {
        double      outer_matched_rows;
        Selectivity inner_scan_frac;

        run_cost += inner_run_cost;

        outer_matched_rows
            = rint(outer_path_rows * semifactors->outer_match_frac);
        inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);

        if (outer_matched_rows > 1)
            run_cost += (outer_matched_rows - 1)
                      * inner_rescan_run_cost * inner_scan_frac;
        ...
    }

I wonder whether the

    run_cost += inner_run_cost;

is actually correct, because this pretty much means we assume scanning
the whole inner relation (once). Wouldn't something like this be more
appropriate?

    run_cost += inner_run_cost * inner_scan_frac;

i.e. only counting the proportional part of the inner run cost, just
like we do for the rescans (i.e. only until the first match)?

Imagine a simple EXISTS() query that gets turned into a semijoin, with
an inner index scan. The current code pretty much means we'll scan the
whole result, even though we only really need a single matching row (to
confirm the EXISTS clause). For cheap index scans (e.g. using a PK)
this does not make much difference, but the example posted to
pgsql-performance results in index scans matching ~50% of the table,
which makes the whole index scan quite expensive, and much higher than
the rescan cost (multiplied by inner_scan_frac).

While investigating the pgsql-performance report I've prepared a
testcase producing a similar data set (attached). So let's see how the
code change impacts plan choice.
With the current code, the planner produces a plan like this:

                               QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=169627.96..169636.03 rows=1 width=74)
   Join Filter: ((t.term)::text = (f.berechnungsart)::text)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  HashAggregate  (cost=169627.41..169627.43 rows=2 width=2)
         Group Key: (f.berechnungsart)::text
         ->  Seq Scan on facttable_stat_fta4 f
               (cost=0.00..145274.93 rows=9740993 width=2)

which seems slightly inefficient, as ~50% of facttable_stat_fta4
matches the condition (so building the whole hash is a waste of time).
Also notice this is a regular join, not a semijoin - that seems to
confirm the planner does not realize the cost difference, believes both
plans (regular and semijoin) are equally expensive, and simply uses the
first one.

With the run_cost change, I do get this plan:

                               QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop Semi Join  (cost=111615.33..111623.42 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Bitmap Heap Scan on facttable_stat_fta4 f
         (cost=111614.78..220360.98 rows=4870496 width=2)
         Recheck Cond: ((berechnungsart)::text = (t.term)::text)
         ->  Bitmap Index Scan on facttable_stat_fta4_berechnungsart_idx
               (cost=0.00..110397.15 rows=4870496 width=0)
               Index Cond: ((berechnungsart)::text = (t.term)::text)

and this runs about 3x faster than the original plan (700ms vs. 2000ms),
at least when using the testcase dataset on my laptop.
This however is not the whole story, because after disabling the bitmap
index scan, I do get this plan, running in ~1ms (so ~700x faster than
the bitmap index scan):

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f
         (cost=0.43..310525.02 rows=4870496 width=2)
         Index Cond: (berechnungsart = (t.term)::text)

Notice the cost - it's way lower than the previous plan (9.2 vs ~111k),
yet this plan was not chosen. So either the change broke something
(e.g. by violating some optimizer assumption), or maybe there's a bug
somewhere else ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/30/15 01:20, Tomas Vondra wrote:
>
> Notice the cost - it's way lower than the previous plan (9.2 vs
> ~111k), yet this plan was not chosen. So either the change broke
> something (e.g. by violating some optimizer assumption), or maybe
> there's a bug somewhere else ...

After a bit more investigation, what I think is happening here is that
add_path() does not realize this is a SEMI join and a single tuple is
enough, and discards the simple indexscan path in favor of the bitmap
index scan, as that seems cheaper when scanning everything.

So not really a bug, but maybe 'consider_startup' would help?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/30/15 03:52, Tomas Vondra wrote:
>
> On 05/30/15 01:20, Tomas Vondra wrote:
>>
>> Notice the cost - it's way lower than the previous plan (9.2 vs
>> ~111k), yet this plan was not chosen. So either the change broke
>> something (e.g. by violating some optimizer assumption), or maybe
>> there's a bug somewhere else ...
>
> After a bit more investigation, what I think is happening here is
> add_path() does not realize this is a SEMI join and a single tuple is
> enough, and discards the simple indexscan path in favor of the bitmap
> index scan as that seems cheaper when scanning everything.
>
> So not really a bug, but maybe 'consider_startup' would help?

A bit more info - the reason why the IndexPath gets discarded (in favor
of the BitmapHeapScan) seems to be twofold:

(a) A semijoin does not imply 'consider_startup' for the inner path, so
    the IndexPath gets discarded in favor of the BitmapHeapScan, which
    has a lower total cost. I'm not sure where exactly consider_startup
    should be set, so I've forced it to true at the beginning of
    add_path() to see what happens. Sadly, this alone is not
    sufficient, because of the next point.

(b) To actually use startup costs in compare_path_costs_fuzzily(),
    there is an additional condition about handling parameterized
    paths, as explained in this comment:

     * This function also includes special hacks to support a policy
     * enforced by its sole caller, add_path(): paths that have any
     * parameterization cannot win comparisons on the grounds of having
     * cheaper startup cost, since we deem only total cost to be of
     * interest for a parameterized path.  (Unparameterized paths are
     * more common, so we check for this case last.)
so the startup_cost comparisons look like this:

    if (consider_startup &&
        path2->startup_cost > path1->startup_cost * fuzz_factor &&
        path1->param_info == NULL)

Of course, path1 (the IndexPath) actually has parameters, which makes
the whole condition false, and the IndexPath gets discarded :-(

ISTM the reported test case is a counter-example to the policy, so
maybe it should really be relaxed somehow. To see what happens, I
tweaked both (a) and (b) by forcing consider_startup = true at the
beginning of add_path(), and removing the (param_info == NULL)
conditions (patch attached). It also includes the run_cost tweak
mentioned in the first message. I'm not claiming this is the correct
fix - the only goal was to see what happens. And it really does produce
the 'right' plan:

                               QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f
         (cost=0.43..310525.02 rows=4870496 width=2)
         Index Cond: (berechnungsart = (t.term)::text)
(5 rows)

If I understand the comments in pathnode.c about the startup_cost
policy correctly, it's "only" an optimization to reduce the number of
parameterized paths, so I believe this does not break anything. Also,
it's no fun if the planning is a tad faster but the query runtime is
1000x higher.

Ideas how to relax the policy without significant impact on the
optimizer?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I wonder whether the
>     run_cost += inner_run_cost;
> is actually correct, because this pretty much means we assume scanning
> the whole inner relation (once). Wouldn't something like this be more
> appropriate?
>     run_cost += inner_run_cost * inner_scan_frac;

Well, not entirely.  The reason why it's like that is explained
(evidently not well enough) in the comment just above:

     * A complicating factor is that rescans may be cheaper than first
     * scans.  If we never scan all the way to the end of the inner rel,
     * it might be (depending on the plan type) that we'd never pay the
     * whole inner first-scan run cost.  However it is difficult to
     * estimate whether that will happen, so be conservative and always
     * charge the whole first-scan cost once.

The case this is concerned about is something like a Materialize node
above a scan node.  The Materialize node makes rescans much cheaper
than the original scan, but only once the data has been fetched to
begin with.  So, while the first successful probe should indeed have
cost something like inner_run_cost * inner_scan_frac, once we make this
change it's no longer legitimate for final_cost_nestloop to suppose
that the cost of an unsuccessful probe is just inner_rescan_run_cost.
Any unsuccessful probe will mean we must incur all of inner_run_cost in
order to complete the underlying scan and fully load the Materialize
node.

However, in the case where has_indexed_join_quals() is true, I think
your change is correct, because then all the scans will either stop on
the first tuple or quickly determine that no tuples satisfy the
indexquals.  (Also, in the cases where has_indexed_join_quals() can
succeed, rescan cost is the same as initial scan cost anyhow.)

So what this seems to mean is that for SEMI/ANTI join cases, we have to
postpone all of the inner scan cost determination to
final_cost_nestloop, so that we can do this differently depending on
whether has_indexed_join_quals() is true.
That's a little bit annoying because it will mean we take the shortcut exit less often; but since SEMI/ANTI joins aren't that common, it's probably not going to be a big planning time hit. Not sure yet about your other point about the indexscan getting rejected too soon. That doesn't seem to be happening for me, at least not in HEAD. regards, tom lane
On 05/30/15 21:50, Tom Lane wrote:
> So what this seems to mean is that for SEMI/ANTI join cases, we have
> to postpone all of the inner scan cost determination to
> final_cost_nestloop, so that we can do this differently depending on
> whether has_indexed_join_quals() is true. That's a little bit
> annoying because it will mean we take the shortcut exit less often;
> but since SEMI/ANTI joins aren't that common, it's probably not
> going to be a big planning time hit.

Yay! So I wasn't entirely wrong on this. Do you plan to make this
change, or will you leave that to me?

>
> Not sure yet about your other point about the indexscan getting
> rejected too soon. That doesn't seem to be happening for me, at least
> not in HEAD.

FWIW I can reproduce that reliably on current HEAD, with the test case
I sent yesterday. I'm using the default config (as produced by initdb).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote:
> So what this seems to mean is that for SEMI/ANTI join cases, we have to
> postpone all of the inner scan cost determination to final_cost_nestloop,
> so that we can do this differently depending on whether
> has_indexed_join_quals() is true. That's a little bit annoying because it
> will mean we take the shortcut exit less often; but since SEMI/ANTI joins
> aren't that common, it's probably not going to be a big planning time hit.

Attached is a draft patch for that.  It fixes the problem for me:

 Nested Loop Semi Join  (cost=0.99..9.09 rows=1 width=74)
   (actual time=0.591..1.554 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         (actual time=0.022..0.025 rows=2 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f
         (cost=0.43..143244.98 rows=5015134 width=2)
         (actual time=0.759..0.759 rows=1 loops=2)
         Index Cond: (berechnungsart = (t.term)::text)
         Heap Fetches: 0
 Planning time: 0.545 ms
 Execution time: 1.615 ms

> Not sure yet about your other point about the indexscan getting rejected
> too soon. That doesn't seem to be happening for me, at least not in HEAD.

I do see something of the sort if I turn off enable_indexonlyscan.
Not sure about a good fix for that aspect.  It may not be terribly
critical, since AFAICS index-only scans generally ought to apply
in these cases.

			regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ac865be..0d302f6 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** cost_group(Path *path, PlannerInfo *root
*** 1767,1773 ****
   * estimate and getting a tight lower bound.  We choose to not examine the
   * join quals here, since that's by far the most expensive part of the
   * calculations.  The end result is that CPU-cost considerations must be
!  * left for the second phase.
   *
   * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
   * other data to be used by final_cost_nestloop
--- 1767,1774 ----
   * estimate and getting a tight lower bound.  We choose to not examine the
   * join quals here, since that's by far the most expensive part of the
   * calculations.  The end result is that CPU-cost considerations must be
!  * left for the second phase; and for SEMI/ANTI joins, we must also postpone
!  * incorporation of the inner path's run cost.
   *
   * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
   * other data to be used by final_cost_nestloop
*************** initial_cost_nestloop(PlannerInfo *root,
*** 1815,1858 ****
  	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
  	{
- 		double		outer_matched_rows;
- 		Selectivity inner_scan_frac;
- 
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 *
! 		 * For an outer-rel row that has at least one match, we can expect the
! 		 * inner scan to stop after a fraction 1/(match_count+1) of the inner
! 		 * rows, if the matches are evenly distributed.  Since they probably
! 		 * aren't quite evenly distributed, we apply a fuzz factor of 2.0 to
! 		 * that fraction.  (If we used a larger fuzz factor, we'd have to
! 		 * clamp inner_scan_frac to at most 1.0; but since match_count is at
! 		 * least 1, no such clamp is needed now.)
! 		 *
! 		 * A complicating factor is that rescans may be cheaper than first
! 		 * scans.  If we never scan all the way to the end of the inner rel,
! 		 * it might be (depending on the plan type) that we'd never pay the
! 		 * whole inner first-scan run cost.  However it is difficult to
! 		 * estimate whether that will happen, so be conservative and always
! 		 * charge the whole first-scan cost once.
! 		 */
! 		run_cost += inner_run_cost;
! 
! 		outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);
! 		inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);
! 
! 		/* Add inner run cost for additional outer tuples having matches */
! 		if (outer_matched_rows > 1)
! 			run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
! 
! 		/*
! 		 * The cost of processing unmatched rows varies depending on the
! 		 * details of the joinclauses, so we leave that part for later.
  		 */
  
  		/* Save private data for final_cost_nestloop */
! 		workspace->outer_matched_rows = outer_matched_rows;
! 		workspace->inner_scan_frac = inner_scan_frac;
  	}
  	else
  	{
--- 1816,1831 ----
  	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
  	{
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 *
! 		 * Getting decent estimates requires inspection of the join quals,
! 		 * which we choose to postpone to final_cost_nestloop.
  		 */
  
  		/* Save private data for final_cost_nestloop */
! 		workspace->inner_run_cost = inner_run_cost;
! 		workspace->inner_rescan_run_cost = inner_rescan_run_cost;
  	}
  	else
  	{
*************** initial_cost_nestloop(PlannerInfo *root,
*** 1869,1875 ****
  	workspace->total_cost = startup_cost + run_cost;
  	/* Save private data for final_cost_nestloop */
  	workspace->run_cost = run_cost;
- 	workspace->inner_rescan_run_cost = inner_rescan_run_cost;
  }
  
  /*
--- 1842,1847 ----
*************** final_cost_nestloop(PlannerInfo *root, N
*** 1893,1899 ****
  	double		inner_path_rows = inner_path->rows;
  	Cost		startup_cost = workspace->startup_cost;
  	Cost		run_cost = workspace->run_cost;
- 	Cost		inner_rescan_run_cost = workspace->inner_rescan_run_cost;
  	Cost		cpu_per_tuple;
  	QualCost	restrict_qual_cost;
  	double		ntuples;
--- 1865,1870 ----
*************** final_cost_nestloop(PlannerInfo *root, N
*** 1912,1953 ****
  	if (!enable_nestloop)
  		startup_cost += disable_cost;
  
! 	/* cost of source data */
  	if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
  	{
- 		double		outer_matched_rows = workspace->outer_matched_rows;
- 		Selectivity inner_scan_frac = workspace->inner_scan_frac;
- 
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 */
! 
! 		/* Compute number of tuples processed (not number emitted!) */
  		ntuples = outer_matched_rows * inner_path_rows * inner_scan_frac;
  
  		/*
! 		 * For unmatched outer-rel rows, there are two cases.  If the inner
! 		 * path is an indexscan using all the joinquals as indexquals, then an
! 		 * unmatched row results in an indexscan returning no rows, which is
! 		 * probably quite cheap.  We estimate this case as the same cost to
! 		 * return the first tuple of a nonempty scan.  Otherwise, the executor
! 		 * will have to scan the whole inner rel; not so cheap.
  		 */
  		if (has_indexed_join_quals(path))
  		{
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost / inner_path_rows;
  
  			/*
! 			 * We won't be evaluating any quals at all for these rows, so
  			 * don't add them to ntuples.
  			 */
  		}
  		else
  		{
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost;
  			ntuples += (outer_path_rows - outer_matched_rows) *
  				inner_path_rows;
  		}
--- 1883,1983 ----
  	if (!enable_nestloop)
  		startup_cost += disable_cost;
  
! 	/* cost of inner-relation source data (we already dealt with outer rel) */
  	if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
  	{
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 */
+ 		Cost		inner_run_cost = workspace->inner_run_cost;
+ 		Cost		inner_rescan_run_cost = workspace->inner_rescan_run_cost;
+ 		double		outer_matched_rows;
+ 		Selectivity inner_scan_frac;
+ 
! 		/*
! 		 * For an outer-rel row that has at least one match, we can expect the
! 		 * inner scan to stop after a fraction 1/(match_count+1) of the inner
! 		 * rows, if the matches are evenly distributed.  Since they probably
! 		 * aren't quite evenly distributed, we apply a fuzz factor of 2.0 to
! 		 * that fraction.  (If we used a larger fuzz factor, we'd have to
! 		 * clamp inner_scan_frac to at most 1.0; but since match_count is at
! 		 * least 1, no such clamp is needed now.)
! 		 */
! 		outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);
! 		inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);
! 
! 		/*
! 		 * Compute number of tuples processed (not number emitted!).  First,
! 		 * account for successfully-matched outer rows.
! 		 */
  		ntuples = outer_matched_rows * inner_path_rows * inner_scan_frac;
  
  		/*
! 		 * Now we need to estimate the actual costs of scanning the inner
! 		 * relation, which may be quite a bit less than N times inner_run_cost
! 		 * due to early scan stops.  We consider two cases.  If the inner path
! 		 * is an indexscan using all the joinquals as indexquals, then an
! 		 * unmatched outer row results in an indexscan returning no rows,
! 		 * which is probably quite cheap.  Otherwise, the executor will have
! 		 * to scan the whole inner rel for an unmatched row; not so cheap.
  		 */
  		if (has_indexed_join_quals(path))
  		{
+ 			/*
+ 			 * Successfully-matched outer rows will only require scanning
+ 			 * inner_scan_frac of the inner relation.  In this case, we don't
+ 			 * need to charge the full inner_run_cost even when that's more
+ 			 * than inner_rescan_run_cost, because we can assume that none of
+ 			 * the inner scans ever scan the whole inner relation.  So it's
+ 			 * okay to assume that all the inner scan executions can be
+ 			 * fractions of the full cost, even if materialization is reducing
+ 			 * the rescan cost.  At this writing, it's impossible to get here
+ 			 * for a materialized inner scan, so inner_run_cost and
+ 			 * inner_rescan_run_cost will be the same anyway; but just in
+ 			 * case, use inner_run_cost for the first matched tuple and
+ 			 * inner_rescan_run_cost for additional ones.
+ 			 */
+ 			run_cost += inner_run_cost * inner_scan_frac;
+ 			if (outer_matched_rows > 1)
+ 				run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
+ 
+ 			/*
+ 			 * Add the cost of inner-scan executions for unmatched outer rows.
+ 			 * We estimate this as the same cost as returning the first tuple
+ 			 * of a nonempty scan.  We consider that these are all rescans,
+ 			 * since we used inner_run_cost once already.
+ 			 */
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost / inner_path_rows;
  
  			/*
! 			 * We won't be evaluating any quals at all for unmatched rows, so
  			 * don't add them to ntuples.
  			 */
  		}
  		else
  		{
+ 			/*
+ 			 * Here, a complicating factor is that rescans may be cheaper than
+ 			 * first scans.  If we never scan all the way to the end of the
+ 			 * inner rel, it might be (depending on the plan type) that we'd
+ 			 * never pay the whole inner first-scan run cost.  However it is
+ 			 * difficult to estimate whether that will happen (and it could
+ 			 * not happen if there are any unmatched outer rows!), so be
+ 			 * conservative and always charge the whole first-scan cost once.
+ 			 */
+ 			run_cost += inner_run_cost;
+ 
+ 			/* Add inner run cost for additional outer tuples having matches */
+ 			if (outer_matched_rows > 1)
+ 				run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
+ 
+ 			/* Add inner run cost for unmatched outer tuples */
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost;
+ 
+ 			/* And count the unmatched join tuples as being processed */
  			ntuples += (outer_path_rows - outer_matched_rows) *
  				inner_path_rows;
  		}
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 279051e..706b6d1 100644
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
*************** typedef struct JoinCostWorkspace
*** 1719,1730 ****
  	Cost		run_cost;		/* non-startup cost components */
  
  	/* private for cost_nestloop code */
  	Cost		inner_rescan_run_cost;
- 	double		outer_matched_rows;
- 	Selectivity inner_scan_frac;
  
  	/* private for cost_mergejoin code */
- 	Cost		inner_run_cost;
  	double		outer_rows;
  	double		inner_rows;
  	double		outer_skip_rows;
--- 1719,1728 ----
  	Cost		run_cost;		/* non-startup cost components */
  
  	/* private for cost_nestloop code */
+ 	Cost		inner_run_cost; /* also used by cost_mergejoin code */
  	Cost		inner_rescan_run_cost;
  
  	/* private for cost_mergejoin code */
  	double		outer_rows;
  	double		inner_rows;
  	double		outer_skip_rows;
Hi,

On 05/30/15 23:16, Tom Lane wrote:
> I wrote:
>> So what this seems to mean is that for SEMI/ANTI join cases, we have to
>> postpone all of the inner scan cost determination to final_cost_nestloop,
>> so that we can do this differently depending on whether
>> has_indexed_join_quals() is true. That's a little bit annoying because it
>> will mean we take the shortcut exit less often; but since SEMI/ANTI joins
>> aren't that common, it's probably not going to be a big planning time hit.
>
> Attached is a draft patch for that. It fixes the problem for me:
>
>  Nested Loop Semi Join  (cost=0.99..9.09 rows=1 width=74)
>    (actual time=0.591..1.554 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t
>          (cost=0.55..8.57 rows=1 width=74)
>          (actual time=0.022..0.025 rows=2 loops=1)
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
>                   AND ((columnname)::text = 'berechnungsart'::text))
>    ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
>        facttable_stat_fta4 f
>          (cost=0.43..143244.98 rows=5015134 width=2)
>          (actual time=0.759..0.759 rows=1 loops=2)
>          Index Cond: (berechnungsart = (t.term)::text)
>          Heap Fetches: 0
>  Planning time: 0.545 ms
>  Execution time: 1.615 ms

Seems to be working OK, but I still do get a Bitmap Heap Scan there
(but more about that later). Do you plan to push that into 9.5, or 9.6?
I assume it's a behavior change, so no back-patching, right?

>
>> Not sure yet about your other point about the indexscan getting
>> rejected too soon. That doesn't seem to be happening for me, at
>> least not in HEAD.
>
> I do see something of the sort if I turn off enable_indexonlyscan.
> Not sure about a good fix for that aspect. It may not be terribly
> critical, since AFAICS index-only scans generally ought to apply
> in these cases.

Hmmm, a VACUUM FREEZE fixed that for me.
The reason is that right after loading the testcase, I do get this:

 Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
 facttable_stat_fta4 f  (cost=0.43..280220.51 rows=5000016 width=2)

and after VACUUM FREEZE I do get this:

 Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
 facttable_stat_fta4 f  (cost=0.43..142344.43 rows=5000000 width=2)

and the Bitmap Heap Scan case looks like this:

 Bitmap Heap Scan on facttable_stat_fta4 f
   (cost=93594.56..200342.76 rows=5000016 width=2)

so it's cheaper (on total cost) than the index only scan before
freezing, and more expensive than the index only scan after freezing.

I still think this is wrong (or rather "suboptimal") - there are
probably cases where even the frozen index only scan is more expensive
than a bitmap heap scan, and in that case it won't be used, although
it'd be much faster.

Another example is a query with a plain index scan, e.g. consider this
slight modification of the query:

    SELECT facttablename, columnname, term
      FROM term t
     WHERE facttablename = 'facttable_stat_fta4'
       AND columnname = 'berechnungsart'
       AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f
                    WHERE f.berechnungsart = t.term
                      AND einheit IS NOT NULL);

This will result in a bitmap index scan no matter the visibility.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 05/30/15 23:16, Tom Lane wrote:
>> Attached is a draft patch for that. It fixes the problem for me:

> Seems to be working OK, but I still do get a Bitmap Heap Scan there
> (but more about that later).

Attached is an incremental patch (on top of the previous one) to allow
startup cost of parameterized paths to be considered when the relation
is the RHS of a semi or anti join.  It seems reasonably clean except
for one thing: logically, we perhaps should remove the checks on
path->param_info from the last half of compare_path_costs_fuzzily(), so
as to increase the symmetry between parameterized paths and
unparameterized ones.  However, when I did that, I got changes in some
of the regression test plans, and they didn't seem to be for the
better.  So I left that alone.  As-is, this patch doesn't seem to
affect the results for any existing regression tests.

> Do you plan to push that into 9.5, or 9.6? I assume it's a behavior
> change so that no back-patching, right?

Mumble.  It's definitely a planner bug fix, and I believe that the
effects are narrowly constrained to cases where significantly better
plans are possible.  So personally I'd be willing to back-patch it.
But others might see that differently, especially since it's been like
this for a long time and we only have one field complaint.
			regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4775acf..87304ba 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outRelOptInfo(StringInfo str, const Rel
*** 1844,1849 ****
--- 1844,1850 ----
  	WRITE_FLOAT_FIELD(rows, "%.0f");
  	WRITE_INT_FIELD(width);
  	WRITE_BOOL_FIELD(consider_startup);
+ 	WRITE_BOOL_FIELD(consider_param_startup);
  	WRITE_NODE_FIELD(reltargetlist);
  	WRITE_NODE_FIELD(pathlist);
  	WRITE_NODE_FIELD(ppilist);
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index d51fd57..d1349aa 100644
*** a/src/backend/optimizer/README
--- b/src/backend/optimizer/README
*************** a nestloop that provides parameters to t
*** 795,801 ****
  do not ignore merge joins entirely, joinpath.c does not fully explore the
  space of potential merge joins with parameterized inputs.  Also, add_path
  treats parameterized paths as having no pathkeys, so that they compete
! only on total cost and rowcount; they don't get preference for producing a
  special sort order.  This creates additional bias against merge joins,
  since we might discard a path that could have been useful for performing a
  merge without an explicit sort step.  Since a parameterized path must
--- 795,801 ----
  do not ignore merge joins entirely, joinpath.c does not fully explore the
  space of potential merge joins with parameterized inputs.  Also, add_path
  treats parameterized paths as having no pathkeys, so that they compete
! only on cost and rowcount; they don't get preference for producing a
  special sort order.  This creates additional bias against merge joins,
  since we might discard a path that could have been useful for performing a
  merge without an explicit sort step.  Since a parameterized path must
*************** uninteresting, these choices do not affe
*** 804,809 ****
--- 804,817 ----
  output order of a query --- they only make it harder to use a merge join
  at a lower level.  The savings in planning work justifies that.
  
+ Similarly, parameterized paths do not normally get preference in add_path
+ for having cheap startup cost; that's seldom of much value when on the
+ inside of a nestloop, so it seems not worth keeping extra paths solely for
+ that.  An exception occurs for parameterized paths for the RHS relation of
+ a SEMI or ANTI join: in those cases, we can stop the inner scan after the
+ first match, so that it's primarily startup not total cost that we care
+ about.
+ 
  
  LATERAL subqueries
  ------------------
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e6d90d..0b83189 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*************** set_rel_pathlist_hook_type set_rel_pathl
*** 61,66 ****
--- 61,67 ----
  join_search_hook_type join_search_hook = NULL;
  
+ static void set_base_rel_consider_startup(PlannerInfo *root);
  static void set_base_rel_sizes(PlannerInfo *root);
  static void set_base_rel_pathlists(PlannerInfo *root);
  static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
*************** make_one_rel(PlannerInfo *root, List *jo
*** 152,157 ****
--- 153,161 ----
  		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
  	}
  
+ 	/* Mark base rels as to whether we care about fast-start plans */
+ 	set_base_rel_consider_startup(root);
+ 
  	/*
  	 * Generate access paths for the base rels.
  	 */
*************** make_one_rel(PlannerInfo *root, List *jo
*** 172,177 ****
--- 176,224 ----
  }
  
  /*
+  * set_base_rel_consider_startup
+  *	  Set the consider_[param_]startup flags for each base-relation entry.
+  *
+  * For the moment, we only deal with consider_param_startup here; because the
+  * logic for consider_startup is pretty trivial and is the same for every base
+  * relation, we just let build_simple_rel() initialize that flag correctly to
+  * start with.  If that logic ever gets more complicated it would probably
+  * be better to move it here.
+  */
+ static void
+ set_base_rel_consider_startup(PlannerInfo *root)
+ {
+ 	/*
+ 	 * Since parameterized paths can only be used on the inside of a nestloop
+ 	 * join plan, there is usually little value in considering fast-start
+ 	 * plans for them.  However, for relations that are on the RHS of a SEMI
+ 	 * or ANTI join, a fast-start plan can be useful because we're only going
+ 	 * to care about fetching one tuple anyway.
+ 	 *
+ 	 * To minimize growth of planning time, we currently restrict this to
+ 	 * cases where the RHS is a single base relation, not a join; there is no
+ 	 * provision for consider_param_startup to get set at all on joinrels.
+ 	 * Also we don't worry about appendrels.  costsize.c's costing rules for
+ 	 * nestloop semi/antijoins don't consider such cases either.
+ 	 */
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, root->join_info_list)
+ 	{
+ 		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+ 		int			varno;
+ 
+ 		if ((sjinfo->jointype == JOIN_SEMI || sjinfo->jointype == JOIN_ANTI) &&
+ 			bms_get_singleton_member(sjinfo->syn_righthand, &varno))
+ 		{
+ 			RelOptInfo *rel = find_base_rel(root, varno);
+ 
+ 			rel->consider_param_startup = true;
+ 		}
+ 	}
+ }
+ 
+ /*
   * set_base_rel_sizes
   *	  Set the size estimates (rows and widths) for each base-relation entry.
   *
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7f7aa24..2afd59a 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** compare_fractional_path_costs(Path *path
*** 138,156 ****
  * total cost, we just say that their costs are "different", since neither
  * dominates the other across the whole performance spectrum.
  *
! * If consider_startup is false, then we don't care about keeping paths with
! * good startup cost, so we'll never return COSTS_DIFFERENT.
! *
! * This function also includes special hacks to support a policy enforced
! * by its sole caller, add_path(): paths that have any parameterization
! * cannot win comparisons on the grounds of having cheaper startup cost,
! * since we deem only total cost to be of interest for a parameterized path.
! * (Unparameterized paths are more common, so we check for this case last.)
  */
 static PathCostComparison
! compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor,
! 						   bool consider_startup)
 {
  	/*
  	 * Check total cost first since it's more likely to be different; many
  	 * paths have zero startup cost.
--- 138,154 ----
  * total cost, we just say that their costs are "different", since neither
  * dominates the other across the whole performance spectrum.
  *
! * This function also enforces a policy rule that paths for which the relevant
! * one of parent->consider_startup and parent->consider_param_startup is false
! * cannot win comparisons on the grounds of good startup cost, so we never
! * return COSTS_DIFFERENT when that is true for the total-cost loser.
  */
 static PathCostComparison
! compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
 {
+ #define CONSIDER_PATH_STARTUP_COST(p) \
+ 	((p)->param_info == NULL ? (p)->parent->consider_startup : (p)->parent->consider_param_startup)
+ 
  	/*
  	 * Check total cost first since it's more likely to be different; many
  	 * paths have zero startup cost.
*************** compare_path_costs_fuzzily(Path *path1,
*** 158,166 ****
  	if (path1->total_cost > path2->total_cost * fuzz_factor)
  	{
  		/* path1 fuzzily worse on total cost */
! 		if (consider_startup &&
! 			path2->startup_cost > path1->startup_cost * fuzz_factor &&
! 			path1->param_info == NULL)
  		{
  			/* ... but path2 fuzzily worse on startup, so DIFFERENT */
  			return COSTS_DIFFERENT;
--- 156,163 ----
  	if (path1->total_cost > path2->total_cost * fuzz_factor)
  	{
  		/* path1 fuzzily worse on total cost */
! 		if (CONSIDER_PATH_STARTUP_COST(path1) &&
! 			path2->startup_cost > path1->startup_cost * fuzz_factor)
  		{
  			/* ...
but path2 fuzzily worse on startup, so DIFFERENT */ return COSTS_DIFFERENT; *************** compare_path_costs_fuzzily(Path *path1, *** 171,179 **** if (path2->total_cost > path1->total_cost * fuzz_factor) { /* path2 fuzzily worse on total cost */ ! if (consider_startup && ! path1->startup_cost > path2->startup_cost * fuzz_factor && ! path2->param_info == NULL) { /* ... but path1 fuzzily worse on startup, so DIFFERENT */ return COSTS_DIFFERENT; --- 168,175 ---- if (path2->total_cost > path1->total_cost * fuzz_factor) { /* path2 fuzzily worse on total cost */ ! if (CONSIDER_PATH_STARTUP_COST(path2) && ! path1->startup_cost > path2->startup_cost * fuzz_factor) { /* ... but path1 fuzzily worse on startup, so DIFFERENT */ return COSTS_DIFFERENT; *************** compare_path_costs_fuzzily(Path *path1, *** 181,188 **** /* else path1 dominates */ return COSTS_BETTER1; } ! /* fuzzily the same on total cost */ ! /* (so we may as well compare startup cost, even if !consider_startup) */ if (path1->startup_cost > path2->startup_cost * fuzz_factor && path2->param_info == NULL) { --- 177,189 ---- /* else path1 dominates */ return COSTS_BETTER1; } ! ! /* ! * Fuzzily the same on total cost (so we might as well compare startup ! * cost, even when that would otherwise be uninteresting; but ! * parameterized paths aren't allowed to win this way, we'd rather move on ! * to other comparison heuristics) ! */ if (path1->startup_cost > path2->startup_cost * fuzz_factor && path2->param_info == NULL) { *************** compare_path_costs_fuzzily(Path *path1, *** 197,202 **** --- 198,205 ---- } /* fuzzily the same on both costs */ return COSTS_EQUAL; + + #undef CONSIDER_PATH_STARTUP_COST } /* *************** compare_path_costs_fuzzily(Path *path1, *** 212,222 **** * * The cheapest_parameterized_paths list collects all parameterized paths * that have survived the add_path() tournament for this relation. (Since ! * add_path ignores pathkeys and startup cost for a parameterized path, ! 
* these will be paths that have best total cost or best row count for their ! * parameterization.) cheapest_parameterized_paths always includes the ! * cheapest-total unparameterized path, too, if there is one; the users of ! * that list find it more convenient if that's included. * * This is normally called only after we've finished constructing the path * list for the rel node. --- 215,225 ---- * * The cheapest_parameterized_paths list collects all parameterized paths * that have survived the add_path() tournament for this relation. (Since ! * add_path ignores pathkeys for a parameterized path, these will be paths ! * that have best cost or best row count for their parameterization.) ! * cheapest_parameterized_paths always includes the cheapest-total ! * unparameterized path, too, if there is one; the users of that list find ! * it more convenient if that's included. * * This is normally called only after we've finished constructing the path * list for the rel node. *************** set_cheapest(RelOptInfo *parent_rel) *** 361,374 **** * cases do arise, so we make the full set of checks anyway. * * There are two policy decisions embedded in this function, along with ! * its sibling add_path_precheck: we treat all parameterized paths as ! * having NIL pathkeys, and we ignore their startup costs, so that they ! * compete only on parameterization, total cost and rowcount. This is to ! * reduce the number of parameterized paths that are kept. See discussion ! * in src/backend/optimizer/README. * ! * Another policy that is enforced here is that we only consider cheap ! * startup cost to be interesting if parent_rel->consider_startup is true. * * The pathlist is kept sorted by total_cost, with cheaper paths * at the front. Within this routine, that's simply a speed hack: --- 364,378 ---- * cases do arise, so we make the full set of checks anyway. * * There are two policy decisions embedded in this function, along with ! * its sibling add_path_precheck. 
First, we treat all parameterized paths ! * as having NIL pathkeys, so that they cannot win comparisons on the ! * basis of sort order. This is to reduce the number of parameterized ! * paths that are kept; see discussion in src/backend/optimizer/README. * ! * Second, we only consider cheap startup cost to be interesting if ! * parent_rel->consider_startup is true for an unparameterized path, or ! * parent_rel->consider_param_startup is true for a parameterized one. ! * Again, this allows discarding useless paths sooner. * * The pathlist is kept sorted by total_cost, with cheaper paths * at the front. Within this routine, that's simply a speed hack: *************** add_path(RelOptInfo *parent_rel, Path *n *** 433,440 **** * Do a fuzzy cost comparison with 1% fuzziness limit. (XXX does this * percentage need to be user-configurable?) */ ! costcmp = compare_path_costs_fuzzily(new_path, old_path, 1.01, ! parent_rel->consider_startup); /* * If the two paths compare differently for startup and total cost, --- 437,443 ---- * Do a fuzzy cost comparison with 1% fuzziness limit. (XXX does this * percentage need to be user-configurable?) */ ! costcmp = compare_path_costs_fuzzily(new_path, old_path, 1.01); /* * If the two paths compare differently for startup and total cost, *************** add_path(RelOptInfo *parent_rel, Path *n *** 501,508 **** accept_new = false; /* old dominates new */ else if (compare_path_costs_fuzzily(new_path, old_path, ! 1.0000000001, ! parent_rel->consider_startup) == COSTS_BETTER1) remove_old = true; /* new dominates old */ else accept_new = false; /* old equals or --- 504,510 ---- accept_new = false; /* old dominates new */ else if (compare_path_costs_fuzzily(new_path, old_path, ! 
1.0000000001) == COSTS_BETTER1) remove_old = true; /* new dominates old */ else accept_new = false; /* old equals or *************** add_path_precheck(RelOptInfo *parent_rel *** 622,632 **** --- 624,638 ---- List *pathkeys, Relids required_outer) { List *new_path_pathkeys; + bool consider_startup; ListCell *p1; /* Pretend parameterized paths have no pathkeys, per add_path policy */ new_path_pathkeys = required_outer ? NIL : pathkeys; + /* Decide whether new path's startup cost is interesting */ + consider_startup = required_outer ? parent_rel->consider_param_startup : parent_rel->consider_startup; + foreach(p1, parent_rel->pathlist) { Path *old_path = (Path *) lfirst(p1); *************** add_path_precheck(RelOptInfo *parent_rel *** 644,651 **** */ if (total_cost >= old_path->total_cost) { ! /* can win on startup cost only if unparameterized */ ! if (startup_cost >= old_path->startup_cost || required_outer) { /* new path does not win on cost, so check pathkeys... */ List *old_path_pathkeys; --- 650,657 ---- */ if (total_cost >= old_path->total_cost) { ! /* new path can win on startup cost only if consider_startup */ ! if (startup_cost >= old_path->startup_cost || !consider_startup) { /* new path does not win on cost, so check pathkeys... 
*/ List *old_path_pathkeys; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 1d635cd..be2ef3b 100644 *** a/src/backend/optimizer/util/relnode.c --- b/src/backend/optimizer/util/relnode.c *************** build_simple_rel(PlannerInfo *root, int *** 101,106 **** --- 101,107 ---- rel->width = 0; /* cheap startup cost is interesting iff not all tuples to be retrieved */ rel->consider_startup = (root->tuple_fraction > 0); + rel->consider_param_startup = false; /* might get changed later */ rel->reltargetlist = NIL; rel->pathlist = NIL; rel->ppilist = NIL; *************** build_join_rel(PlannerInfo *root, *** 361,366 **** --- 362,368 ---- joinrel->width = 0; /* cheap startup cost is interesting iff not all tuples to be retrieved */ joinrel->consider_startup = (root->tuple_fraction > 0); + joinrel->consider_param_startup = false; joinrel->reltargetlist = NIL; joinrel->pathlist = NIL; joinrel->ppilist = NIL; diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 706b6d1..33b0874 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** typedef struct PlannerInfo *** 321,328 **** * clauses have been applied (ie, output rows of a plan for it) * width - avg. number of bytes per tuple in the relation after the * appropriate projections have been done (ie, output width) ! * consider_startup - true if there is any value in keeping paths for * this rel on the basis of having cheap startup cost * reltargetlist - List of Var and PlaceHolderVar nodes for the values * we need to output from this relation. * List is in no particular order, but all rels of an --- 321,329 ---- * clauses have been applied (ie, output rows of a plan for it) * width - avg. number of bytes per tuple in the relation after the * appropriate projections have been done (ie, output width) ! 
* consider_startup - true if there is any value in keeping plain paths for * this rel on the basis of having cheap startup cost + * consider_param_startup - the same for parameterized paths * reltargetlist - List of Var and PlaceHolderVar nodes for the values * we need to output from this relation. * List is in no particular order, but all rels of an *************** typedef struct RelOptInfo *** 437,442 **** --- 438,444 ---- /* per-relation planner control flags */ bool consider_startup; /* keep cheap-startup-cost paths? */ + bool consider_param_startup; /* ditto, for parameterized paths? */ /* materialization information */ List *reltargetlist; /* Vars to be output by scan of relation */
On 06/01/15 00:08, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 05/30/15 23:16, Tom Lane wrote:
>>> Attached is a draft patch for that. It fixes the problem for me:
>
>> Seems to be working OK, but I still do get a Bitmap Heap Scan there (but
>> more about that later).
>
> Attached is an incremental patch (on top of the previous one) to
> allow startup cost of parameterized paths to be considered when the
> relation is the RHS of a semi or anti join. It seems reasonably clean
> except for one thing: logically, we perhaps should remove the checks
> on path->param_info from the last half of
> compare_path_costs_fuzzily(), so as to increase the symmetry between
> parameterized paths and unparameterized ones. However, when I did
> that, I got changes in some of the regression test plans, and they
> didn't seem to be for the better. So I left that alone. As-is, this
> patch doesn't seem to affect the results for any existing regression
> tests.

Seems to be working fine. I've tried a bunch of queries modifying the test case in various ways, and all seem to be planned fine. I've been unable to come up with a query that'd get planned badly.

Regarding the remaining checks in compare_path_costs_fuzzily(), isn't that simply caused by very small data sets? For example the first "failing" plan in join.sql looks like this:

 Nested Loop Left Join  (cost=0.29..22.80 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Join Filter: ("*VALUES*".column1 = i1.f1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
   ->  Materialize  (cost=0.29..22.64 rows=5 width=8)
         Output: i1.f1, (666)
         ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
               Output: i1.f1, 666
               ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
                     Output: i1.f1
               ->  Index Only Scan using tenk1_unique2 on public....
                     Output: i2.unique2
                     Index Cond: (i2.unique2 = i1.f1)

while with the changes it'd look like this:

 Hash Right Join  (cost=0.34..22.70 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Hash Cond: (i1.f1 = "*VALUES*".column1)
   ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
         Output: i1.f1, 666
         ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
               Output: i1.f1
         ->  Index Only Scan using tenk1_unique2 on public.tenk1 ...
               Output: i2.unique2
               Index Cond: (i2.unique2 = i1.f1)
   ->  Hash  (cost=0.03..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 ...
               Output: "*VALUES*".column1
 (14 rows)

So the planner actually believes the new plan to be cheaper, although only by a tiny margin. And I see pretty much no difference in planning/exec time (but I'm on a machine with power management and VMs, so there's a lot of random noise).

But once int4_tbl gets bigger (say, 160k rows instead of the 5), the hash join clearly wins even with the current code. Actually, it switches from the current plan way sooner - at 500 rows it's already using the hash join, and I see about the same timings.

I don't really see why you consider those plan changes bad? And even if they are, isn't that simply a matter of tuning the cost parameters?

>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>> behavior change so that no back-patching, right?
>
> Mumble. It's definitely a planner bug fix, and I believe that the
> effects are narrowly constrained to cases where significantly better
> plans are possible. So personally I'd be willing to back-patch it.
> But others might see that differently, especially since it's been
> like this for a long time and we only have one field complaint.

+1 to back-patching from me. It's true we only have one field complaint, but I believe there are more users impacted by this. They just didn't notice - we only really got this complaint because of the plan discrepancy between queries that are almost exactly the same.
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/01/15 00:08, Tom Lane wrote:
>> Attached is an incremental patch (on top of the previous one) to
>> allow startup cost of parameterized paths to be considered when the
>> relation is the RHS of a semi or anti join. It seems reasonably clean
>> except for one thing: logically, we perhaps should remove the checks
>> on path->param_info from the last half of
>> compare_path_costs_fuzzily(), so as to increase the symmetry between
>> parameterized paths and unparameterized ones. However, when I did
>> that, I got changes in some of the regression test plans, and they
>> didn't seem to be for the better. So I left that alone. As-is, this
>> patch doesn't seem to affect the results for any existing regression
>> tests.

> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't
> that simply caused by very small data sets?

No, I don't think the size of the data set is the issue. The point is that previously, if we had two parameterized paths of fuzzily the same total cost, we'd choose which to keep based on factors other than startup cost. If we change that part of compare_path_costs_fuzzily(), we'll be changing the behavior in cases that are unrelated to the problem we are trying to solve, because same-total-cost isn't going to apply to the bitmap indexscan vs. plain indexscan cases we're worried about here. (The places where it tends to happen are things like hash left join vs. hash right join with the inputs swapped.) I don't want to change such cases in a patch that's meant to be back-patched; people don't like plan instability in stable branches.

It's also not quite clear what the new rule ought to be. Rather than treating parameterized paths exactly like regular ones, maybe we should consider their startup cost only if consider_param_startup is true. It remains the case that in most scenarios for parameterized paths, startup cost really isn't that interesting and shouldn't drive choices.

Another practical problem is that the regression tests that are being affected are specifically meant to test particular scenarios in construction of nested-loop join plans. If the planner stops selecting a nestloop plan there, the test is effectively broken. We could probably rejigger the test queries to restore the intended test scenario, but that wasn't an exercise I was in a hurry to undertake.

Anyway, bottom line is that if we want to change that, it ought to be a separate HEAD-only patch.

>>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>>> behavior change so that no back-patching, right?

>> Mumble. It's definitely a planner bug fix, and I believe that the
>> effects are narrowly constrained to cases where significantly better
>> plans are possible. So personally I'd be willing to back-patch it.
>> But others might see that differently, especially since it's been
>> like this for a long time and we only have one field complaint.

> +1 to back-patching from me. It's true we only have one field complaint,
> but I believe there are more users impacted by this. They just didn't
> notice - we only really got this complaint because of the plan
> discrepancy between queries that are almost exactly the same.

Perhaps. It seems like in many cases the planner would accidentally choose the "right" plan anyway, because even though it was overestimating the cost, the other alternatives would usually look even worse. So it's hard to tell how many users will be benefited. But I'm fairly sure the patch as posted is narrow enough that very few people would be negatively affected.

Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's releases, but I will push it later this week if there are not objections.

regards, tom lane
On 06/01/2015 02:18 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On 06/01/15 00:08, Tom Lane wrote: >>> Attached is an incremental patch (on top of the previous one) to >>> allow startup cost of parameterized paths to be considered when the >>> relation is the RHS of a semi or anti join. It seems reasonably clean >>> except for one thing: logically, we perhaps should remove the checks >>> on path->param_info from the last half of >>> compare_path_costs_fuzzily(), so as to increase the symmetry between >>> parameterized paths and unparameterized ones. However, when I did >>> that, I got changes in some of the regression test plans, and they >>> didn't seem to be for the better. So I left that alone. As-is, this >>> patch doesn't seem to affect the results for any existing regression >>> tests. > >> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't >> that simply caused by very small data sets? > > No, I don't think the size of the data set is the issue. The point is > that previously, if we had two parameterized paths of fuzzily the same > total cost, we'd choose which to keep based on factors other than startup > cost. If we change that part of compare_path_costs_fuzzily(), we'll be > changing the behavior in cases that are unrelated to the problem we are > trying to solve, because same-total-cost isn't going to apply to the > bitmap indexscan vs. plain indexscan cases we're worried about here. > (The places where it tends to happen are things like hash left join > vs hash right join with the inputs swapped.) I don't want to change such > cases in a patch that's meant to be back-patched; people don't like plan > instability in stable branches. > > It's also not quite clear what the new rule ought to be. Rather than > treating parameterized paths exactly like regular ones, maybe we should > consider their startup cost only if consider_param_startup is true. 
> It remains the case that in most scenarios for parameterized paths, > startup cost really isn't that interesting and shouldn't drive choices. > > Another practical problem is that the regression tests that are being > affected are specifically meant to test particular scenarios in > construction of nested-loop join plans. If the planner stops selecting > a nestloop plan there, the test is effectively broken. We could probably > rejigger the test queries to restore the intended test scenario, but that > wasn't an exercise I was in a hurry to undertake. > > Anyway, bottom line is that if we want to change that, it ought to be > a separate HEAD-only patch. > >>>> Do you plan to push that into 9.5, or 9.6? I assume it's a >>>> behavior change so that no back-patching, right? > >>> Mumble. It's definitely a planner bug fix, and I believe that the >>> effects are narrowly constrained to cases where significantly better >>> plans are possible. So personally I'd be willing to back-patch it. >>> But others might see that differently, especially since it's been >>> like this for a long time and we only have one field complaint. > >> +1 to back-patching from me. It's true we only have one field complaint, >> but I believe there are more users impacted by this. They just didn't >> notice - we only really got this complaint because of the plan >> discrepancy between queries that are almost exactly the same. > > Perhaps. It seems like in many cases the planner would accidentally > choose the "right" plan anyway, because even though it was overestimating > the cost, the other alternatives would usually look even worse. So it's > hard to tell how many users will be benefited. But I'm fairly sure the > patch as posted is narrow enough that very few people would be negatively > affected. > > Anybody else want to speak for or against back-patching the patch as > posted? 
> I intentionally didn't push it in before today's releases,
> but I will push it later this week if there are not objections.

I would like Mark Wong to test this on DBT3, if that's possible for him. I'm very worried about an unanticipated regression.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 06/01/15 23:47, Josh Berkus wrote:
> On 06/01/2015 02:18 PM, Tom Lane wrote:
>
>> Anybody else want to speak for or against back-patching the patch as
>> posted? I intentionally didn't push it in before today's releases,
>> but I will push it later this week if there are not objections.
>
> I would like Mark Wong to test this on DBT3, if that's possible for him.
> I'm very worried about an unanticipated regression.

AFAIK Mark is busy with other stuff at the moment, but I can run the TPC-H benchmark (which DBT3 is equal to, IIRC).

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/01/2015 03:22 PM, Tomas Vondra wrote: > > On 06/01/15 23:47, Josh Berkus wrote: >> On 06/01/2015 02:18 PM, Tom Lane wrote: >> >>> Anybody else want to speak for or against back-patching the patch as >>> posted? I intentionally didn't push it in before today's releases, >>> but I will push it later this week if there are not objections. >> >> I would like Mark Wong to test this on DBT3, if that's possible for him. >> I'm very worried about an unanticipated regression. > > AFAIK Mark is busy with other stuff at the moment, but I can do the > TPC-H (which DBT3 is equal to, IIRC). Yeah, I just want something which has a chance of catching unanticipated regression in queries which should be unaffected by the patch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/02/15 01:47, Josh Berkus wrote:
> On 06/01/2015 03:22 PM, Tomas Vondra wrote:
>>
>> On 06/01/15 23:47, Josh Berkus wrote:
>>> On 06/01/2015 02:18 PM, Tom Lane wrote:
>>>
>>>> Anybody else want to speak for or against back-patching the patch as
>>>> posted? I intentionally didn't push it in before today's releases,
>>>> but I will push it later this week if there are not objections.
>>>
>>> I would like Mark Wong to test this on DBT3, if that's possible
>>> for him. I'm very worried about an unanticipated regression.
>>
>> AFAIK Mark is busy with other stuff at the moment, but I can do
>> the TPC-H (which DBT3 is equal to, IIRC).
>
> Yeah, I just want something which has a chance of catching
> unanticipated regression in queries which should be unaffected by the
> patch.

OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The results are good, IMHO.

With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan changes at all. I don't plan to run the tests on larger data sets; I do expect the behavior to remain the same, considering the uniformity of TPC-H data sets.

With TPC-DS (using the 63 queries supported by PostgreSQL), I've seen two cases of plan changes - see the plans attached. In both cases, however, the plan change results in much better performance. While on master the queries took 23 and 18 seconds, with the two patches it's only 7 and 3. This is just the 1GB dataset; I'll repeat the test with the 4GB dataset and post an update if there are any changes.

While this can't prove there are no regressions, in these two benchmarks the patches actually improve some of the queries.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The > results are good, IMHO. > With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan > changes at all. I don't plan to run the tests on larger data sets, I do > expect the behavior to remain the same, considering the uniformity of > TPC-H data sets. > With TPC-DS (using the 63 queries supported by PostgreSQL), I've seen > two cases of plan changes - see the plans attached. In both cases > however the plan change results in much better performance. While on > master the queries took 23 and 18 seconds, with the two patches it's > only 7 and 3. This is just the 1GB dataset. I'll repeat the test with > the 4GB dataset and post an update if there are any changes. I'm a bit disturbed by that, because AFAICS from the plans, these queries did not involve any semi or anti joins, which should mean that the patch would not have changed the planner's behavior. You were using the second patch as-posted, right, without further hacking on compare_path_costs_fuzzily? It's possible that the change was due to random variation in ANALYZE statistics, in which case it was just luck. regards, tom lane
On 06/02/15 16:37, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The
>> results are good, IMHO.
>
> I'm a bit disturbed by that, because AFAICS from the plans, these
> queries did not involve any semi or anti joins, which should mean
> that the patch would not have changed the planner's behavior. You
> were using the second patch as-posted, right, without further hacking
> on compare_path_costs_fuzzily?

Yes, no additional changes.

> It's possible that the change was due to random variation in ANALYZE
> statistics, in which case it was just luck.

I don't think so. I simply loaded the data, ran ANALYZE, and then started either master or patched master. There should be no difference in statistics, I believe. Also, the plans contain pretty much the same row counts, but the costs differ.

For example look at the 'cs_ui' CTE, right at the beginning of the analyze logs. The row counts are exactly the same, but the costs are different. And it's not using semijoins or nested loops ...

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/02/15 16:37, Tom Lane wrote:
>> It's possible that the change was due to random variation in ANALYZE
>> statistics, in which case it was just luck.

> I don't think so. I simply loaded the data, ran ANALYZE, and then simply
> started either master or patched master. There should be no difference
> in statistics, I believe. Also, the plans contain pretty much the same
> row counts, but the costs differ.

> For example look at the 'cs_ui' CTE, right at the beginning of the
> analyze logs. The row counts are exactly the same, but the costs are
> different. And it's not using semijoins or not nested loops ...

The cost estimates in that CTE all look exactly the same to me, and the actual runtime's not all that different either. The key runtime difference in the first query seems to be in the first couple of joins to the cs_ui CTE:

   ->  Nested Loop  (cost=4.63..724.34 rows=16 width=67) (actual time=8346.904..14024.947 rows=117 loops=1)
         Join Filter: (item.i_item_sk = store_returns.sr_item_sk)
         ->  Nested Loop  (cost=0.29..662.07 rows=1 width=59) (actual time=8324.352..13618.106 rows=8 loops=1)
               ->  CTE Scan on cs_ui  (cost=0.00..2.16 rows=108 width=4) (actual time=7264.670..7424.096 rows=17169 loops=1)
               ->  Index Scan using item_pkey on item  (cost=0.29..6.10 rows=1 width=55) (actual time=0.356..0.356 rows=0 loops=17169)
                     Index Cond: (i_item_sk = cs_ui.cs_item_sk)
                     Filter: ((i_current_price >= '79'::numeric) AND (i_current_price <= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND (i_color = ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[])))
                     Rows Removed by Filter: 1
         ->  Bitmap Heap Scan on store_returns  (cost=4.34..62.05 rows=18 width=8) (actual time=32.525..50.662 rows=15 loops=8)
               Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk)
               Heap Blocks: exact=117
               ->  Bitmap Index Scan on idx_sr_item_sk  (cost=0.00..4.34 rows=18 width=0) (actual time=19.747..19.747 rows=15 loops=8)
                     Index Cond: (sr_item_sk = cs_ui.cs_item_sk)

vs patched

   ->  Nested Loop  (cost=4.63..724.34 rows=16 width=67) (actual time=6867.921..7001.417 rows=117 loops=1)
         Join Filter: (item.i_item_sk = store_returns.sr_item_sk)
         ->  Nested Loop  (cost=0.29..662.07 rows=1 width=59) (actual time=6867.874..7000.211 rows=8 loops=1)
               ->  CTE Scan on cs_ui  (cost=0.00..2.16 rows=108 width=4) (actual time=6865.792..6924.816 rows=17169 loops=1)
               ->  Index Scan using item_pkey on item  (cost=0.29..6.10 rows=1 width=55) (actual time=0.003..0.003 rows=0 loops=17169)
                     Index Cond: (i_item_sk = cs_ui.cs_item_sk)
                     Filter: ((i_current_price >= '79'::numeric) AND (i_current_price <= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND (i_color = ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[])))
                     Rows Removed by Filter: 1
         ->  Bitmap Heap Scan on store_returns  (cost=4.34..62.05 rows=18 width=8) (actual time=0.025..0.116 rows=15 loops=8)
               Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk)
               Heap Blocks: exact=117
               ->  Bitmap Index Scan on idx_sr_item_sk  (cost=0.00..4.34 rows=18 width=0) (actual time=0.017..0.017 rows=15 loops=8)
                     Index Cond: (sr_item_sk = cs_ui.cs_item_sk)

Those are the exact same plans, and same cost estimates, but for some reason the master run is 2X slower. The only explanation I can think of is disk caching effects, or maybe hint-bit setting during the first run. It's certainly not the planner that deserves either credit or blame for that.

The part of this plan that's actually different is a different join order sequence for a lot of follow-on joins that are expected to get single rows from their other table. I think that's basically irrelevant really, but again, this patch should not have changed anything there, since those were plain joins not semijoins.

regards, tom lane
On 06/02/15 17:50, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 06/02/15 16:37, Tom Lane wrote:
>>> It's possible that the change was due to random variation in ANALYZE
>>> statistics, in which case it was just luck.
>
>> I don't think so. I simply loaded the data, ran ANALYZE, and then simply
>> started either master or patched master. There should be no difference
>> in statistics, I believe. Also, the plans contain pretty much the same
>> row counts, but the costs differ.
>
>> For example look at the 'cs_ui' CTE, right at the beginning of the
>> analyze logs. The row counts are exactly the same, but the costs are
>> different. And it's not using semijoins or not nested loops ...
>
> The cost estimates in that CTE all look exactly the same to me, and the
> actual runtime's not all that different either. The key runtime
> difference in the first query seems to be in the first couple of joins
> to the cs_ui CTE:

D'oh! I was looking at the timings, not costs. EINOTENOUGHCAFFEINE ...

> Those are the exact same plans, and same cost estimates, but for
> some reason the master run is 2X slower. The only explanation I can
> think of is disk caching effects, or maybe hint-bit setting during
> the first run. It's certainly not the planner that deserves either
> credit or blame for that.

Yes, seems like that.

> The part of this plan that's actually different is a different join
> order sequence for a lot of follow-on joins that are expected to get
> single rows from their other table. I think that's basically
> irrelevant really, but again, this patch should not have changed
> anything there, since those were plain joins not semijoins.

Yes, it's interesting. I've repeated the tests on a 1GB TPC-DS dataset,
and this time I got a slightly different set of plan changes. I don't
see the change in join order, for example. Attached is the whole explain
output (all 63 queries), explain.master.1 and explain.patched.1.
There are two kinds of plan changes:

1) minor cost changes (these queries contain semijoins, so that's
   expected), but the plan is exactly the same

2) queries with a nested loop replaced by a hash join, i.e. the plan
   on master looks like this:

   -> Nested Loop
      -> Nested Loop
         -> Nested Loop
            -> Seq Scan
            -> Index Scan
         -> Index Scan
      -> Materialize
         -> Seq Scan

   while with the patches:

   -> Hash Join
      -> Hash Join
         -> Nested Loop
            -> Seq Scan
            -> Index Scan
         -> Hash
            -> Seq Scan
      -> Hash
         -> Seq Scan

This is slightly surprising, because those queries do not contain any
semijoins. Detailed explain analyze output for the three queries with
different plans is attached.

At the end, I repeated the explain on master, and got exactly the same
results as at the beginning (same as explain.master.1) - so there's no
variability in statistics at all.

Another interesting thing is that these plan changes only happen after
the second patch, i.e. with nestloop-semijoin-costing-fix everything is
exactly as on master, but with consider-startup-costs-for-semijoins the
plans change.

I suspected the problem was somewhere in compare_path_costs_fuzzily(),
but that doesn't seem to be the case - I have however noticed that the
number of calls to that function changes with the patch. For example
when running the query in tpcds-3.log, compare_path_costs_fuzzily() is
called ~1050x, but with the patch attached it's called only ~730x.
I've checked backtraces for a few of the paths that are not considered
with the patch, and all of them look like this (detailed backtrace
attached):

#3  0x0000000000656cd5 in compare_path_costs_fuzzily at pathnode.c:154
#4  0x0000000000657146 in add_path at pathnode.c:438
#5  0x0000000000631bfb in try_nestloop_path at joinpath.c:347
#6  0x0000000000632931 in match_unsorted_outer at joinpath.c:900
#7  add_paths_to_joinrel at joinpath.c:225

That particular part of try_nestloop_path looks like this:

    if (add_path_precheck(joinrel,
                          workspace.startup_cost, workspace.total_cost,
                          pathkeys, required_outer))
    {
        add_path(joinrel, (Path *) create_nestloop_path(...));
    }

I wouldn't be surprised if the problem was in add_path_precheck.
Actually, I'm quite sure that's where the problem is - particularly in
how required_outer was replaced with (!consider_startup):

    consider_startup = required_outer ? consider_param_startup
                                      : consider_startup;

If both _startup flags are false, the original required_outer value is
lost, and that changes the if() condition, which was changed from

    if (startup_cost >= old_path->startup_cost || required_outer)

to

    if (startup_cost >= old_path->startup_cost || !consider_startup)

So, if required_outer=false and both p->..._startup=false, we get
consider_startup=false irrespective of the required_outer value, so

    (!consider_startup) != required_outer

and the conditions are not equivalent. And indeed, by reverting the if
condition to the previous form, we get the same plans as on master.

I don't know whether this is correct behavior or not, but in all three
cases I've observed on TPC-DS, the new plans performed better:

              Q1           Q2           Q3
 --------------------------------------------------------------------
  master      518.330 ms   466.539 ms   560.585 ms
  patched     405.310 ms   340.506 ms   122.072 ms

This time I've been careful not to produce bogus numbers (also, the
explain analyze timings I mentioned in the previous post were
significantly influenced by timing overhead).
These results are quite stable.

A few more comments on the patch:

1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) be defined rather like
   this? I mean, if it's a parametric query, use any of the flags? And
   the same in add_path_precheck()?

   #define CONSIDER_PATH_STARTUP_COST(p)  \
       ((p)->param_info == NULL ? \
        (p)->parent->consider_startup : \
        ((p)->parent->consider_param_startup || (p)->parent->consider_startup))

2) Is it really possible to get different values for the startup flags
   on path1 and path2 in compare_path_costs_fuzzily()? If not (if I get
   it right, path1->parent == path2->parent anyway), then maybe passing
   that as a function parameter (just like before) would be better. The
   macro probably won't be as nice, but I find it easier to understand.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote:
> I'm a bit disturbed by that, because AFAICS from the plans, these queries
> did not involve any semi or anti joins, which should mean that the patch
> would not have changed the planner's behavior.

After contemplating my navel for awhile, it occurred to me that maybe the
patch's changes in add_path_precheck could have caused behavioral changes
at the margins. Could you try a run without that (without the last two
hunks in pathnode.c)?

			regards, tom lane
On 06/02/15 21:39, Tom Lane wrote:
> I wrote:
>> I'm a bit disturbed by that, because AFAICS from the plans, these queries
>> did not involve any semi or anti joins, which should mean that the patch
>> would not have changed the planner's behavior.
>
> After contemplating my navel for awhile, it occurred to me that maybe the
> patch's changes in add_path_precheck could have caused behavioral changes
> at the margins. Could you try a run without that (without the last two
> hunks in pathnode.c)?

Yes, I think that's the cause. See the end of the message I sent just
before you posted this.

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> So, if required_outer=false and both p->_startup=false, we get
> consider_startup=false irrespective of the required_outer value, so
> (!consider_startup) != required_outer
> so that the conditions are not equivalent. And indeed, by reverting the
> if condition to the previous form, we get the same plans as on master.

Ah, I see we arrived at the same conclusions independently.

> I don't know whether this is correct behavior or not, but in all three
> cases I've observed on TPC-DS, the new plans performed better

Hm. In principle, add_path_precheck shouldn't be doing anything except
rejecting paths that would get rejected anyway later on. However it
strikes me that there's a logic error in it; specifically, this
simplifying assumption is wrong:

 * For speed, we make exact rather than fuzzy cost comparisons. If an
 * old path dominates the new path exactly on both costs, it will
 * surely do so fuzzily.

The old path's cost might be fuzzily the same, not fuzzily better. This
is less likely to be an issue for total cost (since even if they're
fuzzily the same at this point, they probably won't be after we get done
adding more components to the new path's cost estimate). But it could
well be an issue for startup cost, since that's often exactly zero.

So I guess this must be rejecting or letting through a slightly
different set of non-parameterized paths than it did before. I'm not
prepared to assume the results are always superior, because we haven't
fixed the actual logic oversight.

> 1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) be defined rather like
> this? I mean, if it's a parametric query, use any of the flags?

No. The paths are parametric, not the query as a whole.

> 2) Is it really possible to get different values for the startup
> flags on path1 and path2 in compare_path_costs_fuzzily()?
If we were comparing a parameterized and an unparameterized path, it
might be that one's startup cost is of interest while the other one's
isn't. This is written to let a path win on startup cost as long as its
startup cost is actually of interest.

> If not (if I get it right, path1->parent == path2->parent anyway), then
> maybe passing that as a function parameter (just like before) would be
> better. The macro probably won't be as nice, but I find it easier to
> understand.

Well, we could pass the parent RelOptInfo explicitly instead of passing
the two flags, but I don't find that to be an improvement particularly.

			regards, tom lane
I wrote:
> Hm. In principle, add_path_precheck shouldn't be doing anything except
> rejecting paths that would get rejected anyway later on. However it
> strikes me that there's a logic error in it; specifically, this
> simplifying assumption is wrong:
>  * For speed, we make exact rather than fuzzy cost comparisons. If an
>  * old path dominates the new path exactly on both costs, it will
>  * surely do so fuzzily.

After staring at this awhile longer, I still don't see exactly why it's
changing the results. Do we have instructions around here anyplace on
how to set up/use TPC-DS? I couldn't find anything about it on the
wiki ...

			regards, tom lane
On 06/02/15 23:27, Tom Lane wrote:
> I wrote:
>> Hm. In principle, add_path_precheck shouldn't be doing anything except
>> rejecting paths that would get rejected anyway later on. However it
>> strikes me that there's a logic error in it; specifically, this
>> simplifying assumption is wrong:
>>  * For speed, we make exact rather than fuzzy cost comparisons. If an
>>  * old path dominates the new path exactly on both costs, it will
>>  * surely do so fuzzily.
>
> After staring at this awhile longer, I still don't see exactly why
> it's changing the results.

So the required_outer vs. (!consider_startup) inconsistency does not
explain that?

> Do we have instructions around here anyplace on how to set up/use
> TPC-DS? I couldn't find anything about it on the wiki ...

Not that I'm aware of, but it's not really all that difficult:

1) get the TPC-DS toolkit from
   http://www.tpc.org/information/current_specifications.asp
2) build the 'tools' directory (just "make")
3) generate the data: "dsdgen -scale 1" (1GB dataset)
4) fix the data file formatting (pipe at the end of each line):
   for f in *.dat; do sed 's/|$//g' $f > ${f/dat/csv}; done;
5) create the database structure (attached 'create.sql')
6) load the data (attached 'load.sql', fix paths first)
7) indexes.sql
8) run queries.sql or explain.sql

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/02/15 23:27, Tom Lane wrote:
>> Do we have instructions around here anyplace on how to set up/use
>> TPC-DS? I couldn't find anything about it on the wiki ...

> Not that I'm aware of, but it's not really all that difficult.
> [ instructions... ]

Thanks. I added some debugging printouts to trace the logic, and found
that once in awhile we get cases like this:

patch would change precheck result to fail for path cost 133752.12..136070.23 vs old path 135378.72..135550.22
phantom path causes rejection of old path with cost 135378.72..135550.22
phantom path has final cost 133752.12..136161.64, accepted

That is, we have an unparameterized path whose (initial estimate of)
total cost is just slightly more than the total cost of some old path,
but whose startup cost is less. The existing logic in add_path_precheck
allows construction of this path to proceed. If its finished total cost
is still within 1% of the old path's cost, it is able to supplant the
old path on the grounds that the total costs are fuzzily the same while
its startup cost is less. The patch's change to add_path_precheck
causes the startup cost to be disregarded, so that the new path is
rejected immediately and the path replacement doesn't happen.

Now, what this demonstrates is that add_path_precheck is not operating
as intended --- as I said earlier, it's supposed to just save path
construction cycles, not change the outcome of any path replacement
tests. What we ought to do is fix it to do the cost comparisons
fuzzily, and then it should be possible for it to disregard startup
cost when appropriate without changing the results beyond that.
However, making the comparisons fuzzy will certainly result in some
plan changes at the margin, because it will now let through some paths
that it should have let through and didn't. Some of those changes will
be for the better and others for the worse.
(I don't have any faith in your conclusion that the patch as-posted
always results in better plans. There's no reason to believe that
sub-one-percent differences in planner cost estimates will reliably
predict real-world wins.)

What it seems like we should do, if we want to back-patch this, is apply
it without the add_path_precheck changes. Then, as an independent
HEAD-only patch, change add_path_precheck so that it's behaving as
designed. It looks to me like that will save some planning time in any
case --- changing add_path_precheck to disregard startup cost when
appropriate seems to let it reject a lot more paths than it used to.

			regards, tom lane
On Tue, Jun 2, 2015 at 10:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > What it seems like we should do, if we want to back-patch this, is apply > it without the add_path_precheck changes. Then as an independent > HEAD-only patch, change add_path_precheck so that it's behaving as > designed. It looks to me like that will save some planning time in any > case --- changing add_path_precheck to disregard startup cost when > appropriate seems to let it reject a lot more paths than it used to. I'd just like to mention that I really appreciate the time and thought that went into keeping the back-patched portion of this fix narrow. Thanks! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
FWIW, I've repeated the TPC-DS tests on a much larger data set (50GB)
today, and I see that:

(a) 3f59be836c555fa679bbe0ec76de50a8b5cb23e0 (ANTI/SEMI join costing)

    changes nothing - there are some small cost changes, but only in
    plans involving semi/anti-joins (which is expected). Nevertheless,
    all the plans remain the same.

(b) 3b0f77601b9f9f3a2e36a813e4cd32c00e0864d6 (add_path fixes)

    This changes the join order in one of the queries, with lots of
    nested loops (this is the join order change we've seen in this
    thread). Anyway, this is mostly an expected consequence of the
    add_path changes.

So both changes seem fine.

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services