Thread: Missing constant propagation in planner on hash quals causes join slowdown
Hi Hackers,

While optimising our application I stumbled over the join quals it uses very often. In general this concerns datasets that are subdivided into chunks, such as years, seasons (here: half a year), multiple tenants in an OLTP system, etc. In these cases many tables are joined only to data of the same chunk, always identified by a separating column in the table (here: xx_season).

(I tested this on PG 12.0 on Windows 64 bit, but got similar results on older stable releases and other OSes.)

Here is the test case, also in the attached file. (I chose to join 2 tables with 2 seasons (2 and 3) of about 1 million records per season. I put some randomness into the table creation to simulate the normal situation in OLTP systems.)

----------------------------------- Source start

drop table if exists tmaster;

create table tmaster (
  id_t1      integer,
  t1_season  integer,
  t1_id_t2   integer,
  t1_value   integer,
  t1_cdescr  varchar,
  primary key (id_t1)
);

-- select setseed (0.34512);

insert into tmaster
select
   inum
  ,iseason
  ,row_number () over () as irow
  ,irandom
  ,'TXT: '||irandom::varchar
from (
  select
     inum::integer
    ,((inum>>20)+2)::integer as iseason
    ,inum::integer + (500000*random())::integer as irandom
  from generate_series (1,(1<<21)) as inum
  order by irandom
) qg
;

alter table tmaster add constraint uk_master_season_id unique (t1_season,id_t1);

drop table if exists tfact;

create table tfact (
  id_t2      integer,
  t2_season  integer,
  t2_value   integer,
  t2_cdescr  varchar,
  primary key (id_t2)
);

-- select setseed (-0.76543);

insert into tfact
select
   qg.*
  ,'FKT: '||irandom::varchar
from (
  select
     inum::integer
    ,((inum>>20)+2)::integer as iseason
    ,inum::integer + (500000*random())::integer as irandom
  from generate_series (1,(1<<21)) as inum
  order by irandom
) qg
;

alter table tfact add constraint uk_fact_season_id unique (t2_season,id_t2);

-----------------

-- slower:
explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=t1_season
where t1_season=3
;

-- faster by setting a constant in the left join on condition:
explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season
where t1_season=3
;

----------------------------------- Source end

The results for the first query:

explain (analyze, verbose, costs, settings, buffers)
select * from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=t1_season
where t1_season=3
;

                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=53436.01..111610.15 rows=1046129 width=52) (actual time=822.784..2476.573 rows=1048576 loops=1)
   Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr, tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
   Inner Unique: true
   Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2))
   Buffers: shared hit=2102193, temp read=10442 written=10442
   ->  Index Scan using uk_master_season_id on public.tmaster  (cost=0.43..32263.38 rows=1046129 width=28) (actual time=0.008..565.222 rows=1048576 loops=1)
         Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr
         Index Cond: (tmaster.t1_season = 3)
         Buffers: shared hit=1051086
   ->  Hash  (cost=31668.49..31668.49 rows=1043473 width=24) (actual time=820.960..820.961 rows=1048576 loops=1)
         Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
         Buckets: 524288 (originally 524288)  Batches: 4 (originally 2)  Memory Usage: 28673kB
         Buffers: shared hit=1051107, temp written=4316
         ->  Index Scan using uk_fact_season_id on public.tfact  (cost=0.43..31668.49 rows=1043473 width=24) (actual time=0.024..598.648 rows=1048576 loops=1)
               Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
               Index Cond: (tfact.t2_season = 3)
               Buffers: shared hit=1051107
 Settings: effective_cache_size = '8GB', random_page_cost = '1', temp_buffers = '32MB', work_mem = '32MB'
 Planning Time: 0.627 ms
 Execution Time: 2502.702 ms
(20 rows)

and for the second one:

explain (analyze, verbose, costs, settings, buffers)
select * from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season
where t1_season=3
;

                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=50827.33..106255.38 rows=1046129 width=52) (actual time=758.086..2313.175 rows=1048576 loops=1)
   Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr, tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
   Inner Unique: true
   Hash Cond: (tmaster.t1_id_t2 = tfact.id_t2)
   Buffers: shared hit=2102193, temp read=9024 written=9024
   ->  Index Scan using uk_master_season_id on public.tmaster  (cost=0.43..32263.38 rows=1046129 width=28) (actual time=0.009..549.793 rows=1048576 loops=1)
         Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr
         Index Cond: (tmaster.t1_season = 3)
         Buffers: shared hit=1051086
   ->  Hash  (cost=31668.49..31668.49 rows=1043473 width=24) (actual time=756.125..756.125 rows=1048576 loops=1)
         Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
         Buckets: 524288  Batches: 4  Memory Usage: 18711kB
         Buffers: shared hit=1051107, temp written=4317
         ->  Index Scan using uk_fact_season_id on public.tfact  (cost=0.43..31668.49 rows=1043473 width=24) (actual time=0.025..584.652 rows=1048576 loops=1)
               Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
               Index Cond: (tfact.t2_season = 3)
               Buffers: shared hit=1051107
 Settings: effective_cache_size = '8GB', random_page_cost = '1', temp_buffers = '32MB', work_mem = '32MB'
 Planning Time: 0.290 ms
 Execution Time: 2339.651 ms
(20 rows)

By replacing the =t1_season with =3 the query took about 160 ms less, or about 7 percent.

Both queries are logically equivalent. The planner correctly identifies the Index Cond: (tfact.t2_season = 3) for selecting from the index uk_fact_season_id. But in the slower query the hash condition still compares the columns t1_season and t2_season, as in Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2)). This can only be detected with explain (analyze, verbose), where the hash conditions are shown.

The first query notation with "and t2_season=t1_season" is much more natural, as it requires only one numerical constant to get good query speed (often many fact tables are joined). The inclusion of the xx_season quals reduces the processed dataset and helps also when the season columns are used for list partitioning of all the involved tables. When omitting it, the whole fact table will be joined.

To me it seems that the "constantness" is not propagated to all equivalence columns and not considered in hash joining.

Unfortunately I am not in a position to write a patch, so I would appreciate any help getting this optimization realized.

Many thanks in advance

Hans Buschmann
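[Editor's note: the following sketch is illustrative only, not PostgreSQL code.] The overhead Hans describes can be pictured with a toy hash left join in Python: hashing on (season, id) versus on id alone, when the season is already pinned to a constant by both index scans, produces the same join pairs but hashes and compares one column more per row. Table and column names follow the test case above; the tiny data sets are made up.

```python
# Toy hash left join (illustrative sketch, not PostgreSQL internals):
# build a hash table on the inner keys, probe it with the outer keys.

def hash_left_join(outer, inner, outer_keys, inner_keys):
    """Minimal hash left join; unmatched outer rows pair with None."""
    table = {}
    for row in inner:
        table[tuple(row[k] for k in inner_keys)] = row
    result = []
    for row in outer:
        match = table.get(tuple(row[k] for k in outer_keys))
        result.append((row, match))
    return result

# All rows belong to season 3, as after the planner's index scans.
tmaster = [{"t1_season": 3, "t1_id_t2": i} for i in range(5)]
tfact   = [{"t2_season": 3, "id_t2": i} for i in range(0, 5, 2)]

# Hash Cond: ((t1_season = t2_season) AND (t1_id_t2 = id_t2))
wide = hash_left_join(tmaster, tfact,
                      ["t1_season", "t1_id_t2"], ["t2_season", "id_t2"])
# Hash Cond: (t1_id_t2 = id_t2) -- season already constant on both sides
narrow = hash_left_join(tmaster, tfact, ["t1_id_t2"], ["id_t2"])

# Same join result either way; the wide key just costs more per row.
assert [(r["t1_id_t2"], m and m["id_t2"]) for r, m in wide] == \
       [(r["t1_id_t2"], m and m["id_t2"]) for r, m in narrow]
```

The point of the sketch is only that the wide and narrow hash conditions are interchangeable once both scans have restricted the season to the same constant.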
Attachment
Re: Missing constant propagation in planner on hash quals causes join slowdown
From
Tomas Vondra
Date:
On Fri, Oct 18, 2019 at 03:40:34PM +0000, Hans Buschmann wrote:
>
> ...
>
>Both queries are logically equivalent. The planner correctly identifies
>the Index Cond: (tfact.t2_season = 3) for selecting from the index
>uk_fact_season_id.
>

Are those queries actually equivalent? I've been repeatedly bitten by
nullability in left join queries when playing with optimizations like
this, so maybe this is one of those cases?

This seems to be happening because distribute_qual_to_rels() does this:

    ...
    else if (bms_overlap(relids, outerjoin_nonnullable))
    {
        /*
         * The qual is attached to an outer join and mentions (some of the)
         * rels on the nonnullable side, so it's not degenerate.
         *
         * We can't use such a clause to deduce equivalence (the left and
         * right sides might be unequal above the join because one of them has
         * gone to NULL) ... but we might be able to use it for more limited
         * deductions, if it is mergejoinable.  So consider adding it to the
         * lists of set-aside outer-join clauses.
         */
        is_pushed_down = false;
        ...
    }
    ...

and the clause does indeed reference the nullable side of the join, preventing us from marking the clause as pushed-down.

I haven't managed to construct a query that would break this, though, i.e. a case where the two queries would give different results. So maybe those queries actually are equivalent. Or maybe the example would need to be more complicated (requiring more joins, or something like that).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
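[Editor's note: a toy Python illustration of the nullability concern above, using the thread's column names; it is not derived from PostgreSQL code.] Below a LEFT JOIN, t1_season = t2_season holds for every matched pair, but above the join the nullable side of an unmatched row is NULL, so the equality cannot be assumed to hold everywhere — which is exactly why distribute_qual_to_rels() refuses to treat such a clause as an equivalence.

```python
# Simulate a LEFT JOIN where one outer row finds no match on the inner side.
tmaster = [{"t1_season": 3, "t1_id_t2": 1},
           {"t1_season": 3, "t1_id_t2": 2}]
tfact = {1: {"t2_season": 3, "id_t2": 1}}   # id 2 has no matching row

joined = []
for m in tmaster:
    f = tfact.get(m["t1_id_t2"])
    # NULL-extend the nullable side for unmatched outer rows (None ~ NULL).
    joined.append({**m, "t2_season": f["t2_season"] if f else None})

# Above the join, the derived "equivalence" t1_season = t2_season fails
# for the NULL-extended row, even though t1_season = 3 was enforced below.
violations = [r for r in joined if r["t2_season"] != r["t1_season"]]
assert len(violations) == 1 and violations[0]["t1_id_t2"] == 2
```

In Hans's example both sides are additionally restricted to season 3 by the scans, which is why the two queries still return the same rows; the hazard only materializes for clauses used above the outer join.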
AW: Missing constant propagation in planner on hash quals causes join slowdown
From
Hans Buschmann
Date:
Thanks for looking at it.

I think these two queries are equivalent, as shown by the explain. In both cases the index scan only selects tuples with xx_season=3, as shown in both explains:

Index Cond: (tmaster.t1_season = 3)
Index Cond: (tfact.t2_season = 3)

So no tuple can have a null value for xx_season.

My point is the construction of the hash table, which includes t2_season even though it is constant and not null. From explain:

with overhead:
Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2))

optimized:
Hash Cond: (tmaster.t1_id_t2 = tfact.id_t2)

The planner correctly sets the index conditions (it knows that the xx_season columns are constant), but fails to apply this constantness to the hash conditions by discarding a constant column from the hash table.

In my real application most of the xx_season columns are declared not null, but this should not change the outcome.

The performance difference is slightly smaller when the created tables are analyzed beforehand (which I forgot to do). But the percentage gain is much higher when considering only the construction of the hash table, the only part of the query execution altered by this optimization.

In my opinion this scenario could be quite common in multi-tenant cases, in logging, time-based data sets, etc.

I tried to look at the pg source code but could not yet find the place where the hash conditions are selected and potentially tested.

When optimizing the constants away there may be a special case where all hash conditions are constants, so a hash table does not have to be built at all (or at least one hash cond has to be preserved).

Hans Buschmann
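[Editor's note: the pruning rule Hans suggests, sketched in Python; the function name and data shapes are hypothetical, not PostgreSQL code.] The idea is to drop hash clauses whose columns are already pinned to the same constant on both sides, while handling the degenerate case he mentions by keeping at least one clause so a hash key remains.

```python
# Hypothetical pruning of hash clauses (illustrative sketch only).

def prune_hash_clauses(clauses, constant_cols):
    """clauses: list of (outer_col, inner_col) equality pairs.
    constant_cols: columns known to equal the same constant on both
    sides of the join (e.g. via t1_season = 3 and t2_season = 3)."""
    kept = [(o, i) for o, i in clauses
            if o not in constant_cols and i not in constant_cols]
    # Degenerate case: every clause was constant; preserve one so the
    # hash join still has a key to hash on.
    return kept if kept else clauses[:1]

clauses = [("t1_season", "t2_season"), ("t1_id_t2", "id_t2")]
# The season clause is redundant once both sides are pinned to 3.
assert prune_hash_clauses(clauses, {"t1_season", "t2_season"}) == \
       [("t1_id_t2", "id_t2")]
# All-constant case: one clause is retained rather than none.
assert prune_hash_clauses([("t1_season", "t2_season")],
                          {"t1_season", "t2_season"}) == \
       [("t1_season", "t2_season")]
```

As Tom's reply explains, the hard part in the real planner is not this filtering step but proving that discarding the clause is safe for joins above an outer join.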
Re: AW: Missing constant propagation in planner on hash quals causes join slowdown
From
Tom Lane
Date:
Hans Buschmann <buschmann@nidsa.net> writes:
> The planner correctly sets the index conditions (knows that the
> xx_season columns are constant), but fails to apply this constantness
> to the hash conditions by discarding a constant column in a hash table.

Yeah.  The reason for this behavior is that, after reconsider_outer_join_clauses has decided that it can derive tfact.t2_season = 3 from the given conditions, it still "throws back" the original join condition tmaster.t1_season = tfact.t2_season into the set of ordinary join clauses, so that that condition will still get applied at the time the join is computed.  The comment about it claims

 * If we don't find any match for a set-aside outer join clause, we must
 * throw it back into the regular joinclause processing by passing it to
 * distribute_restrictinfo_to_rels().  If we do generate a derived clause,
 * however, the outer-join clause is redundant.  We still throw it back,
 * because otherwise the join will be seen as a clauseless join and avoided
 * during join order searching; but we mark it as redundant to keep from
 * messing up the joinrel's size estimate.

However, this seems to be a lie, or at least not the whole truth.  If you try diking out the throw-back logic, which is simple enough to do, you'll immediately find that some test cases in join.sql give the wrong answers.  The reason is that once we've derived tfact.t2_season = 3 and asserted that that's an equivalence condition, the eclass logic believes that tmaster.t1_season = tfact.t2_season must hold everywhere (that's more or less the definition of an eclass).  But *it doesn't hold* above the left join, because tfact.t2_season could be null instead.  In particular this can break any higher joins involving tfact.t2_season.  By treating tmaster.t1_season = tfact.t2_season as an ordinary join clause we force the necessary tests to be made anyway, independently of the eclass logic.
(There's no bug in Hans' example because there's only one join; the problem is not really with this particular clause, but with other columns that might also be thought equal to tfact.t2_season.  It's those upper join clauses that can't safely be thrown away.)

I've had a bee in my bonnet for a long time about redesigning all this to be less klugy.  Fundamentally what we lack is an honest representation that a given value might be NULL instead of the original value from its table; this results in assorted compromises both here and elsewhere in the planner.  The rough sketch that's been lurking in my hindbrain is:

1) Early in the planner where we flatten join alias variables, do so only when they actually are formally equivalent to the input variables, i.e. only for the non-nullable side of any outer join.  This gives us the desired representation distinction between original and possibly-nulled values: the former are base-table Vars, the latter are join Vars.

2) tmaster.t1_season = tfact.t2_season could be treated as an honest equivalence condition *between those variables*, but it would not imply anything about the join output variable "j.t2_season".  I think reconsider_outer_join_clauses goes away entirely.

3) Places where we're trying to estimate variable values would have to be taught to look through unflattened join alias variables to see what they're based on.  For extra credit they could assign some higher-than-normal probability that the value is null (though that could be done later, since there's certainly nothing accounting for that today).

4) The final flattening of these alias variables would probably not happen till setrefs.c.

5) I have not thought through what this implies for full joins.  The weird COALESCE business might go away entirely, or it might be something that setrefs.c still has to insert.

6) There are lots of other, related kluges that could stand to be revisited at the same time.  The business with "outer-join delayed" clauses is a big example.
Maybe that all just magically goes away once we have a unique understanding of what value a Var represents, but I've not thought hard about it.  We'd certainly need some new understanding of how to schedule outer-join clauses, since their Var membership would no longer correspond directly to sets of baserels.  There might be connections to the "PlaceHolderVar" mess as well.

This is a pretty major change and will doubtless break stuff throughout the planner.  I'm also not clear on the planner performance implications; both point (3) and the need for two rounds of alias flattening seem like they'd slow things down.  But maybe we could buy some speed back by eliminating kluges, and there is a hope that we'd end up with better plans in a useful number of cases.

Anyway, the large amount of work involved and the rather small benefits we'd probably get have discouraged me from working on this.  But maybe someday it'll get to the top of the queue.  Basically this is all technical debt left over from the way we bolted outer joins onto the original planner design, so we really oughta fix it sometime.

regards, tom lane
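[Editor's note: a toy Python model of points (1)-(3) of Tom's sketch; all names here (BaseVar, JoinVar, the join "j") are hypothetical and do not correspond to actual planner data structures.] The core of the idea is that a base-table Var and the possibly-NULL-extended output of an outer join are distinct objects, so an equivalence class can include the former without asserting anything about the latter.

```python
# Hypothetical model: distinguish base-table Vars from join-output Vars.
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseVar:
    rel: str
    col: str

@dataclass(frozen=True)
class JoinVar:
    """Output of an outer join; may be NULL-extended above it."""
    join: str
    base: BaseVar   # point (3): estimation can look through to this

t1_season = BaseVar("tmaster", "t1_season")
t2_season = BaseVar("tfact", "t2_season")
j_t2_season = JoinVar("j", t2_season)

# Point (2): t1_season = t2_season is an honest equivalence between the
# base Vars...
eclass = {t1_season, t2_season}
# ...but the join output variable "j.t2_season" is deliberately NOT a
# member, since it may be NULL where t1_season is not.
assert j_t2_season not in eclass
# Point (3): code can still look through the alias to its base Var.
assert j_t2_season.base in eclass
```

This is only a mnemonic for the representation distinction; the real design questions (clause scheduling, PlaceHolderVars, full joins) are the ones Tom lists above.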