Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path |
Date | |
Msg-id | ed041b8e-1440-a09a-3632-85b3985df598@enterprisedb.com Whole thread Raw |
In response to | Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path |
List | pgsql-hackers |
Pushed, after clarifying the comments a bit. I also looked into what would it take to consider incremental paths, and in principle it doesn't seem all that complicated. The attached PoC patch extends get_cheapest_fractional_path_for_pathkeys() to optionally build incremental sort on paths if needed. There are two GUCs that make experimenting simpler: * enable_fractional_paths - disables fractional paths entirely, i.e. we get behavior without the part I already pushed * enable_fractional_incremental_paths - disables the incremental sort part added by the attached patch With this, I get this plan (see the test in partitionwise_join.sql) test=# EXPLAIN (COSTS OFF) test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 ASC, id2 ASC LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------ Limit -> Merge Left Join Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2)) -> Append -> Index Only Scan using fract_t0_id1_id2_idx on fract_t0 x_1 -> Incremental Sort Sort Key: x_2.id1, x_2.id2 Presorted Key: x_2.id1 -> Index Scan using fract_t1_pkey on fract_t1 x_2 -> Materialize -> Append -> Incremental Sort Sort Key: y_1.id1, y_1.id2 Presorted Key: y_1.id1 -> Index Scan using fract_t0_pkey on fract_t0 y_1 Index Cond: (id1 = id1) Filter: (id2 = id2) -> Incremental Sort Sort Key: y_2.id1, y_2.id2 Presorted Key: y_2.id1 -> Index Scan using fract_t1_pkey on fract_t1 y_2 Index Cond: (id1 = id1) Filter: (id2 = id2) (23 rows) instead of QUERY PLAN ------------------------------------------------------------------------------ Limit -> Incremental Sort Sort Key: x.id1, x.id2 Presorted Key: x.id1 -> Merge Left Join Merge Cond: (x.id1 = y.id1) Join Filter: (x.id2 = y.id2) -> Append -> Index Scan using fract_t0_pkey on fract_t0 x_1 -> Index Scan using fract_t1_pkey on fract_t1 x_2 -> Materialize -> Append -> Index Scan using fract_t0_pkey on fract_t0 y_1 -> Index Scan using fract_t1_pkey on fract_t1 y_2 (14 rows) i.e. the incremental sorts moved below the merge join (and the cost is lower, but that's not shown here). So that seems reasonable, but there's a couple issues too: 1) Planning works (hence we can run explain), but execution fails because of segfault in CheckVarSlotCompatibility - it gets NULL slot for some reason. I haven't looked into the details, but I'd guess we need to pass a different rel to create_incrementalsort_path, not childrel. 2) enable_partitionwisejoin=on seems to cause some confusion, because it results in picking a different plan with higher cost. But that's clearly not because of this patch. 3) I'm still a bit skeptical about the cost of this implementation - it builds the incrementalsort path, just to throw most of the paths away. It'd be much better to just calculate the cost, which should be much cheaper, and only do the heavy work for the one "best" path. 4) One thing I did not realize before is what pathkeys we even consider here. Imagine you have two tables: CREATE TABLE t1 (a int, b int, primary key (a)); CREATE TABLE t2 (a int, b int, primary key (a)); and query SELECT * FROM t1 JOIN t2 USING (a,b); It seems reasonable to also consider pathkeys (a,b) because that's make e.g. mergejoin much cheaper, right? But no, we'll not do that - we only consider pathkeys that at least one child relation has, so in this case it's just (a). Which means we'll never consider incremental sort for this particular example. It's a bit strange, because it's enough to create index on (a,b) for just one of the relations, and it'll suddenly consider building incremental sort on both sides. I don't plan to pursue this further at this point, so I pushed the first part because that's a fairly simple improvement over what we have now. But it seems like a fairly promising area for improvements. Also, the non-intuitive effects of enabling partition-wise joins (i.e. picking plans with higher estimated costs) is something worth exploring, I guess. But that's a separate issue. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: