Thread: index scan with index cond on first column doesn't recognize sort order of second column
index scan with index cond on first column doesn't recognize sort order of second column
From
Greg Stark
Date:
Here's a corner case where the optimizer is doing a redundant sort. I'm not sure if I'm doing something wrong or if it's just something the optimizer doesn't notice. The second index scan, the one on cache_foo, is on a two-column index. Since it has an Index Cond on the first column, it's effectively scanning in the order of the second column. That second column is precisely the join condition, so it could do a merge join without an extra sort. It's actually doing the merge join but it's doing a useless sort first. db=> explain analyze select * from foo_bar join cache_foo using (foo_id) where key_id = 839; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=4053.86..5060.00 rows=2641 width=32) (actual time=111.47..562.41 rows=8640 loops=1) Merge Cond: ("outer".foo_id = "inner".foo_id) -> Index Scan using idx_foo_bar_foo on foo_bar (cost=0.00..853.34 rows=45288 width=8) (actual time=0.03..239.75 rows=45140loops=1) -> Sort (cost=4053.86..4060.46 rows=2640 width=24) (actual time=111.37..121.70 rows=8641 loops=1) Sort Key: cache_foo.foo_id -> Index Scan using idx_cache_foo_foo on cache_foo (cost=0.00..3903.82 rows=2640 width=24) (actual time=0.05..47.48rows=8666 loops=1) Index Cond: (key_id = 839) Total runtime: 577.10 msec (8 rows) Time: 580.41 ms db=> \d cache_foo Table "public.cache_foo" Column | Type | Modifiers -------------------+------------------+----------- key_id | integer | foo_id | integer | Indexes: idx_cache_foo_foo btree (key_id, foo_id) [Sorry, but I have to search+replace on the names at the client's request] -- greg
On 13 Feb 2003, Greg Stark wrote: > Here's a corner case where the optimizer is doing a redundant sort. I'm not > sure if I'm doing something wrong or if it's just something the optimizer > doesn't notice. I'm guessing that it doesn't realize that in this case the sort is redundant since I think it's only necessarily redundant for = singlevalue with no ors.
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Greg Stark
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On 13 Feb 2003, Greg Stark wrote: > > > Here's a corner case where the optimizer is doing a redundant sort. I'm not > > sure if I'm doing something wrong or if it's just something the optimizer > > doesn't notice. > > I'm guessing that it doesn't realize that in this case the sort is > redundant since I think it's only necessarily redundant for = singlevalue > with no ors. I'm not sure. reading the code there does seem to be a special code path for ors anyways. This codepath claims to be for non-'or' restriction clauses. I'm not clear on what truncate_useless_pathkeys is doing. At a guess it's keeping just the prefix, rather than keeping just the suffix. I think at least for purposes of merge joins, that this is precisely the wrong thing to do. Of course in all likelihood that just means I've misinterpreted the code. If it were changed to just keep the suffix would that break other things? /* * 2. Match the index against non-'or' restriction clauses. */ restrictclauses = group_clauses_by_indexkey(rel, index); /* * 3. Compute pathkeys describing index's ordering, if any, then * see how many of them are actually useful for this query. */ index_pathkeys = build_index_pathkeys(root, rel, index, ForwardScanDirection); index_is_ordered = (index_pathkeys != NIL); useful_pathkeys = truncate_useless_pathkeys(root, rel, index_pathkeys); Incidentally, I tried being cleverer and it still doesn't notice the index path ordering. I think because of the truncate_useless_pathkeys above. db=> explain select * from cache_foo join (select *, 893 as key_id from foo_bar) as x using (key_id,foo_id) ; QUERY PLAN ----------------------------------------------------------------------------------------------------- Merge Join (cost=4107.53..5119.85 rows=2674 width=40) Merge Cond: ("outer".foo_id = "inner".foo_id) -> Index Scan using idx_foo_bar_foo on foo_bar (cost=0.00..859.04 rows=45288 width=8) -> Sort (cost=4107.53..4114.21 rows=2673 width=32) Sort Key: cache_foo.foo_id -> Index Scan using idx_cache_foo_foo on cache_foo (cost=0.00..3955.38 rows=2673 width=32) Index Cond: (key_id = 893) -- greg
On 13 Feb 2003, Greg Stark wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > > On 13 Feb 2003, Greg Stark wrote: > > > > > Here's a corner case where the optimizer is doing a redundant sort. I'm not > > > sure if I'm doing something wrong or if it's just something the optimizer > > > doesn't notice. > > > > I'm guessing that it doesn't realize that in this case the sort is > > redundant since I think it's only necessarily redundant for = singlevalue > > with no ors. > > I'm not sure. reading the code there does seem to be a special code path for > ors anyways. This codepath claims to be for non-'or' restriction clauses. Right, but it's more than just 'or' restrictions. It's any case where the condition returns more than one value of the first column that causes the sort to be non-redundant AFAICS. I'm at work so I can't really spend time going through code, so I don't know if the code you're pointing to only applies to conditions using equality, but I think greater than, less than, would be examples where it'd not be redundant.
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > I'm not clear on what truncate_useless_pathkeys is doing. It's throwing away pathkey info that isn't relevant for the current query (and would cause the planner to consider a multi-column-index scan as better-ordered than a scan of an index with fewer columns, when no such thing is true if the extra columns aren't relevant). > If it were changed to just keep the suffix would that break other things? Yes, that would be completely backwards. regards, tom lane
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > I'm not clear on what truncate_useless_pathkeys is doing. > > It's throwing away pathkey info that isn't relevant for the current > query (and would cause the planner to consider a multi-column-index > scan as better-ordered than a scan of an index with fewer columns, > when no such thing is true if the extra columns aren't relevant). But it might be better ordered. I think you may have missed the original context. In a query like: SELECT * FROM table_a join table_b using (col2) WHERE a.col1 = $0 Where there's an index on table_a(col1,col2) could use a merge join without a sort. Having truncate_useless_pathkeys removing the prefix means removing col2 from the pathkeys leaving col1. In fact once the restriction is taken into account the ordering of the path is in fact simply col2, which is just what's needed for the mergejoin. > > If it were changed to just keep the suffix would that break other things? > > Yes, that would be completely backwards. I'm not clear on how removing the suffix is ever useful. Even in a query like: SELECT * FROM table_a join table_b using (col1,col2) WHERE a.col1 between $0 and $1 The suffix is still relevant, the merge join is still possible. -- greg
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > But it might be better ordered. I think you may have missed the original > context. In a query like: > SELECT * FROM table_a join table_b using (col2) WHERE a.col1 = $0 > Where there's an index on table_a(col1,col2) could use a merge join without a > sort. Yes. The appropriate way to recognize that (without breaking things) is to consider that the indexscan generates pathkeys (col2) because col1 can be assumed constant. This is not truncate_useless_pathkeys's business, however. It would have to be done in the code that creates the indexscan path to begin with. (Ideally, we'd mark the path to show that it could be considered to have either sort ordering (col1, col2) or sort ordering (col2), so that it could match to either requirement. I'm not sure if that's feasible with the current datastructures though. Might have to make two copies of the path :-() > I'm not clear on how removing the suffix is ever useful. It reduces the number of alternatives that add_path will keep track of, thus making for a combinatorial reduction in the cost of planning. Actually you are barking up the wrong tree entirely; I'm pretty certain that truncate_useless_pathkeys *doesn't* remove this pathkey, because it should notice that it is relevant to the mergejoin condition. regards, tom lane
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > But it might be better ordered. I think you may have missed the original > > context. In a query like: > > SELECT * FROM table_a join table_b using (col2) WHERE a.col1 = $0 > > Where there's an index on table_a(col1,col2) could use a merge join without a > > sort. > > Yes. The appropriate way to recognize that (without breaking things) > is to consider that the indexscan generates pathkeys (col2) because col1 > can be assumed constant. This is not truncate_useless_pathkeys's > business, however. It would have to be done in the code that creates > the indexscan path to begin with. Er, right, I think that's the code I originally quoted from indxpath.c: /* * 2. Match the index against non-'or' restriction clauses. */ restrictclauses = group_clauses_by_indexkey(rel, index); /* * 3. Compute pathkeys describing index's ordering, if any, then * see how many of them are actually useful for this query. */ index_pathkeys = build_index_pathkeys(root, rel, index, ForwardScanDirection); index_is_ordered = (index_pathkeys != NIL); useful_pathkeys = truncate_useless_pathkeys(root, rel, index_pathkeys); > (Ideally, we'd mark the path to show that it could be considered to have > either sort ordering (col1, col2) or sort ordering (col2), so that it could > match to either requirement. I'm not sure if that's feasible with the > current datastructures though. Might have to make two copies of the path > :-() I don't think that's a problem because the optimizer seems to be doing a pretty good job of constant propagation and it tracks down the transitive closure of all the equijoin clauses. So if we have a col1=$0 clause for one table then any other tables that are equijoined on col1 will also have a col1=$0 clause. I'm hoping that's been done already, I'm not sure. > Actually you are barking up the wrong tree entirely; I'm pretty certain > that truncate_useless_pathkeys *doesn't* remove this pathkey, because it > should notice that it is relevant to the mergejoin condition. <tries gdb out on postgres> Hm... <rebuilds postgres with -O0> Hm... No, truncate_useless_pathkeys actually returns NULL. It appears it decides the pathkey is actually completely useless for the mergejoin and both pathkeys_useful_for_merging and pathkeys_useful_for_ordering return 0. I guess ltruncate returns NULL if the first argument is 0. I think indxpath.c has to try truncate_useless_pathkeys and if it returns null try removing any leading elements that it has an = clause for. It should try truncate_useless_pathkeys after removing each element. I think it can just stop after finding the first match. Basic postgres mm question, is there a cdr function, and is it safe to leave references to the cdr of this list around? It all gets cleaned up with a ripcord deallocator? -- greg
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > [ other stuff I'll respond to later ] > Basic postgres mm question, is there a cdr function, lfirst() == car(). lnext() == cdr(). Practically all the basic planner datastructures were lifted from an original implementation in Lisp, and other than these gratuitous(?) renamings, the semantics are still the same. > It all gets cleaned up with a ripcord deallocator? Right. Garbage collection r us; don't worry about leaking a cons cell here or there. regards, tom lane
Re: index scan with index cond on first column doesn't recognize sort order of second column
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Actually you are barking up the wrong tree entirely; I'm pretty certain >> that truncate_useless_pathkeys *doesn't* remove this pathkey, because it >> should notice that it is relevant to the mergejoin condition. > No, truncate_useless_pathkeys actually returns NULL. Mmm, you're right. It would accept the second pathkey if it got to it --- but it doesn't, because the first pathkey (col1) isn't useful for any merge condition. Still, truncate_useless_pathkeys isn't the place to be dealing with this issue. It's not in a position to generate multiple interpretations of the same path, which is what we are really after here. regards, tom lane