Thread: I have a question about using index in order statement.
Question:=20 I have a question about using index in order statement. Why index ix_2 work by Seq Scan and index ix_3 work by Index Scan. Example : ix_2 condition : When I try explain select * from a_test=20 order by code_ desc Postgresql response=20 Sort (cost=3D100001815.08..100001852.56 rows=3D14990 width=3D56) Sort Key: code_ -> Seq Scan on a_test (cost=3D100000000.00..100000260.90 rows=3D14990= width=3D56) ix_3 condition : When I try explain select * from a_test=20 order by lower(code_) desc Postgresql response=20 Index Scan using ix_3 on a_test (cost=3D0.00..769.27 rows=3D14990 widt= h=3D18) =20=20=20=20 Table schema : CREATE TABLE a_test ( t_key_ bigint NOT NULL, code_ character varying(15) ) WITH (OIDS=3DTRUE); ALTER TABLE a_test OWNER TO postgres; CREATE INDEX ix_2 ON a_test USING btree (code_ DESC); CREATE INDEX ix_3 ON a_test USING btree (lower(code_::text) DESC);
kevin wrote: > Question: > I have a question about using index in order statement. > Why index ix_2 work by Seq Scan and index ix_3 work by Index Scan. > > Example : > > ix_2 condition : > When I try > > explain > select * from a_test > order by code_ desc > > Postgresql response > Sort (cost=100001815.08..100001852.56 rows=14990 width=56) > Sort Key: code_ > -> Seq Scan on a_test (cost=100000000.00..100000260.90 rows=14990 width=56) > > ix_3 condition : > When I try > > explain > select * from a_test > order by lower(code_) desc > > Postgresql response > Index Scan using ix_3 on a_test (cost=0.00..769.27 rows=14990 width=18) Thanks for the report. This seems to have been broken by this patch back in May: http://archives.postgresql.org/pgsql-committers/2007-05/msg00394.php that wraps pathkey expressions with a relabel node. Because of that, get_eclass_for_sort_expr doesn't recognize that the ordering of the index matches that of the query. Attached is a patch that fixes that test case. I'm not very familiar with that piece of code, though, and I have a sneaking suspicion that the patch is either not general enough, there may be other places where we should ignore relabel nodes, or it brakes something else. I'm surprised this hasn't been noticed before. It doesn't happen with text datatype, but varchar is very common datatype as well. PS. Kevin, in the future, please specify which PostgreSQL version you're using. The fact that the above DDL statements don't work until 8.3beta releases gave it away this time :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/optimizer/path/equivclass.c =================================================================== RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/optimizer/path/equivclass.c,v retrieving revision 1.3 diff -c -r1.3 equivclass.c *** src/backend/optimizer/path/equivclass.c 7 Jul 2007 20:46:45 -0000 1.3 --- src/backend/optimizer/path/equivclass.c 2 Nov 2007 11:48:12 -0000 *************** *** 373,378 **** --- 373,388 ---- EquivalenceMember *newem; ListCell *lc1; MemoryContext oldcontext; + Expr *stripped_expr; + + /* + * Strip any relabel nodes first; they're not meaningful + * for ordering purposes. + */ + if (IsA(expr, RelabelType)) + stripped_expr = ((RelabelType *)expr)->arg; + else + stripped_expr = expr; /* * Scan through the existing EquivalenceClasses for a match *************** *** 390,395 **** --- 400,406 ---- foreach(lc2, cur_ec->ec_members) { EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2); + Expr *em_expr; /* * If below an outer join, don't match constants: they're not *************** *** 399,406 **** cur_em->em_is_const) continue; if (expr_datatype == cur_em->em_datatype && ! equal(expr, cur_em->em_expr)) return cur_ec; /* Match! */ } } --- 410,421 ---- cur_em->em_is_const) continue; + em_expr = cur_em->em_expr; + if (IsA(em_expr, RelabelType)) + em_expr = ((RelabelType *)em_expr)->arg; + if (expr_datatype == cur_em->em_datatype && ! equal(stripped_expr, em_expr)) return cur_ec; /* Match! */ } }
Heikki Linnakangas <heikki@enterprisedb.com> writes: > Attached is a patch that fixes that test case. I'm not very familiar > with that piece of code, though, and I have a sneaking suspicion that > the patch is either not general enough, there may be other places where > we should ignore relabel nodes, or it brakes something else. If we were going to go this route, we'd need to uniformly make the equivclass.c code strip RelabelType from *all* expressions it deals with. Which might be a reasonable thing to do, since it always considers their types to be the declared input types of the operators, anyway. But it's a bigger patch than you have here. I'm a bit annoyed with the idea because I had hoped to move in the direction of getting rid of all the ad hoc RelabelType-stripping that the planner does in various places. A lot of it is pretty darn questionable from a semantic standpoint. However, because equivclass.c is only worried about opfamily semantics, any RelabelTypes on the input expressions don't matter, and so that objection doesn't apply. The basic reason that there's a problem here is that the parser is playing fast and loose by generating ORDER BY information that cites "text < text" as the sort operator but applies it to a bare varchar Var node. So I thought about a Plan B of changing the parser to put a correct RelabelType on the sort key. I'm not sure of all the implications of that, though, and you could argue that it's an initdb-forcing change (because stored rules involving ORDER BY on varchar columns would need to change to work right). Seems a bit late in the 8.3 cycle for that. I guess the right answer is to fix equivclass.c to strip RelabelTypes, and hope to maybe take that out again someday when all this gets cleaned up. Comments? regards, tom lane
Tom Lane wrote: > The basic reason that there's a problem here is that the parser is > playing fast and loose by generating ORDER BY information that cites > "text < text" as the sort operator but applies it to a bare varchar > Var node. So I thought about a Plan B of changing the parser to put > a correct RelabelType on the sort key. I'm not sure of all the > implications of that, though, and you could argue that it's an > initdb-forcing change (because stored rules involving ORDER BY on > varchar columns would need to change to work right). Seems a bit > late in the 8.3 cycle for that. I think mentioning it in the release notes would be enough. You would just have to recreate the rules to make them work again, right? > I guess the right answer is to fix equivclass.c to strip RelabelTypes, > and hope to maybe take that out again someday when all this gets cleaned > up. That certainly looks like the easier solution. We still strip RelabelTypes in many places anyway, so doing it in one more place doesn't seem too bad to me. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes: > Tom Lane wrote: >> I guess the right answer is to fix equivclass.c to strip RelabelTypes, >> and hope to maybe take that out again someday when all this gets cleaned >> up. > That certainly looks like the easier solution. We still strip > RelabelTypes in many places anyway, so doing it in one more place > doesn't seem too bad to me. I reconsidered this after further thought. If we do that it'd also mean that clauses generated from EquivalenceClasses don't have RelabelType, which is definitely going in the wrong direction in a big way. What I'm just about to experiment with is the idea that lines 498-517 of pathkeys.c are in the wrong place: that should be done within make_pathkey_from_sortinfo, so that it would also happen for expressions coming from SortClauses (see the other caller of make_pathkey_from_sortinfo). Then the expressions coming in to equivclass.c should always match. regards, tom lane