Thread: I have a question about using index in order statement.

I have a question about using index in order statement.

From
"kevin"
Date:
Question:=20
I have a question about using index in order statement.
Why index ix_2 work by Seq Scan and index ix_3 work by Index Scan.

Example :

ix_2 condition :
When I try

  explain
  select * from a_test=20
  order by code_ desc

Postgresql response=20
  Sort  (cost=3D100001815.08..100001852.56 rows=3D14990 width=3D56)
    Sort Key: code_
    ->  Seq Scan on a_test  (cost=3D100000000.00..100000260.90 rows=3D14990=
 width=3D56)

ix_3 condition :
When I try

  explain
  select * from a_test=20
  order by lower(code_) desc

Postgresql response=20
    Index Scan using ix_3 on a_test  (cost=3D0.00..769.27 rows=3D14990 widt=
h=3D18)
=20=20=20=20

Table schema :

CREATE TABLE a_test
(
  t_key_ bigint NOT NULL,
  code_ character varying(15)
)
WITH (OIDS=3DTRUE);
ALTER TABLE a_test OWNER TO postgres;

CREATE INDEX ix_2
  ON a_test
  USING btree
  (code_ DESC);

CREATE INDEX ix_3
  ON a_test
  USING btree
  (lower(code_::text) DESC);

Re: I have a question about using index in order statement.

From
Heikki Linnakangas
Date:
kevin wrote:
> Question:
> I have a question about using index in order statement.
> Why index ix_2 work by Seq Scan and index ix_3 work by Index Scan.
>
> Example :
>
> ix_2 condition :
> When I try
>
>   explain
>   select * from a_test
>   order by code_ desc
>
> Postgresql response
>   Sort  (cost=100001815.08..100001852.56 rows=14990 width=56)
>     Sort Key: code_
>     ->  Seq Scan on a_test  (cost=100000000.00..100000260.90 rows=14990 width=56)
>
> ix_3 condition :
> When I try
>
>   explain
>   select * from a_test
>   order by lower(code_) desc
>
> Postgresql response
>     Index Scan using ix_3 on a_test  (cost=0.00..769.27 rows=14990 width=18)

Thanks for the report. This seems to have been broken by this patch back
in May:

http://archives.postgresql.org/pgsql-committers/2007-05/msg00394.php

that wraps pathkey expressions with a relabel node. Because of that,
get_eclass_for_sort_expr doesn't recognize that the ordering of the
index matches that of the query.

Attached is a patch that fixes that test case. I'm not very familiar
with that piece of code, though, and I have a sneaking suspicion that
the patch is either not general enough, there may be other places where
we should ignore relabel nodes, or it brakes something else.

I'm surprised this hasn't been noticed before. It doesn't happen with
text datatype, but varchar is very common datatype as well.

PS. Kevin, in the future, please specify which PostgreSQL version you're
using. The fact that the above DDL statements don't work until 8.3beta
releases gave it away this time :-).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/optimizer/path/equivclass.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/optimizer/path/equivclass.c,v
retrieving revision 1.3
diff -c -r1.3 equivclass.c
*** src/backend/optimizer/path/equivclass.c    7 Jul 2007 20:46:45 -0000    1.3
--- src/backend/optimizer/path/equivclass.c    2 Nov 2007 11:48:12 -0000
***************
*** 373,378 ****
--- 373,388 ----
      EquivalenceMember *newem;
      ListCell   *lc1;
      MemoryContext oldcontext;
+     Expr *stripped_expr;
+
+     /*
+      * Strip any relabel nodes first; they're not meaningful
+      * for ordering purposes.
+      */
+     if (IsA(expr, RelabelType))
+         stripped_expr = ((RelabelType *)expr)->arg;
+     else
+         stripped_expr = expr;

      /*
       * Scan through the existing EquivalenceClasses for a match
***************
*** 390,395 ****
--- 400,406 ----
          foreach(lc2, cur_ec->ec_members)
          {
              EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
+             Expr *em_expr;

              /*
               * If below an outer join, don't match constants: they're not
***************
*** 399,406 ****
                  cur_em->em_is_const)
                  continue;

              if (expr_datatype == cur_em->em_datatype &&
!                 equal(expr, cur_em->em_expr))
                  return cur_ec;                    /* Match! */
          }
      }
--- 410,421 ----
                  cur_em->em_is_const)
                  continue;

+             em_expr = cur_em->em_expr;
+             if (IsA(em_expr, RelabelType))
+                 em_expr = ((RelabelType *)em_expr)->arg;
+
              if (expr_datatype == cur_em->em_datatype &&
!                 equal(stripped_expr, em_expr))
                  return cur_ec;                    /* Match! */
          }
      }

Re: I have a question about using index in order statement.

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Attached is a patch that fixes that test case. I'm not very familiar
> with that piece of code, though, and I have a sneaking suspicion that
> the patch is either not general enough, there may be other places where
> we should ignore relabel nodes, or it brakes something else.

If we were going to go this route, we'd need to uniformly make the
equivclass.c code strip RelabelType from *all* expressions it deals
with.  Which might be a reasonable thing to do, since it always
considers their types to be the declared input types of the operators,
anyway.  But it's a bigger patch than you have here.

I'm a bit annoyed with the idea because I had hoped to move in the
direction of getting rid of all the ad hoc RelabelType-stripping that
the planner does in various places.  A lot of it is pretty darn
questionable from a semantic standpoint.  However, because equivclass.c
is only worried about opfamily semantics, any RelabelTypes on the input
expressions don't matter, and so that objection doesn't apply.

The basic reason that there's a problem here is that the parser is
playing fast and loose by generating ORDER BY information that cites
"text < text" as the sort operator but applies it to a bare varchar
Var node.  So I thought about a Plan B of changing the parser to put
a correct RelabelType on the sort key.  I'm not sure of all the
implications of that, though, and you could argue that it's an
initdb-forcing change (because stored rules involving ORDER BY on
varchar columns would need to change to work right).  Seems a bit
late in the 8.3 cycle for that.

I guess the right answer is to fix equivclass.c to strip RelabelTypes,
and hope to maybe take that out again someday when all this gets cleaned
up.

Comments?

            regards, tom lane

Re: I have a question about using index in order statement.

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> The basic reason that there's a problem here is that the parser is
> playing fast and loose by generating ORDER BY information that cites
> "text < text" as the sort operator but applies it to a bare varchar
> Var node.  So I thought about a Plan B of changing the parser to put
> a correct RelabelType on the sort key.  I'm not sure of all the
> implications of that, though, and you could argue that it's an
> initdb-forcing change (because stored rules involving ORDER BY on
> varchar columns would need to change to work right).  Seems a bit
> late in the 8.3 cycle for that.

I think mentioning it in the release notes would be enough. You would
just have to recreate the rules to make them work again, right?

> I guess the right answer is to fix equivclass.c to strip RelabelTypes,
> and hope to maybe take that out again someday when all this gets cleaned
> up.

That certainly looks like the easier solution. We still strip
RelabelTypes in many places anyway, so doing it in one more place
doesn't seem too bad to me.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: I have a question about using index in order statement.

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> I guess the right answer is to fix equivclass.c to strip RelabelTypes,
>> and hope to maybe take that out again someday when all this gets cleaned
>> up.

> That certainly looks like the easier solution. We still strip
> RelabelTypes in many places anyway, so doing it in one more place
> doesn't seem too bad to me.

I reconsidered this after further thought.  If we do that it'd also mean
that clauses generated from EquivalenceClasses don't have RelabelType,
which is definitely going in the wrong direction in a big way.

What I'm just about to experiment with is the idea that lines 498-517
of pathkeys.c are in the wrong place: that should be done within
make_pathkey_from_sortinfo, so that it would also happen for expressions
coming from SortClauses (see the other caller of
make_pathkey_from_sortinfo).  Then the expressions coming in to
equivclass.c should always match.

            regards, tom lane