Thread: Feature request: smarter use of conditional indexes

Feature request: smarter use of conditional indexes

From
John Siracusa
Date:
Given an index like this:

    CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;

and a query like this:

    SELECT * FROM t1 WHERE c1 = 123;

I'd like the planner to be smart enough to use an index scan using i1.  Yes,
I can change the query to this:

    SELECT * FROM t1 WHERE c1 = 123 AND c1 IS NOT NULL;

In which case the index will be used, but I shouldn't have to.  More
practically, since a lot of my SQL is auto-generated, it's difficult to make
this query change just in the cases where I need it.  And I'm loathe to
change every "column = value" pair in my auto-generated SQL into a double
pair of "(column = value and column is not null)"  It's redundant and looks
pretty silly, IMO.

Thanks for you consideration :)

-John


Re: Feature request: smarter use of conditional indexes

From
Tom Lane
Date:
John Siracusa <siracusa@mindspring.com> writes:
> Given an index like this:
>     CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
> and a query like this:
>     SELECT * FROM t1 WHERE c1 = 123;
> I'd like the planner to be smart enough to use an index scan using i1.

Send a patch ;-)

The routine you want to teach about this is pred_test_simple_clause() in
src/backend/optimizer/path/indxpath.c.  ISTM that it's legitimate to
conclude that "foo IS NOT NULL" is implied by "foo op anything" or
"anything op foo" if the operator is marked strict.

Note: please patch against CVS head, as that code got rewritten since
7.4.

            regards, tom lane

Re: Feature request: smarter use of conditional indexes

From
Christopher Kings-Lynne
Date:
>>Given an index like this:
>>    CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
>>and a query like this:
>>    SELECT * FROM t1 WHERE c1 = 123;
>>I'd like the planner to be smart enough to use an index scan using i1.
>
>
> Send a patch ;-)
>
> The routine you want to teach about this is pred_test_simple_clause() in
> src/backend/optimizer/path/indxpath.c.  ISTM that it's legitimate to
> conclude that "foo IS NOT NULL" is implied by "foo op anything" or
> "anything op foo" if the operator is marked strict.

I've actually mentioned this one before in that of all the partial
indexes I have, almost all of then are a WHERE x IS NOT NULL format.  I
don't know if that's a common use, but if it is, then maybe it's worth
just adding the knowledge for IS NOT NULL...

The other thing is that at the moment, cascading foreign keys will not
use partial indexes even if they match the predicate.  Maybe an IS NOT
NULL hack will help there...

Chris


Re: Feature request: smarter use of conditional indexes

From
John Siracusa
Date:
On 3/3/04 6:53 PM, Tom Lane wrote:
> John Siracusa <siracusa@mindspring.com> writes:
>> Given an index like this:
>>     CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
>> and a query like this:
>>     SELECT * FROM t1 WHERE c1 = 123;
>> I'd like the planner to be smart enough to use an index scan using i1.
>
> Send a patch ;-)
>
> The routine you want to teach about this is pred_test_simple_clause() in
> src/backend/optimizer/path/indxpath.c.  ISTM that it's legitimate to
> conclude that "foo IS NOT NULL" is implied by "foo op anything" or
> "anything op foo" if the operator is marked strict.

Gack, C is not my forte...

So...I'm noodling around in pred_test_simple_clause() and my test query of:

    SELECT * FROM t1 WHERE c1 = 123;

lands me in pred_test_simple_clause() with a "predicate" with a NodeTag of
NullTest, and a "clause" with a NodeTag of OpExpr.  The clause "rightop"
IsA() Const.  So far, it seems to make sense.  It's comparing the clause "c1
= 123" with the predicate on the "i1" index ("IS NOT NULL") to see if one
implies the other.

But now I'm stuck, because IsA(predicate, NullTest) is *also* true if the
index i1 is dropped and index i2 is created like this:

    CREATE UNIQUE INDEX i2 ON t1 (c1) WHERE c1 IS NOT NULL;

IOW, both "IS NOT NULL" and "IS NULL" lead to IsA(predicate, NullTest) being
true.  I found this, which looked promising:

typedef enum BoolTestType
{
    IS_TRUE, IS_NOT_TRUE, IS_FALSE, IS_NOT_FALSE, IS_UNKNOWN, IS_NOT_UNKNOWN
} BoolTestType;

typedef struct BooleanTest
{
    Expr        xpr;
    Expr       *arg;            /* input expression */
    BoolTestType booltesttype;    /* test type */
} BooleanTest;

But then I realized that "predicate" is "Expr *" inside the
pred_test_simple_clause() function, and Expr seems only to have a single
field, which is tested by IsA()

typedef struct Expr
{
    NodeTag        type;
} Expr;

So apparently all I can do is find out if it's a null test, but not if it is
specifically "IS NOT NULL"

Now I'm stuck, and thinking that I'd have to modify more than
pred_test_simple_clause() to make this work.  Any additional pointers? :)

-John


Re: Feature request: smarter use of conditional indexes

From
Tom Lane
Date:
John Siracusa <siracusa@mindspring.com> writes:
> So apparently all I can do is find out if it's a null test, but not if it is
> specifically "IS NOT NULL"

No, because once you have determined that the node really IsA NullTest,
you can cast the pointer to (NullTest *) and look at the
NullTest-specific fields.  Think of this as poor man's object-oriented
programming: Node is the supertype of Expr which is the supertype of
NullTest (and a lot of other kinds of nodes, too).

It'd look something like

    if (IsA(predicate, NullTest) &&
        ((NullTest *) predicate)->nulltesttype == IS_NOT_NULL)
    {
      /* check to see if arg field matches either side of opclause,
       * and if so check whether operator is strict ...
       */
    }

You can find plenty of examples of this programming pattern throughout
the backend.  In fact pred_test_simple_clause is doing exactly this
to check that what it's given is an OpExpr and not some other node type.

            regards, tom lane

Re: Feature request: smarter use of conditional indexes

From
John Siracusa
Date:
On 3/6/04 4:06 PM, Tom Lane wrote:
> John Siracusa <siracusa@mindspring.com> writes:
>> So apparently all I can do is find out if it's a null test, but not if it is
>> specifically "IS NOT NULL"
>
> No, because once you have determined that the node really IsA NullTest,
> you can cast the pointer to (NullTest *) and look at the
> NullTest-specific fields.

I tried casting, but stupidly tried to access the type name (BoolTestType)
instead of the field name (nulltesttype).  Duh! :)

> Think of this as poor man's object-oriented programming: Node is the supertype
> of Expr which is the supertype of NullTest (and a lot of other kinds of nodes,
> too).

Yeah, I read that in the comments but was defeated by my devious brain ;)
Thanks, I'll see how much farther I can go before getting stuck again :)

-John


Re: Feature request: smarter use of conditional indexes

From
John Siracusa
Date:
On 3/3/04 6:53 PM, Tom Lane wrote:
>  John Siracusa <siracusa@mindspring.com> writes:
>>  Given an index like this:
>>      CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
>>  and a query like this:
>>      SELECT * FROM t1 WHERE c1 = 123;
>>  I'd like the planner to be smart enough to use an index scan using
>> i1.
>
>  Send a patch ;-)

How does this look?  It seems to do what I want without horribly
breaking anything as far as I can tell.  I ran "make check" and got the
same result as I did before my changes (5 failures in OS X 10.3.2).
But then, I also got the same result when I wasn't even checking to
make sure that both clauses were looking at the same variable :)  I'm
not sure how to add a test for this particular change either.

% cvs diff src/backend/optimizer/path/indxpath.c
Index: src/backend/optimizer/path/indxpath.c
===================================================================
RCS file:
/projects/cvsroot/pgsql-server/src/backend/optimizer/path/indxpath.c,v
retrieving revision 1.156
diff -r1.156 indxpath.c
1032a1033,1055
 >    {
 >        /* One last chance: "var = const" or "const = var" implies "var is
not null" */
 >        if (IsA(predicate, NullTest) &&
 >            ((NullTest *) predicate)->nulltesttype == IS_NOT_NULL &&
 >            is_opclause(clause) && op_strict(((OpExpr *) clause)->opno) &&
 >            length(((OpExpr *) clause)->args) == 2)
 >        {
 >            leftop = get_leftop((Expr *) clause);
 >            rightop = get_rightop((Expr *) clause);
 >
 >            /* One of the two arguments must be a constant */
 >            if (IsA(rightop, Const))
 >                clause_var = leftop;
 >            else if (IsA(leftop, Const))
 >                clause_var = rightop;
 >            else
 >                return false;
 >
 >            /* Finally, make sure "var" is the same var in both clauses */
 >            if (equal(((NullTest *) predicate)->arg, clause_var))
 >                return true;
 >        }
 >
1033a1057
 >    }


Re: Feature request: smarter use of conditional indexes

From
Larry Rosenman
Date:

--On Saturday, March 06, 2004 21:29:27 -0500 John Siracusa
<siracusa@mindspring.com> wrote:

> On 3/3/04 6:53 PM, Tom Lane wrote:
>>  John Siracusa <siracusa@mindspring.com> writes:
>>>  Given an index like this:
>>>      CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
>>>  and a query like this:
>>>      SELECT * FROM t1 WHERE c1 = 123;
>>>  I'd like the planner to be smart enough to use an index scan using
>>> i1.
>>
>>  Send a patch ;-)
Just a suggestion, please use diff -c format, as it makes it easier for
the folks who apply the patches to do so.
[snip]


--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Attachment

Re: Feature request: smarter use of conditional indexes

From
Tom Lane
Date:
Larry Rosenman <ler@lerctr.org> writes:
> Just a suggestion, please use diff -c format, as it makes it easier for
> the folks who apply the patches to do so.

That's not just a suggestion ... patches that aren't in diff -c (or at
least diff -u) format will be rejected out of hand.  Without the context
lines provided by these formats, applying a patch is an exercise in
risk-taking, because you can't be certain that you are applying the same
patch the submitter intended.

Personally I consider -c format the only one of the three that is
readable for reviewing purposes, so even if I weren't intending
immediate application, I'd ask for -c before looking at the patch.
There are some folks who consider -u format readable, but I'm not
one of them ...

BTW, patches really ought to go to pgsql-patches ... they're a bit
off-topic here.

            regards, tom lane

Re: Feature request: smarter use of conditional indexes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Larry Rosenman <ler@lerctr.org> writes:
> > Just a suggestion, please use diff -c format, as it makes it easier for
> > the folks who apply the patches to do so.
>
> That's not just a suggestion ... patches that aren't in diff -c (or at
> least diff -u) format will be rejected out of hand.  Without the context
> lines provided by these formats, applying a patch is an exercise in
> risk-taking, because you can't be certain that you are applying the same
> patch the submitter intended.

Also, when you get 'fuzz' output when applying the patch, you should
review the patch to make sure it appeared in the right place.  That has
gotten me a few times.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Feature request: smarter use of conditional indexes

From
CoL
Date:
hi,

John Siracusa wrote, On 3/3/2004 20:56:

> Given an index like this:
>
>     CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
>
> and a query like this:
>
>     SELECT * FROM t1 WHERE c1 = 123;
>
> I'd like the planner to be smart enough to use an index scan using i1.  Yes,
> I can change the query to this:
>
>     SELECT * FROM t1 WHERE c1 = 123 AND c1 IS NOT NULL;
>
> In which case the index will be used, but I shouldn't have to.  More
> practically, since a lot of my SQL is auto-generated, it's difficult to make
> this query change just in the cases where I need it.  And I'm loathe to
> change every "column = value" pair in my auto-generated SQL into a double
> pair of "(column = value and column is not null)"  It's redundant and looks
> pretty silly, IMO.

how about: CREATE UNIQUE INDEX i1 ON t1 (c1);
WHERE c1 IS NOT NULL in this case what is the point of doing this?
You do not need this condition.

C.