Thread: OR clause status

OR clause status

From
Bruce Momjian
Date:
I have worked with Vadim privately on the remaining OR clause index
issues, and I am done.  Please start testing, everyone.

The only open item is whether MergeJoin will try to use a multi-index OR
clause in a join, but I doubt anyone uses such things.  It would require
you to join a table as part of a join clause.

    select *
    from tab1, tab2
    where tab1.col1=2 or tab1.col1 = tab2.col2

I am going to study the optimizer code, then figure out whether this
could ever be broken with the new index usage.  I plan to complete this
in time for 6.4.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR clause status

From
David Hartwig
Date:
Bruce Momjian wrote:

> I have worked with Vadim privately on the remaining OR clause index
> issues, and I am done.  Please start testing, everyone.
>

Ran a few tests last night.   So far, works great for single, non-internal,
attributes.  However, I was not able to get the query to use the table's oid
index when such an index existed.   Did a VACUUM ANALYZE before the query.
Will be doing some more testing over the next few days.

Have not tried any mult-part keys.  Will this patch handle a small number of
OR'ed multi-part keys?



Re: [HACKERS] OR clause status

From
Tom Lane
Date:
David Hartwig <daveh@insightdist.com> writes:
> Ran a few tests last night.   So far, works great for single, non-internal,
> attributes.  However, I was not able to get the query to use the table's oid
> index when such an index existed.

Perhaps this is an artifact of the type-coercion issue (see "indexes and
floats" thread on pg-hackers).  I find I have to write something like

    WHERE oid = 123456::oid

to get the system to use an index on OID.  If I write

    WHERE oid = 123456

it takes it, but does it by sequential scan :-(

I do not know if it's acted like that all along or it's a result
of Tom's type coercion fixes of a couple months ago.

            regards, tom lane

Re: [HACKERS] OR clause status

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
>
> > I have worked with Vadim privately on the remaining OR clause index
> > issues, and I am done.  Please start testing, everyone.
> >
>
> Ran a few tests last night.   So far, works great for single, non-internal,
> attributes.  However, I was not able to get the query to use the table's oid
> index when such an index existed.   Did a VACUUM ANALYZE before the query.
> Will be doing some more testing over the next few days.

Try:
    test=> explain select * from test where oid = 3::oid or oid = 4::oid;
    NOTICE:  QUERY PLAN:

    Index Scan using i_test2 on test  (cost=4.10 size=1 width=4)

Thomas will have to deal with this before 6.4.  The auto-casting of type
oid is not working.


>
> Have not tried any mult-part keys.  Will this patch handle a small number of
> OR'ed multi-part keys?

This is another item that I think does not work.  The code handles AND
and OR separately, making this hard to handle:

    key1 = 7 and (key2 = 99 or key2 = 100)

It will use the first part of the key to get key1 = 7, but recognize
that the first part of the key was matches and use the key2 part.  I am
going to study the optimizer code for 6.4, so I may be able to get it
working.  Not sure yet.

It will certainly use the first part of a multi-part key if the first
key is the one matching the OR clause.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR clause status

From
"Thomas G. Lockhart"
Date:
> > Ran a few tests last night.   So far, works great for single,
> > non-internal, attributes.  However, I was not able to get the query
> > to use the table's oid index when such an index existed.   Did a
> > VACUUM ANALYZE before the query.
> Thomas will have to deal with this before 6.4.  The auto-casting of
> type oid is not working.

I think the OID auto-casting should be very easy (painfully easy,
actually, if that doesn't jinx it). Once the development tree becomes
unbroken :) Haven't finished reading mail yet, so may already work, eh?

                   - Tom

Re: [HACKERS] OR clause status

From
Bruce Momjian
Date:
> > > Ran a few tests last night.   So far, works great for single,
> > > non-internal, attributes.  However, I was not able to get the query
> > > to use the table's oid index when such an index existed.   Did a
> > > VACUUM ANALYZE before the query.
> > Thomas will have to deal with this before 6.4.  The auto-casting of
> > type oid is not working.
>
> I think the OID auto-casting should be very easy (painfully easy,
> actually, if that doesn't jinx it). Once the development tree becomes
> unbroken :) Haven't finished reading mail yet, so may already work, eh?

Yes, I think there are some very good reasons to evaluate functions on
constants inside the parser.  There are some operations that look at
index selectivity that need to know the constant value, rather than
knowing if the function returns an int.  For example x > 3 looks at the
pg_statistics table to see max/min values.

You certainly don't want to be evaluating functions on constants inside
the optimizer.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR clause status

From
"Thomas G. Lockhart"
Date:
> > I think the OID auto-casting should be very easy
> > Once the development tree becomes unbroken...
> Yes, I think there are some very good reasons to evaluate functions on
> constants inside the parser.

I wasn't actually suggesting making real changes to the parser. I
_think_ I can get the OID vs int4 coersion problem fixed with a one-line
change to a header file, but need a working tree to do it :(

Any progress on getting the tree to compile? Are there some people who
don't see a problem??

> There are some operations that look at
> index selectivity that need to know the constant value, rather than
> knowing if the function returns an int.  For example x > 3 looks at
> the pg_statistics table to see max/min values.
>
> You certainly don't want to be evaluating functions on constants
> inside the optimizer.

?? Why not? It is an optimization after all. I have never looked at the
optimizer code however. Are you saying that the current optimizer
doesn't have a pass or phase where functions could be evaluated? Should
we add a pass to do this? I've been a bit worried about coding too many
optimizations into the parser, since they might become pretty obscure
buried in there.

                        - Tom

Re: [HACKERS] OR clause status

From
Bruce Momjian
Date:
> > > I think the OID auto-casting should be very easy
> > > Once the development tree becomes unbroken...
> > Yes, I think there are some very good reasons to evaluate functions on
> > constants inside the parser.
>
> I wasn't actually suggesting making real changes to the parser. I
> _think_ I can get the OID vs int4 coersion problem fixed with a one-line
> change to a header file, but need a working tree to do it :(
>
> Any progress on getting the tree to compile? Are there some people who
> don't see a problem??
>
> > There are some operations that look at
> > index selectivity that need to know the constant value, rather than
> > knowing if the function returns an int.  For example x > 3 looks at
> > the pg_statistics table to see max/min values.
> >
> > You certainly don't want to be evaluating functions on constants
> > inside the optimizer.
>
> ?? Why not? It is an optimization after all. I have never looked at the
> optimizer code however. Are you saying that the current optimizer
> doesn't have a pass or phase where functions could be evaluated? Should
> we add a pass to do this? I've been a bit worried about coding too many
> optimizations into the parser, since they might become pretty obscure
> buried in there.

The optimizer is much too complicated to be putting fmgr() function
calls in there.  I know the parser is bad too, but I think it is best to
do it there.  We already do some of it there anyway.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR clause status

From
"Thomas G. Lockhart"
Date:
> Perhaps this is an artifact of the type-coercion issue (see "indexes
> and floats" thread on pg-hackers).  I find I have to write something
> like
>         WHERE oid = 123456::oid
> to get the system to use an index on OID.  If I write
>         WHERE oid = 123456
> it takes it, but does it by sequential scan :-(
> I do not know if it's acted like that all along or it's a result
> of Tom's type coercion fixes of a couple months ago.

Hi Bruce. You are right, the optimizer is confusing :)

I'm not sure if you were looking at this already, but I was thinking of
finding the place where the optimizer decides whether an index can be
used in a query, in particular when constants are involved. Seems like
the overhead/operations involved should be identical whether the terms
have the same type or not; in the cases above
  WHERE oid = 123456::oid
would use oideq() and
  WHERE oid = 123456
would use oidint4eq().

Why would Postgres give up on using an index in the second case? In both
cases there is one call to a function to evaluate the equality. Do the
types need to match up for other reasons?

I was thinking of adding the IS_BINARY_COMPATIBLE() macro as an
optimization in the place where indices are being chosen, but then got
confused as to why Postgres would care in the first place. Also, haven't
found the area where these decisions are made.

Any hints? Anyone else rummaged around that code?

                        - Tom

Re: [HACKERS] OR clause status

From
Bruce Momjian
Date:
> > Perhaps this is an artifact of the type-coercion issue (see "indexes
> > and floats" thread on pg-hackers).  I find I have to write something
> > like
> >         WHERE oid = 123456::oid
> > to get the system to use an index on OID.  If I write
> >         WHERE oid = 123456
> > it takes it, but does it by sequential scan :-(
> > I do not know if it's acted like that all along or it's a result
> > of Tom's type coercion fixes of a couple months ago.
>
> Hi Bruce. You are right, the optimizer is confusing :)
>
> I'm not sure if you were looking at this already, but I was thinking of
> finding the place where the optimizer decides whether an index can be
> used in a query, in particular when constants are involved. Seems like
> the overhead/operations involved should be identical whether the terms
> have the same type or not; in the cases above
>   WHERE oid = 123456::oid
> would use oideq() and
>   WHERE oid = 123456
> would use oidint4eq().
>
> Why would Postgres give up on using an index in the second case? In both
> cases there is one call to a function to evaluate the equality. Do the
> types need to match up for other reasons?


Because the PARSER converts :: casts to constants.  It does not handle
functions-on-consts conversions.

In scan.l, :: is converted to TYPECAST, and then in gram.y:

        | a_expr TYPECAST Typename
                {
                    $$ = (Node *)$1;
                    /* AexprConst can be either A_Const or ParamNo */
                    if (nodeTag($1) == T_A_Const) {
                        ((A_Const *)$1)->typename = $3;
                    } else if (nodeTag($1) == T_Param) {
                        ((ParamNo *)$1)->typename = $3;
                    /* otherwise, try to transform to a function call */
                    } else {
                        FuncCall *n = makeNode(FuncCall);
                        n->funcname = $3->name;
                        n->args = lcons($1,NIL);
                        $$ = (Node *)n;
                    }
                }


As you can see, if it is a constant as passed from scan.l, a constant is
created with the type of the cast, so it does become a constant.

Then in parse_expr.c, we have:

        case T_A_Const:
            {
                A_Const    *con = (A_Const *) expr;
                Value      *val = &con->val;

                if (con->typename != NULL)
                    result = parser_typecast(val, con->typename, -1);
                else
                    result = (Node *) make_const(val);
                break;
            }

And parser_typecast does, because the constant is an integer:

        case T_Integer:
            const_string = (char *) palloc(256);
            string_palloced = true;
            sprintf(const_string, "%ld", expr->val.ival);

which then does the conversion of the int to a string, and makes a
conversion back to the proper type:

    cp = stringTypeString(tp, const_string, atttypmod);

stringTypeString does:

    /* Given a type structure and a string, returns the internal form of
       that string */
    char *
    stringTypeString(Type tp, char *string, int32 atttypmod)
    {
        Oid         op;
        Oid         typelem;

        op = ((TypeTupleForm) GETSTRUCT(tp))->typinput;
        typelem = ((TypeTupleForm) GETSTRUCT(tp))->typelem; /* XXX - used for
                                                             * array_in */
        return ((char *) fmgr(op, string, typelem, atttypmod));
    }

and then makes a new constant:

    adt = makeConst(typeTypeId(tp),
                    len,
                    (Datum) lcp,
                    false,
                    typeByVal(tp),
                    false,      /* not a set */
                    true /* is cast */ );


The problem is that wrapping a function around the const is not going
through this code.  I will see if I can add the proper calls to get it
working.


>
> I was thinking of adding the IS_BINARY_COMPATIBLE() macro as an
> optimization in the place where indices are being chosen, but then got
> confused as to why Postgres would care in the first place. Also, haven't
> found the area where these decisions are made.
>
> Any hints? Anyone else rummaged around that code?



--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR clause status

From
Bruce Momjian
Date:
> > Perhaps this is an artifact of the type-coercion issue (see "indexes
> > and floats" thread on pg-hackers).  I find I have to write something
> > like
> >         WHERE oid = 123456::oid
> > to get the system to use an index on OID.  If I write
> >         WHERE oid = 123456
> > it takes it, but does it by sequential scan :-(
> > I do not know if it's acted like that all along or it's a result
> > of Tom's type coercion fixes of a couple months ago.
>
> Hi Bruce. You are right, the optimizer is confusing :)
>
> I'm not sure if you were looking at this already, but I was thinking of
> finding the place where the optimizer decides whether an index can be
> used in a query, in particular when constants are involved. Seems like
> the overhead/operations involved should be identical whether the terms
> have the same type or not; in the cases above
>   WHERE oid = 123456::oid
> would use oideq() and
>   WHERE oid = 123456
> would use oidint4eq().
>
> Why would Postgres give up on using an index in the second case? In both
> cases there is one call to a function to evaluate the equality. Do the
> types need to match up for other reasons?
>
> I was thinking of adding the IS_BINARY_COMPATIBLE() macro as an
> optimization in the place where indices are being chosen, but then got
> confused as to why Postgres would care in the first place. Also, haven't
> found the area where these decisions are made.
>
> Any hints? Anyone else rummaged around that code?

In looking at the code, part of the problem is that you are creating a
FuncCall node in coerce_type, so I have to make sure I convert normal
funcs with constants to constants, and your type-conversion funcs too,
which will not be picked up in the normal expression parsing because
they were not there originally.  (Assuming the function is cache-able.)

In the case of x=3, does your code try to convert the constant to the
type of the variable, or does it try and do both.  We can convert:

    x = int4(3)

and not
    int4(x) = 3

or
    int4(x) = int4(3)


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR clause status

From
"Thomas G. Lockhart"
Date:
> > I was thinking of
> > finding the place where the optimizer decides whether an index can
> > be used in a query, in particular when constants are involved.
> >   WHERE oid = 123456::oid
> > would use oideq() and
> >   WHERE oid = 123456
> > would use oidint4eq().
> > Why would Postgres give up on using an index in the second case?

Before we go ripping into the parser, I think it would help to
understand the area of the optimizer or planner which decides whether or
not to use an index. Look, these first queries use an index:

regression=> explain select * from tenk1 where oid = 3000::oid;
Index Scan using tenk1_oid on tenk1  (cost=2.05 size=1 width=148)
regression=> explain select * from tenk1 where oid < 3000::oid;
Index Scan using tenk1_oid on tenk1  (cost=257.67 size=3334 width=148)
regression=> explain select * from tenk1 where oid > 3000::oid;
Index Scan using tenk1_oid on tenk1  (cost=257.67 size=3334 width=148)

But the next queries do not:

regression=> explain select * from tenk1 where oid != 3000::oid;
Seq Scan on tenk1  (cost=574.00 size=9999 width=148)
regression=> explain select * from tenk1 where oid = 3000;
Seq Scan on tenk1  (cost=574.00 size=1 width=148)

Afaik, the parser handles every one of these queries _exactly_ the same,
with slightly different outcomes due to the slight differences in
operators or types. But the parse tree will have the same kind of nodes,
with function calls used to evaluate the operators.

So how does the optimizer conclude that a node coming from the parser is
a possibility for an index scan? Let's find this out before masking the
behavior in the parser.

> In looking at the code, part of the problem is that you are creating a
> FuncCall node in coerce_type, so I have to make sure I convert normal
> funcs with constants to constants, and your type-conversion funcs too,
> which will not be picked up in the normal expression parsing because
> they were not there originally. (Assuming the function is cache-able.)

I think Vadim suggested that if we do something in the parser we create
a PARAM_EXEC node to do a one-time evaluation in the executor. Seems
preferable to doing a brute-force conversion via strings in the
parser...

                      - Tom