Thread: OR clause status
I have worked with Vadim privately on the remaining OR clause index issues, and I am done. Please start testing, everyone. The only open item is whether MergeJoin will try to use a multi-index OR clause in a join, but I doubt anyone uses such things. It would require you to join a table as part of a join clause. select * from tab1, tab2 where tab1.col1=2 or tab1.col1 = tab2.col2 I am going to study the optimizer code, then figure out whether this could ever be broken with the new index usage. I plan to complete this in time for 6.4. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian wrote: > I have worked with Vadim privately on the remaining OR clause index > issues, and I am done. Please start testing, everyone. > Ran a few tests last night. So far, works great for single, non-internal, attributes. However, I was not able to get the query to use the table's oid index when such an index existed. Did a VACUUM ANALYZE before the query. Will be doing some more testing over the next few days. Have not tried any mult-part keys. Will this patch handle a small number of OR'ed multi-part keys?
David Hartwig <daveh@insightdist.com> writes: > Ran a few tests last night. So far, works great for single, non-internal, > attributes. However, I was not able to get the query to use the table's oid > index when such an index existed. Perhaps this is an artifact of the type-coercion issue (see "indexes and floats" thread on pg-hackers). I find I have to write something like WHERE oid = 123456::oid to get the system to use an index on OID. If I write WHERE oid = 123456 it takes it, but does it by sequential scan :-( I do not know if it's acted like that all along or it's a result of Tom's type coercion fixes of a couple months ago. regards, tom lane
> Bruce Momjian wrote: > > > I have worked with Vadim privately on the remaining OR clause index > > issues, and I am done. Please start testing, everyone. > > > > Ran a few tests last night. So far, works great for single, non-internal, > attributes. However, I was not able to get the query to use the table's oid > index when such an index existed. Did a VACUUM ANALYZE before the query. > Will be doing some more testing over the next few days. Try: test=> explain select * from test where oid = 3::oid or oid = 4::oid; NOTICE: QUERY PLAN: Index Scan using i_test2 on test (cost=4.10 size=1 width=4) Thomas will have to deal with this before 6.4. The auto-casting of type oid is not working. > > Have not tried any mult-part keys. Will this patch handle a small number of > OR'ed multi-part keys? This is another item that I think does not work. The code handles AND and OR separately, making this hard to handle: key1 = 7 and (key2 = 99 or key2 = 100) It will use the first part of the key to get key1 = 7, but recognize that the first part of the key was matches and use the key2 part. I am going to study the optimizer code for 6.4, so I may be able to get it working. Not sure yet. It will certainly use the first part of a multi-part key if the first key is the one matching the OR clause. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> > Ran a few tests last night. So far, works great for single, > > non-internal, attributes. However, I was not able to get the query > > to use the table's oid index when such an index existed. Did a > > VACUUM ANALYZE before the query. > Thomas will have to deal with this before 6.4. The auto-casting of > type oid is not working. I think the OID auto-casting should be very easy (painfully easy, actually, if that doesn't jinx it). Once the development tree becomes unbroken :) Haven't finished reading mail yet, so may already work, eh? - Tom
> > > Ran a few tests last night. So far, works great for single, > > > non-internal, attributes. However, I was not able to get the query > > > to use the table's oid index when such an index existed. Did a > > > VACUUM ANALYZE before the query. > > Thomas will have to deal with this before 6.4. The auto-casting of > > type oid is not working. > > I think the OID auto-casting should be very easy (painfully easy, > actually, if that doesn't jinx it). Once the development tree becomes > unbroken :) Haven't finished reading mail yet, so may already work, eh? Yes, I think there are some very good reasons to evaluate functions on constants inside the parser. There are some operations that look at index selectivity that need to know the constant value, rather than knowing if the function returns an int. For example x > 3 looks at the pg_statistics table to see max/min values. You certainly don't want to be evaluating functions on constants inside the optimizer. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> > I think the OID auto-casting should be very easy > > Once the development tree becomes unbroken... > Yes, I think there are some very good reasons to evaluate functions on > constants inside the parser. I wasn't actually suggesting making real changes to the parser. I _think_ I can get the OID vs int4 coersion problem fixed with a one-line change to a header file, but need a working tree to do it :( Any progress on getting the tree to compile? Are there some people who don't see a problem?? > There are some operations that look at > index selectivity that need to know the constant value, rather than > knowing if the function returns an int. For example x > 3 looks at > the pg_statistics table to see max/min values. > > You certainly don't want to be evaluating functions on constants > inside the optimizer. ?? Why not? It is an optimization after all. I have never looked at the optimizer code however. Are you saying that the current optimizer doesn't have a pass or phase where functions could be evaluated? Should we add a pass to do this? I've been a bit worried about coding too many optimizations into the parser, since they might become pretty obscure buried in there. - Tom
> > > I think the OID auto-casting should be very easy > > > Once the development tree becomes unbroken... > > Yes, I think there are some very good reasons to evaluate functions on > > constants inside the parser. > > I wasn't actually suggesting making real changes to the parser. I > _think_ I can get the OID vs int4 coersion problem fixed with a one-line > change to a header file, but need a working tree to do it :( > > Any progress on getting the tree to compile? Are there some people who > don't see a problem?? > > > There are some operations that look at > > index selectivity that need to know the constant value, rather than > > knowing if the function returns an int. For example x > 3 looks at > > the pg_statistics table to see max/min values. > > > > You certainly don't want to be evaluating functions on constants > > inside the optimizer. > > ?? Why not? It is an optimization after all. I have never looked at the > optimizer code however. Are you saying that the current optimizer > doesn't have a pass or phase where functions could be evaluated? Should > we add a pass to do this? I've been a bit worried about coding too many > optimizations into the parser, since they might become pretty obscure > buried in there. The optimizer is much too complicated to be putting fmgr() function calls in there. I know the parser is bad too, but I think it is best to do it there. We already do some of it there anyway. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> Perhaps this is an artifact of the type-coercion issue (see "indexes > and floats" thread on pg-hackers). I find I have to write something > like > WHERE oid = 123456::oid > to get the system to use an index on OID. If I write > WHERE oid = 123456 > it takes it, but does it by sequential scan :-( > I do not know if it's acted like that all along or it's a result > of Tom's type coercion fixes of a couple months ago. Hi Bruce. You are right, the optimizer is confusing :) I'm not sure if you were looking at this already, but I was thinking of finding the place where the optimizer decides whether an index can be used in a query, in particular when constants are involved. Seems like the overhead/operations involved should be identical whether the terms have the same type or not; in the cases above WHERE oid = 123456::oid would use oideq() and WHERE oid = 123456 would use oidint4eq(). Why would Postgres give up on using an index in the second case? In both cases there is one call to a function to evaluate the equality. Do the types need to match up for other reasons? I was thinking of adding the IS_BINARY_COMPATIBLE() macro as an optimization in the place where indices are being chosen, but then got confused as to why Postgres would care in the first place. Also, haven't found the area where these decisions are made. Any hints? Anyone else rummaged around that code? - Tom
> > Perhaps this is an artifact of the type-coercion issue (see "indexes > > and floats" thread on pg-hackers). I find I have to write something > > like > > WHERE oid = 123456::oid > > to get the system to use an index on OID. If I write > > WHERE oid = 123456 > > it takes it, but does it by sequential scan :-( > > I do not know if it's acted like that all along or it's a result > > of Tom's type coercion fixes of a couple months ago. > > Hi Bruce. You are right, the optimizer is confusing :) > > I'm not sure if you were looking at this already, but I was thinking of > finding the place where the optimizer decides whether an index can be > used in a query, in particular when constants are involved. Seems like > the overhead/operations involved should be identical whether the terms > have the same type or not; in the cases above > WHERE oid = 123456::oid > would use oideq() and > WHERE oid = 123456 > would use oidint4eq(). > > Why would Postgres give up on using an index in the second case? In both > cases there is one call to a function to evaluate the equality. Do the > types need to match up for other reasons? Because the PARSER converts :: casts to constants. It does not handle functions-on-consts conversions. In scan.l, :: is converted to TYPECAST, and then in gram.y: | a_expr TYPECAST Typename { $$ = (Node *)$1; /* AexprConst can be either A_Const or ParamNo */ if (nodeTag($1) == T_A_Const) { ((A_Const *)$1)->typename = $3; } else if (nodeTag($1) == T_Param) { ((ParamNo *)$1)->typename = $3; /* otherwise, try to transform to a function call */ } else { FuncCall *n = makeNode(FuncCall); n->funcname = $3->name; n->args = lcons($1,NIL); $$ = (Node *)n; } } As you can see, if it is a constant as passed from scan.l, a constant is created with the type of the cast, so it does become a constant. Then in parse_expr.c, we have: case T_A_Const: { A_Const *con = (A_Const *) expr; Value *val = &con->val; if (con->typename != NULL) result = parser_typecast(val, con->typename, -1); else result = (Node *) make_const(val); break; } And parser_typecast does, because the constant is an integer: case T_Integer: const_string = (char *) palloc(256); string_palloced = true; sprintf(const_string, "%ld", expr->val.ival); which then does the conversion of the int to a string, and makes a conversion back to the proper type: cp = stringTypeString(tp, const_string, atttypmod); stringTypeString does: /* Given a type structure and a string, returns the internal form of that string */ char * stringTypeString(Type tp, char *string, int32 atttypmod) { Oid op; Oid typelem; op = ((TypeTupleForm) GETSTRUCT(tp))->typinput; typelem = ((TypeTupleForm) GETSTRUCT(tp))->typelem; /* XXX - used for * array_in */ return ((char *) fmgr(op, string, typelem, atttypmod)); } and then makes a new constant: adt = makeConst(typeTypeId(tp), len, (Datum) lcp, false, typeByVal(tp), false, /* not a set */ true /* is cast */ ); The problem is that wrapping a function around the const is not going through this code. I will see if I can add the proper calls to get it working. > > I was thinking of adding the IS_BINARY_COMPATIBLE() macro as an > optimization in the place where indices are being chosen, but then got > confused as to why Postgres would care in the first place. Also, haven't > found the area where these decisions are made. > > Any hints? Anyone else rummaged around that code? -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> > Perhaps this is an artifact of the type-coercion issue (see "indexes > > and floats" thread on pg-hackers). I find I have to write something > > like > > WHERE oid = 123456::oid > > to get the system to use an index on OID. If I write > > WHERE oid = 123456 > > it takes it, but does it by sequential scan :-( > > I do not know if it's acted like that all along or it's a result > > of Tom's type coercion fixes of a couple months ago. > > Hi Bruce. You are right, the optimizer is confusing :) > > I'm not sure if you were looking at this already, but I was thinking of > finding the place where the optimizer decides whether an index can be > used in a query, in particular when constants are involved. Seems like > the overhead/operations involved should be identical whether the terms > have the same type or not; in the cases above > WHERE oid = 123456::oid > would use oideq() and > WHERE oid = 123456 > would use oidint4eq(). > > Why would Postgres give up on using an index in the second case? In both > cases there is one call to a function to evaluate the equality. Do the > types need to match up for other reasons? > > I was thinking of adding the IS_BINARY_COMPATIBLE() macro as an > optimization in the place where indices are being chosen, but then got > confused as to why Postgres would care in the first place. Also, haven't > found the area where these decisions are made. > > Any hints? Anyone else rummaged around that code? In looking at the code, part of the problem is that you are creating a FuncCall node in coerce_type, so I have to make sure I convert normal funcs with constants to constants, and your type-conversion funcs too, which will not be picked up in the normal expression parsing because they were not there originally. (Assuming the function is cache-able.) In the case of x=3, does your code try to convert the constant to the type of the variable, or does it try and do both. We can convert: x = int4(3) and not int4(x) = 3 or int4(x) = int4(3) -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> > I was thinking of > > finding the place where the optimizer decides whether an index can > > be used in a query, in particular when constants are involved. > > WHERE oid = 123456::oid > > would use oideq() and > > WHERE oid = 123456 > > would use oidint4eq(). > > Why would Postgres give up on using an index in the second case? Before we go ripping into the parser, I think it would help to understand the area of the optimizer or planner which decides whether or not to use an index. Look, these first queries use an index: regression=> explain select * from tenk1 where oid = 3000::oid; Index Scan using tenk1_oid on tenk1 (cost=2.05 size=1 width=148) regression=> explain select * from tenk1 where oid < 3000::oid; Index Scan using tenk1_oid on tenk1 (cost=257.67 size=3334 width=148) regression=> explain select * from tenk1 where oid > 3000::oid; Index Scan using tenk1_oid on tenk1 (cost=257.67 size=3334 width=148) But the next queries do not: regression=> explain select * from tenk1 where oid != 3000::oid; Seq Scan on tenk1 (cost=574.00 size=9999 width=148) regression=> explain select * from tenk1 where oid = 3000; Seq Scan on tenk1 (cost=574.00 size=1 width=148) Afaik, the parser handles every one of these queries _exactly_ the same, with slightly different outcomes due to the slight differences in operators or types. But the parse tree will have the same kind of nodes, with function calls used to evaluate the operators. So how does the optimizer conclude that a node coming from the parser is a possibility for an index scan? Let's find this out before masking the behavior in the parser. > In looking at the code, part of the problem is that you are creating a > FuncCall node in coerce_type, so I have to make sure I convert normal > funcs with constants to constants, and your type-conversion funcs too, > which will not be picked up in the normal expression parsing because > they were not there originally. (Assuming the function is cache-able.) I think Vadim suggested that if we do something in the parser we create a PARAM_EXEC node to do a one-time evaluation in the executor. Seems preferable to doing a brute-force conversion via strings in the parser... - Tom