Thread: BUG #5816: index not used in function
The following bug has been logged online: Bug reference: 5816 Logged by: frank Email address: frank@ros-i.com PostgreSQL version: 8.3.7 Operating system: linux Description: index not used in function Details: Linux: Linux <server name> 2.6.24-24-server #1 SMP Tue Jul 7 20:21:17 UTC 2009 i686 GNU/Linux Postgres: "PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)" Table Description: 1. table (say thisTable) with a column (say thisColumn) of varchar in mixed case data 2. w/o primary key on this column 3. an upper case index on this column with text_pattern_ops as the opclass Query Description: 1. select "thisColumn" from "thisTable" where upper("thisColumn") like 'ABC%' 2. select * from get_this_column('ABC') where get_this_column() is defined as: CREATE OR REPLACE FUNCTION get_this_column(c text) RETURNS SETOF text AS $BODY$ SELECT "thisColumn" FROM "thisTable" WHERE upper("thisColumn") like $1 limit 10; $BODY$ LANGUAGE 'sql' VOLATILE COST 100 ROWS 1000; ALTER FUNCTION get_this_column(text) OWNER TO postgres; Issue Description: The plain query (1) uses the upper case index. The function (2), even though having the same underlying query, does not seem to be able to use index. The wild card can either be in the argument or appended at the query inside the function, the result is the same. Further Question: Why can't force the use of index (even if the plan results in worse performance when user desires)? The index HINT (of ORACLE say) is a bad concept and a horrible design idea. The user should be allowed to force the use of any index regardless of performance. Anyway, the user should know what he/she is doing. Besides, an explain could show which, the system plan or the user's, is better. The combination described here is perhaps not normal. I have not tested it with any other version. Likely the same is true for all versions including 9. Regardless of resolution, I would appreciate a brief response. Thanks
"frank" <frank@ros-i.com> wrote: > WHERE upper("thisColumn") like $1 The function's plan is kept from one execution to another, and it can't know what will be in the first parameter -- perhaps '%X%'? If you build up the statement in a string and EXECUTE it, you might get the desired behavior. Anyway, next time you have an issue like this, please post to the performance list; this is not a bug. -Kevin
Kevin, Very thrilled to hear back from you (and surprised at the promptness, too)! Please do not take this personal as it is a tech issue and I am treating it thus. We may have different perceptions of something being a 'bug'. I always have several simple ways of determining it. One of them is when a work-around is in the proposal. Yours is one. There can be quite a number of ways of looking at the issue. First, it is truly an implementation matter (making it in the true sense a bug). I do not believe that the spec would in formal way say that 'well, there are caveats where you have to do this and that to work around'. I gave specific parameters in my example (if you go back to my original email/bug report). To be precise, the parameter does not start with a wild card. By giving '%X%', you may have changed the nature a bit. Regardless, it is still the same issue: the implementation did not faithfully reflect the specification, which did not explicitly (nor implicitly) endorse such behavior. From a tech perspective, you are actually partially correct, but only partially. How can a leading wild-card warrant the use of index? It can not, for sure. However, the implementation is still defective. Let me be specific. The logic should be a conditional: if (at run time when the argument is parsed and examined) a leading wild-card is found, it should branch to not use index (as it can not), otherwise it should. The compilation already sees the text [upper("thisColumn")]. Assuming there are no other issues and embedded wild-cards are correctly handled (e.g. through index search and then a filter), the above described simple logic is the only thing needed to be thrown in. If by 'kept from one execution to another' means that (the concept of) a plan is implemented static, this can be a low level design issue, which in general will still be regarded as implementation, thus a bug. Lastly, to expose it as a work-around, let us take the situation where I input n parameters, the different combinations will be used to query against a number of tables (inside the function). Due to the issue, I have to input all the actual query text as parameters into the function (which by now is no longer truly a function). But that could be factorial! Anyway, I hope this can be re-considered. I do not want to create headaches, but if resource constraint is a factor, I may offer to fix the problem if you could kindly provide with your lexer code and the code files that are/may be affected, as well as POC for QA. Once again, I appreciate your reply and thank you for the attention. Regards, Frank -----Original Message----- From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] Sent: Thursday, January 06, 2011 12:50 PM To: pgsql-bugs@postgresql.org; frank Subject: Re: [BUGS] BUG #5816: index not used in function "frank" <frank@ros-i.com> wrote: > WHERE upper("thisColumn") like $1 The function's plan is kept from one execution to another, and it can't know what will be in the first parameter -- perhaps '%X%'? If you build up the statement in a string and EXECUTE it, you might get the desired behavior. Anyway, next time you have an issue like this, please post to the performance list; this is not a bug. -Kevin
> We may have different perceptions of something being a 'bug'. I always > have several simple ways of determining it. One of them is when a > work-around is in the proposal. Yours is one. It seems to me that the important question in this case is whether or not t= he query produced the correct result. You are complaining about a performance issue, not a correctness issue, rig= ht? Kevin's work-around is meant to help you *gain better performance*, not to = obtain correct results when you are getting incorrect results. > There can be quite a number of ways of looking at the issue. First, it > is truly an implementation matter (making it in the true sense a bug). I > do not believe that the spec would in formal way say that 'well, there > are caveats where you have to do this and that to work around'. The "spec" (by which I assume you mean the SQL standard) says nothing about= which execution plan will be selected the optimizer. > <snip> > If by 'kept from one execution to another' means that (the concept of) a > plan is implemented static, this can be a low level design issue, which > in general will still be regarded as implementation, thus a bug. The execution plan is not quite static - it is computed the first time you = run the function (within a session) and is discarded when your session ends= (or when the compiled function/execution plan becomes obsolete because of = a change to a dependency). That is by design. If you want a dynamic plan that is re-computed each time you execute the qu= ery, you can get that behavior by using dynamic SQL, as Kevin suggested. -- Korry
-----Original Message----- From: Korry Douglas [mailto:korry.douglas@enterprisedb.com] Sent: Sunday, January 09, 2011 2:34 PM To: frank Cc: 'Kevin Grittner'; pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #5816: index not used in function > We may have different perceptions of something being a 'bug'. I always > have several simple ways of determining it. One of them is when a > work-around is in the proposal. Yours is one. It seems to me that the important question in this case is whether or not the query produced the correct result. The important question by nature is not whether correct result is produce eventually. In this case, the important thing is the inability to use the index when in fact, if designed and implemented correctly, it could. You are complaining about a performance issue, not a correctness issue, right? No. I am pointing out a low-level-design/implementation defect. Poor performance is the symptom. Poor performance due to sub-optimal execution of the spec is a correctness issue. (See my comment on the nature of a function below) Kevin's work-around is meant to help you *gain better performance*, not to obtain correct results when you are getting incorrect results. If it is not a bug, why do we need a work-around? > There can be quite a number of ways of looking at the issue. First, it > is truly an implementation matter (making it in the true sense a bug). I > do not believe that the spec would in formal way say that 'well, there > are caveats where you have to do this and that to work around'. The "spec" (by which I assume you mean the SQL standard) says nothing about which execution plan will be selected (by) the optimizer. No. Whatever the spec, it will never say that a function will not work as a function, or something that works outside one will not work once moved inside. > <snip> > If by 'kept from one execution to another' means that (the concept of) a > plan is implemented static, this can be a low level design issue, which > in general will still be regarded as implementation, thus a bug. The execution plan is not quite static - it is computed the first time you run the function (within a session) and is discarded when your session ends (or when the compiled function/execution plan becomes obsolete because of a change to a dependency). That is by design. Then the design is poor. If you want a dynamic plan that is re-computed each time you execute the query, you can get that behavior by using dynamic SQL, as Kevin suggested. This seems far fetched and irrelevant. Whatever is truly static should be implemented static; whatever is dynamic should be implemented dynamic; whatever is partially static, the static part should be static and the dynamic part should be dynamic. It is natural and correct treatment. Purely dynamic situation in which the final query can not be determined in any fashion, will have to be constructed either outside of the function or within, so EXECUTE is the only way to handle. What do you think of requiring the caller to construct such a static statement as "select count(*) from sometable" and use EXECUTE? By the same token, "select thiscolumn from thistable where upper(thiscolumn) like $1" has to be treated statically for the static part. The only unknown is the parameter, which can be, by the right design and implementation, delayed till execution (runtime). The code to deal with this is what I pointed out (via a conditional). If the plan is a piece of code, then the conditional will be in it. If the plan is a piece of text to be further interpreted for actual execution (why would one want to do it that way?), the conditional could contain a text reference to two pieces of code (w/o the use of the index). If it is not properly designed/implemented and such situation results in the loss of the said ability, it is a defect to be addressed. Whether one wants to address it is one issue. A defect is a defect. You seem to suggest that the plan was only built at (the first) execution. That is poor design/implementation. Lastly, what is a function? One of the fundamental features of a function is encapsulation. One is guaranteed some well-defined output based on well-defined input. No implementation detail is necessary or is obliged to be available. The user does not have to know what table or anything for that matter is involved. When you push for what is suggested as a work-around, it defeats one of the basic purposes for a function. More can be said, but why one wants to defend a defect is quite beyond me. -- Korry
On Mon, Jan 10, 2011 at 9:07 PM, frank <frank@ros-i.com> wrote: > More can be said, but why one wants to defend a defect is quite beyond me. I guess the question is whether you want to solve your problem or improve PostgreSQL. If you want to solve your immediate problem, the advice given thus far is probably enough for you to do it. If the goal is to improve PostgreSQL, that's a worthy goal and I don't think anyone here would say otherwise. The current behavior is not ideal, but improving it is not easy. When people say "it's not a bug", they don't mean "this is the best possible behavior anyone can imagine"; they mean "we know that it works this way and we haven't actually figured out a way to do any better yet without causing other problems". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Frank, thanks for educating me.=20 -- Korry > -----Original Message----- > From: Korry Douglas [mailto:korry.douglas@enterprisedb.com]=20 > Sent: Sunday, January 09, 2011 2:34 PM > To: frank > Cc: 'Kevin Grittner'; pgsql-bugs@postgresql.org > Subject: Re: [BUGS] BUG #5816: index not used in function >=20 >=20=20 >=20 > > We may have different perceptions of something being a 'bug'. I always >=20 > > have several simple ways of determining it. One of them is when a >=20 > > work-around is in the proposal. Yours is one. >=20 >=20=20 >=20 > It seems to me that the important question in this case is whether or not= the query produced the correct result. >=20 > The important question by nature is not whether correct result is produce= eventually. In this case, the important thing is the inability to use the = index when in fact, if designed and implemented correctly, it could. >=20 >=20=20 >=20 > You are complaining about a performance issue, not a correctness issue, r= ight? >=20 > No. I am pointing out a low-level-design/implementation defect. Poor perf= ormance is the symptom. Poor performance due to sub-optimal execution of th= e spec is a correctness issue. (See my comment on the nature of a function = below) >=20 >=20=20 >=20 > Kevin's work-around is meant to help you *gain better performance*, not t= o obtain correct results when you are getting incorrect results. >=20 > If it is not a bug, why do we need a work-around? >=20 >=20=20 >=20 > > There can be quite a number of ways of looking at the issue. First, it >=20 > > is truly an implementation matter (making it in the true sense a bug). I >=20 > > do not believe that the spec would in formal way say that 'well, there >=20 > > are caveats where you have to do this and that to work around'. >=20 >=20=20 >=20 > The "spec" (by which I assume you mean the SQL standard) says nothing abo= ut which execution plan will be selected (by) the optimizer. >=20 > No. Whatever the spec, it will never say that a function will not work as= a function, or something that works outside one will not work once moved i= nside. >=20 >=20=20 >=20 > > <snip> >=20 > > If by 'kept from one execution to another' means that (the concept of) a >=20 > > plan is implemented static, this can be a low level design issue, which >=20 > > in general will still be regarded as implementation, thus a bug. >=20 >=20=20 >=20 > The execution plan is not quite static - it is computed the first time yo= u run the function (within a session) and is discarded when your session en= ds (or when the compiled function/execution plan becomes obsolete because o= f a change to a dependency). >=20 >=20=20 >=20 > That is by design. >=20 > Then the design is poor. >=20 >=20=20 >=20 > If you want a dynamic plan that is re-computed each time you execute the = query, you can get that behavior by using dynamic SQL, as Kevin suggested. >=20 >=20=20 >=20 > This seems far fetched and irrelevant. Whatever is truly static should be= implemented static; whatever is dynamic should be implemented dynamic; wha= tever is partially static, the static part should be static and the dynamic= part should be dynamic. It is natural and correct treatment. >=20 >=20=20 >=20 > Purely dynamic situation in which the final query can not be determined i= n any fashion, will have to be constructed either outside of the function o= r within, so EXECUTE is the only way to handle. What do you think of requir= ing the caller to construct such a static statement as =93select count(*) f= rom sometable=94 and use EXECUTE? >=20 >=20=20 >=20 > By the same token, =93select thiscolumn from thistable where upper(thisco= lumn) like $1=94 has to be treated statically for the static part. The only= unknown is the parameter, which can be, by the right design and implementa= tion, delayed till execution (runtime). The code to deal with this is what = I pointed out (via a conditional). If the plan is a piece of code, then the= conditional will be in it. If the plan is a piece of text to be further in= terpreted for actual execution (why would one want to do it that way?), the= conditional could contain a text reference to two pieces of code (w/o the = use of the index). If it is not properly designed/implemented and such situ= ation results in the loss of the said ability, it is a defect to be address= ed. Whether one wants to address it is one issue. A defect is a defect. >=20 >=20=20 >=20 > You seem to suggest that the plan was only built at (the first) execution= . That is poor design/implementation. >=20 >=20=20 >=20 > Lastly, what is a function? One of the fundamental features of a function= is encapsulation. One is guaranteed some well-defined output based on well= -defined input. No implementation detail is necessary or is obliged to be a= vailable. The user does not have to know what table or anything for that ma= tter is involved. When you push for what is suggested as a work-around, it = defeats one of the basic purposes for a function. >=20 >=20=20 >=20 > More can be said, but why one wants to defend a defect is quite beyond me. >=20 >=20=20 >=20 > -- Korry >=20
Robert, Thanks for the reply. It should always be understood that things could be much more complex than meets the eye. I am not warranted to demand or expect anything. The work-around does not work for me, but that is a different issue. I have no right to lecture anybody, only that sometimes one tends to forget some basics. Me no exception. My concern was the impression I got on the finding of an excuse, which I do not believe was deliberate. Again, I was not searching for "best possible". The problem is some fundamental issue. I have not seen the code or the design, I do not know for sure. But if planner was called at run time for all (or almost all) query statements inside the function, then it is a big design/implementation issue. Once again, I appreciate the messages from all three of you who try to help people, and I understand the difficulty people face. Regards, Frank -----Original Message----- From: Robert Haas [mailto:robertmhaas@gmail.com] Sent: Wednesday, January 12, 2011 11:10 AM To: frank Cc: Korry Douglas; Kevin Grittner; pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #5816: index not used in function On Mon, Jan 10, 2011 at 9:07 PM, frank <frank@ros-i.com> wrote: > More can be said, but why one wants to defend a defect is quite beyond me. I guess the question is whether you want to solve your problem or improve PostgreSQL. If you want to solve your immediate problem, the advice given thus far is probably enough for you to do it. If the goal is to improve PostgreSQL, that's a worthy goal and I don't think anyone here would say otherwise. The current behavior is not ideal, but improving it is not easy. When people say "it's not a bug", they don't mean "this is the best possible behavior anyone can imagine"; they mean "we know that it works this way and we haven't actually figured out a way to do any better yet without causing other problems". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company