Thread: BUG #5816: index not used in function

BUG #5816: index not used in function

From
"frank"
Date:
The following bug has been logged online:

Bug reference:      5816
Logged by:          frank
Email address:      frank@ros-i.com
PostgreSQL version: 8.3.7
Operating system:   linux
Description:        index not used in function
Details:

Linux: Linux <server name> 2.6.24-24-server #1 SMP Tue Jul 7 20:21:17 UTC
2009 i686 GNU/Linux

Postgres: "PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC cc (GCC)
4.2.4 (Ubuntu 4.2.4-1ubuntu3)"

Table Description:
1. table (say thisTable) with a column (say thisColumn) of varchar in mixed
case data
2. w/o primary key on this column
3. an upper case index on this column with text_pattern_ops as the opclass

Query Description:
1. select "thisColumn" from "thisTable" where upper("thisColumn") like
'ABC%'
2. select * from get_this_column('ABC')
where get_this_column() is defined as:

CREATE OR REPLACE FUNCTION get_this_column(c text)
  RETURNS SETOF text AS
$BODY$
   SELECT "thisColumn" FROM "thisTable" WHERE upper("thisColumn") like $1
limit 10;
$BODY$
  LANGUAGE 'sql' VOLATILE
  COST 100
  ROWS 1000;
ALTER FUNCTION get_this_column(text) OWNER TO postgres;

Issue Description:
The plain query (1) uses the upper case index.
The function (2), even though having the same underlying query, does not
seem to be able to use index. The wild card can either be in the argument or
appended at the query inside the function, the result is the same.

Further Question:
Why can't force the use of index (even if the plan results in worse
performance when user desires)?
The index HINT (of ORACLE say) is a bad concept and a horrible design idea.
The user should be allowed to force the use of any index regardless of
performance. Anyway, the user should know what he/she is doing. Besides, an
explain could show which, the system plan or the user's, is better.

The combination described here is perhaps not normal. I have not tested it
with any other version. Likely the same is true for all versions including
9.

Regardless of resolution, I would appreciate a brief response.

Thanks

Re: BUG #5816: index not used in function

From
"Kevin Grittner"
Date:
"frank" <frank@ros-i.com> wrote:

> WHERE upper("thisColumn") like $1

The function's plan is kept from one execution to another, and it
can't know what will be in the first parameter -- perhaps '%X%'?  If
you build up the statement in a string and EXECUTE it, you might get
the desired behavior.

Anyway, next time you have an issue like this, please post to the
performance list; this is not a bug.

-Kevin

Re: BUG #5816: index not used in function

From
"frank"
Date:
Kevin,

Very thrilled to hear back from you (and surprised at the promptness,
too)!

Please do not take this personal as it is a tech issue and I am treating
it thus.

We may have different perceptions of something being a 'bug'. I always
have several simple ways of determining it. One of them is when a
work-around is in the proposal. Yours is one.

There can be quite a number of ways of looking at the issue. First, it
is truly an implementation matter (making it in the true sense a bug). I
do not believe that the spec would in formal way say that 'well, there
are caveats where you have to do this and that to work around'.

I gave specific parameters in my example (if you go back to my original
email/bug report). To be precise, the parameter does not start with a
wild card. By giving '%X%', you may have changed the nature a bit.
Regardless, it is still the same issue: the implementation did not
faithfully reflect the specification, which did not explicitly (nor
implicitly) endorse such behavior. From a tech perspective, you are
actually partially correct, but only partially.

How can a leading wild-card warrant the use of index? It can not, for
sure. However, the implementation is still defective. Let me be
specific. The logic should be a conditional: if (at run time when the
argument is parsed and examined) a leading wild-card is found, it should
branch to not use index (as it can not), otherwise it should. The
compilation already sees the text [upper("thisColumn")]. Assuming there
are no other issues and embedded wild-cards are correctly handled (e.g.
through index search and then a filter), the above described simple
logic is the only thing needed to be thrown in.

If by 'kept from one execution to another' means that (the concept of) a
plan is implemented static, this can be a low level design issue, which
in general will still be regarded as implementation, thus a bug.

Lastly, to expose it as a work-around, let us take the situation where I
input n parameters, the different combinations will be used to query
against a number of tables (inside the function). Due to the issue, I
have to input all the actual query text as parameters into the function
(which by now is no longer truly a function). But that could be
factorial!

Anyway, I hope this can be re-considered. I do not want to create
headaches, but if resource constraint is a factor, I may offer to fix
the problem if you could kindly provide with your lexer code and the
code files that are/may be affected, as well as POC for QA.

Once again, I appreciate your reply and thank you for the attention.

Regards,
Frank

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: Thursday, January 06, 2011 12:50 PM
To: pgsql-bugs@postgresql.org; frank
Subject: Re: [BUGS] BUG #5816: index not used in function

"frank" <frank@ros-i.com> wrote:

> WHERE upper("thisColumn") like $1

The function's plan is kept from one execution to another, and it
can't know what will be in the first parameter -- perhaps '%X%'?  If
you build up the statement in a string and EXECUTE it, you might get
the desired behavior.

Anyway, next time you have an issue like this, please post to the
performance list; this is not a bug.

-Kevin

Re: BUG #5816: index not used in function

From
Korry Douglas
Date:
> We may have different perceptions of something being a 'bug'. I always
> have several simple ways of determining it. One of them is when a
> work-around is in the proposal. Yours is one.

It seems to me that the important question in this case is whether or not t=
he query produced the correct result.

You are complaining about a performance issue, not a correctness issue, rig=
ht?

Kevin's work-around is meant to help you *gain better performance*, not to =
obtain correct results when you are getting incorrect results.

> There can be quite a number of ways of looking at the issue. First, it
> is truly an implementation matter (making it in the true sense a bug). I
> do not believe that the spec would in formal way say that 'well, there
> are caveats where you have to do this and that to work around'.

The "spec" (by which I assume you mean the SQL standard) says nothing about=
 which execution plan will be selected the optimizer.

> <snip>
> If by 'kept from one execution to another' means that (the concept of) a
> plan is implemented static, this can be a low level design issue, which
> in general will still be regarded as implementation, thus a bug.

The execution plan is not quite static - it is computed the first time you =
run the function (within a session) and is discarded when your session ends=
 (or when the compiled function/execution plan becomes obsolete because of =
a change to a dependency).

That is by design.

If you want a dynamic plan that is re-computed each time you execute the qu=
ery, you can get that behavior by using dynamic SQL, as Kevin suggested.

            -- Korry

Re: BUG #5816: index not used in function

From
"frank"
Date:
-----Original Message-----
From: Korry Douglas [mailto:korry.douglas@enterprisedb.com]
Sent: Sunday, January 09, 2011 2:34 PM
To: frank
Cc: 'Kevin Grittner'; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5816: index not used in function

> We may have different perceptions of something being a 'bug'. I always
> have several simple ways of determining it. One of them is when a
> work-around is in the proposal. Yours is one.

It seems to me that the important question in this case is whether or
not the query produced the correct result.
The important question by nature is not whether correct result is
produce eventually. In this case, the important thing is the inability
to use the index when in fact, if designed and implemented correctly, it
could.

You are complaining about a performance issue, not a correctness issue,
right?
No. I am pointing out a low-level-design/implementation defect. Poor
performance is the symptom. Poor performance due to sub-optimal
execution of the spec is a correctness issue. (See my comment on the
nature of a function below)

Kevin's work-around is meant to help you *gain better performance*, not
to obtain correct results when you are getting incorrect results.
If it is not a bug, why do we need a work-around?

> There can be quite a number of ways of looking at the issue. First, it
> is truly an implementation matter (making it in the true sense a bug).
I
> do not believe that the spec would in formal way say that 'well, there
> are caveats where you have to do this and that to work around'.

The "spec" (by which I assume you mean the SQL standard) says nothing
about which execution plan will be selected (by) the optimizer.
No. Whatever the spec, it will never say that a function will not work
as a function, or something that works outside one will not work once
moved inside.

> <snip>
> If by 'kept from one execution to another' means that (the concept of)
a
> plan is implemented static, this can be a low level design issue,
which
> in general will still be regarded as implementation, thus a bug.

The execution plan is not quite static - it is computed the first time
you run the function (within a session) and is discarded when your
session ends (or when the compiled function/execution plan becomes
obsolete because of a change to a dependency).

That is by design.
Then the design is poor.

If you want a dynamic plan that is re-computed each time you execute the
query, you can get that behavior by using dynamic SQL, as Kevin
suggested.

This seems far fetched and irrelevant. Whatever is truly static should
be implemented static; whatever is dynamic should be implemented
dynamic; whatever is partially static, the static part should be static
and the dynamic part should be dynamic. It is natural and correct
treatment.

Purely dynamic situation in which the final query can not be determined
in any fashion, will have to be constructed either outside of the
function or within, so EXECUTE is the only way to handle. What do you
think of requiring the caller to construct such a static statement as
"select count(*) from sometable" and use EXECUTE?

By the same token, "select thiscolumn from thistable where
upper(thiscolumn) like $1" has to be treated statically for the static
part. The only unknown is the parameter, which can be, by the right
design and implementation, delayed till execution (runtime). The code to
deal with this is what I pointed out (via a conditional). If the plan is
a piece of code, then the conditional will be in it. If the plan is a
piece of text to be further interpreted for actual execution (why would
one want to do it that way?), the conditional could contain a text
reference to two pieces of code (w/o the use of the index). If it is not
properly designed/implemented and such situation results in the loss of
the said ability, it is a defect to be addressed. Whether one wants to
address it is one issue. A defect is a defect.

You seem to suggest that the plan was only built at (the first)
execution. That is poor design/implementation.

Lastly, what is a function? One of the fundamental features of a
function is encapsulation. One is guaranteed some well-defined output
based on well-defined input. No implementation detail is necessary or is
obliged to be available. The user does not have to know what table or
anything for that matter is involved. When you push for what is
suggested as a work-around, it defeats one of the basic purposes for a
function.

More can be said, but why one wants to defend a defect is quite beyond
me.

                  -- Korry

Re: BUG #5816: index not used in function

From
Robert Haas
Date:
On Mon, Jan 10, 2011 at 9:07 PM, frank <frank@ros-i.com> wrote:
> More can be said, but why one wants to defend a defect is quite beyond me.

I guess the question is whether you want to solve your problem or
improve PostgreSQL.  If you want to solve your immediate problem, the
advice given thus far is probably enough for you to do it.  If the
goal is to improve PostgreSQL, that's a worthy goal and I don't think
anyone here would say otherwise.  The current behavior is not ideal,
but improving it is not easy.

When people say "it's not a bug", they don't mean "this is the best
possible behavior anyone can imagine"; they mean "we know that it
works this way and we haven't actually figured out a way to do any
better yet without causing other problems".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #5816: index not used in function

From
Korry Douglas
Date:
Frank, thanks for educating me.=20

        -- Korry

> -----Original Message-----
> From: Korry Douglas [mailto:korry.douglas@enterprisedb.com]=20
> Sent: Sunday, January 09, 2011 2:34 PM
> To: frank
> Cc: 'Kevin Grittner'; pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5816: index not used in function
>=20
>=20=20
>=20
> > We may have different perceptions of something being a 'bug'. I always
>=20
> > have several simple ways of determining it. One of them is when a
>=20
> > work-around is in the proposal. Yours is one.
>=20
>=20=20
>=20
> It seems to me that the important question in this case is whether or not=
 the query produced the correct result.
>=20
> The important question by nature is not whether correct result is produce=
 eventually. In this case, the important thing is the inability to use the =
index when in fact, if designed and implemented correctly, it could.
>=20
>=20=20
>=20
> You are complaining about a performance issue, not a correctness issue, r=
ight?
>=20
> No. I am pointing out a low-level-design/implementation defect. Poor perf=
ormance is the symptom. Poor performance due to sub-optimal execution of th=
e spec is a correctness issue. (See my comment on the nature of a function =
below)
>=20
>=20=20
>=20
> Kevin's work-around is meant to help you *gain better performance*, not t=
o obtain correct results when you are getting incorrect results.
>=20
> If it is not a bug, why do we need a work-around?
>=20
>=20=20
>=20
> > There can be quite a number of ways of looking at the issue. First, it
>=20
> > is truly an implementation matter (making it in the true sense a bug). I
>=20
> > do not believe that the spec would in formal way say that 'well, there
>=20
> > are caveats where you have to do this and that to work around'.
>=20
>=20=20
>=20
> The "spec" (by which I assume you mean the SQL standard) says nothing abo=
ut which execution plan will be selected (by) the optimizer.
>=20
> No. Whatever the spec, it will never say that a function will not work as=
 a function, or something that works outside one will not work once moved i=
nside.
>=20
>=20=20
>=20
> > <snip>
>=20
> > If by 'kept from one execution to another' means that (the concept of) a
>=20
> > plan is implemented static, this can be a low level design issue, which
>=20
> > in general will still be regarded as implementation, thus a bug.
>=20
>=20=20
>=20
> The execution plan is not quite static - it is computed the first time yo=
u run the function (within a session) and is discarded when your session en=
ds (or when the compiled function/execution plan becomes obsolete because o=
f a change to a dependency).
>=20
>=20=20
>=20
> That is by design.
>=20
> Then the design is poor.
>=20
>=20=20
>=20
> If you want a dynamic plan that is re-computed each time you execute the =
query, you can get that behavior by using dynamic SQL, as Kevin suggested.
>=20
>=20=20
>=20
> This seems far fetched and irrelevant. Whatever is truly static should be=
 implemented static; whatever is dynamic should be implemented dynamic; wha=
tever is partially static, the static part should be static and the dynamic=
 part should be dynamic. It is natural and correct treatment.
>=20
>=20=20
>=20
> Purely dynamic situation in which the final query can not be determined i=
n any fashion, will have to be constructed either outside of the function o=
r within, so EXECUTE is the only way to handle. What do you think of requir=
ing the caller to construct such a static statement as =93select count(*) f=
rom sometable=94 and use EXECUTE?
>=20
>=20=20
>=20
> By the same token, =93select thiscolumn from thistable where upper(thisco=
lumn) like $1=94 has to be treated statically for the static part. The only=
 unknown is the parameter, which can be, by the right design and implementa=
tion, delayed till execution (runtime). The code to deal with this is what =
I pointed out (via a conditional). If the plan is a piece of code, then the=
 conditional will be in it. If the plan is a piece of text to be further in=
terpreted for actual execution (why would one want to do it that way?), the=
 conditional could contain a text reference to two pieces of code (w/o the =
use of the index). If it is not properly designed/implemented and such situ=
ation results in the loss of the said ability, it is a defect to be address=
ed. Whether one wants to address it is one issue. A defect is a defect.
>=20
>=20=20
>=20
> You seem to suggest that the plan was only built at (the first) execution=
. That is poor design/implementation.
>=20
>=20=20
>=20
> Lastly, what is a function? One of the fundamental features of a function=
 is encapsulation. One is guaranteed some well-defined output based on well=
-defined input. No implementation detail is necessary or is obliged to be a=
vailable. The user does not have to know what table or anything for that ma=
tter is involved. When you push for what is suggested as a work-around, it =
defeats one of the basic purposes for a function.
>=20
>=20=20
>=20
> More can be said, but why one wants to defend a defect is quite beyond me.
>=20
>=20=20
>=20
>                   -- Korry
>=20

Re: BUG #5816: index not used in function

From
"frank"
Date:
Robert,

Thanks for the reply.

It should always be understood that things could be much more complex
than meets the eye. I am not warranted to demand or expect anything.

The work-around does not work for me, but that is a different issue. I
have no right to lecture anybody, only that sometimes one tends to
forget some basics. Me no exception.

My concern was the impression I got on the finding of an excuse, which I
do not believe was deliberate. Again, I was not searching for "best
possible". The problem is some fundamental issue. I have not seen the
code or the design, I do not know for sure. But if planner was called at
run time for all (or almost all) query statements inside the function,
then it is a big design/implementation issue.

Once again, I appreciate the messages from all three of you who try to
help people, and I understand the difficulty people face.

Regards,
Frank

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Wednesday, January 12, 2011 11:10 AM
To: frank
Cc: Korry Douglas; Kevin Grittner; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5816: index not used in function

On Mon, Jan 10, 2011 at 9:07 PM, frank <frank@ros-i.com> wrote:
> More can be said, but why one wants to defend a defect is quite beyond
me.

I guess the question is whether you want to solve your problem or
improve PostgreSQL.  If you want to solve your immediate problem, the
advice given thus far is probably enough for you to do it.  If the
goal is to improve PostgreSQL, that's a worthy goal and I don't think
anyone here would say otherwise.  The current behavior is not ideal,
but improving it is not easy.

When people say "it's not a bug", they don't mean "this is the best
possible behavior anyone can imagine"; they mean "we know that it
works this way and we haven't actually figured out a way to do any
better yet without causing other problems".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company