Thread: Problem with index in OR'd expression

Problem with index in OR'd expression

From
postgresql.org@tgice.com
Date:
Hello,

I've been using PostgreSQL for a few years and mostly love it.  Aside
from a few (perceived, anyway) annoying limitations in PL/PGSQL (which I
almost exclusively am using for db interaction), I'm very satisfied with it.

I ran across this problem several months back and decided to blow it off
back then, but now I'm really wanting to understand why it's happening,
if there's a work around, if I'm doing something wrong, and if it's just
a shortcoming / bug if the developers are aware of it and might
implement a fix some day.

I've already done a fair amount of testing (a total of hours, I'm sure)
and Googling around on it.  I also read this FAQ on indexes:

   http://www.postgresql.org/docs/faqs.FAQ.html#item4.6

I think my issue is not explained by any of the resources I reviewed.

To simplify my actual problem, I conducted a very simple test.  First I
defined a table:

create table t (
   c1 int primary key,
   c2 int
)

with the single index (primary key).  Then I filled the table with
100,000 rows using a quick pl/pgsql function (c1 = 1, 2, 3... 100,000).

Then I ran these two tests (with EXPLAIN in pgAdmin):

   select * from t where c1 = 75000;

   select * from t where ((0 is null) OR (c1 = 75000));

The first one properly uses the index on c1, the second does not.
Obviously, a human looking at the second one would realize it's
essentially identical to the first and properly evaluate the (0 is null)
part once (since it's immutable) and then ignore it for the rest of the
searching.

Now, I'm sure some of you might ask "why the hell are you doing that in
the first place?"  I have a good reason.  I write a lot of pl/pgsql
functions that are search functions with a list of *optional*
parameters.  I don't know ahead of time whether a user will include on
or not.  In MSSQL, what I'm able to do (with no obvious index problems
that I've seen) is add those all to the WHERE clause like this:

   ((vC1 IS NULL) OR (C1 = vC1)) AND
   ((vC2 IS NULL) OR (C2 = vC2)) ...

(here vC1 and vC2 represent variables passed into the pl/pgsql function).

So my question were basically asked at the beginning of this post: is
there another way to get the optimizer to understand what I'm trying to
do here?  Is this a known problem?  Is it working as preferred and
unlikely to change any time soon?  Is there some setting I can hit
somewhere to make it work like I want?

The only solution that I've come up with so far is making all of my
statements that otherwise wouldn't have to be dynamic and then only
including criteria for ones that I really need (based on the tested
nullness of the variables), but I find that highly annoying and have run
into other problems as a result as well.

I'd appreciate any suggestions you might have to help resolve this.

Thank,

John Lawler

Re: Problem with index in OR'd expression

From
Tom Lane
Date:
postgresql.org@tgice.com writes:
>    select * from t where c1 = 75000;
>    select * from t where ((0 is null) OR (c1 = 75000));

> The first one properly uses the index on c1, the second does not.
> Obviously, a human looking at the second one would realize it's
> essentially identical to the first and properly evaluate the (0 is null)
> part once (since it's immutable) and then ignore it for the rest of the
> searching.

Well, you could update --- 8.2 contains code to recognize that the IS
NULL expression is constant, but prior releases do not.

However, if you're hoping to do this:

>    ((vC1 IS NULL) OR (C1 = vC1)) AND
>    ((vC2 IS NULL) OR (C2 = vC2)) ...

you're still gonna lose because those are variables not constants ...

            regards, tom lane

Re: Problem with index in OR'd expression

From
postgresql.org@tgice.com
Date:
Tom Lane wrote:
> Well, you could update --- 8.2 contains code to recognize that the IS
> NULL expression is constant, but prior releases do not.

That's excellent to hear -- I'd missed that in my perusing of the
changelogs between 8.0.x and 8.2.  That does give me one more reason to
upgrade.  It appears I did not mention what version I was running --
sorry for that, though you guessed it was < 8.2.  It's actually 8.0.x.

> However, if you're hoping to do this:
>
>>    ((vC1 IS NULL) OR (C1 = vC1)) AND
>>    ((vC2 IS NULL) OR (C2 = vC2)) ...
>
> you're still gonna lose because those are variables not constants ...

Well, that *is* what I'm hoping to do.  I understand how (0 IS NULL) is
different from (variable IS NULL), but isn't it reasonable to expect
that PG could evaluate that expression only once (knowing that the
variable couldn't change during the course of the query execution) and
then treat that expression as constant?  I appreciate that you're saying
that it won't work even in 8.2, but what I'm getting at is would it be
possible to add it in the future?

As I mentioned, I'm pretty sure that that must be what MSSQL (6.5, 7,
2000 and 2005 [all of which I've had some experience with]) seem to be
doing.

Now failing all of this, does any one have a better idea for what I'm
trying to do?  A simple syntax for optionally including WHERE criteria
depending on the null-ness of variables (w/o having to go to dynamic
execution)?

Thanks for your reply Tom.

jl

Re: Problem with index in OR'd expression

From
postgresql.org@tgice.com
Date:
postgresql.org@tgice.com wrote:
> Tom Lane wrote:
>> you're still gonna lose because those are variables not constants ...
>
> Well, that *is* what I'm hoping to do.  I understand how (0 IS NULL) is
> different from (variable IS NULL), but isn't it reasonable to expect
> that PG could evaluate that expression only once (knowing that the
> variable couldn't change during the course of the query execution) and
> then treat that expression as constant?

Also, I should mention that in my case, the "variables" are actually
constants either defined in the function param list (and thus
non-modifiable within the function body) or defined with:

   DECLARE vC1 CONSTANT varchar(10) := 'blah'

-type syntax.  I would submit that in that situation, it would be
reasonable for a user to expect my suggested syntax to still use the
indicated indexes.

Do you agree?  It it possible that something other than picking up the
'IS NULL' boolean as constant has changed between 8.0.x & 8.2 that might
make this work?

Thanks,

jl

Re: Problem with index in OR'd expression

From
Tom Lane
Date:
postgresql.org@tgice.com writes:
> I would submit that in that situation, it would be
> reasonable for a user to expect my suggested syntax to still use the
> indicated indexes.

The only thing that will make that work is if "indexed_col IS NULL" were
an indexable condition, which it isn't because the PG index API only
supports "indexed_col operator something" as an indexable condition
(IS NULL is not an operator, and even if it were, there's no "something"
on its righthand side).  Fixing this has been on the radar screen for
awhile, but it's not done, largely for lack of agreement about a
reasonably clean way to change that API.

            regards, tom lane

Re: Problem with index in OR'd expression

From
postgresql.org@tgice.com
Date:
Tom Lane wrote:
> postgresql.org@tgice.com writes:
>> I would submit that in that situation, it would be
>> reasonable for a user to expect my suggested syntax to still use the
>> indicated indexes.
>
> The only thing that will make that work is if "indexed_col IS NULL" were
> an indexable condition, which it isn't because the PG index API only
> supports "indexed_col operator something" as an indexable condition
> (IS NULL is not an operator, and even if it were, there's no "something"
> on its righthand side).  Fixing this has been on the radar screen for
> awhile, but it's not done, largely for lack of agreement about a
> reasonably clean way to change that API.

Sorry to keep this issue alive even longer, Tom, but I think I may've
been unclear with my example.

I was referring to the situation where one has this in a WHERE clause:

   ((vConstant IS NULL) OR (Table.IndexedCol = vConstant))

where vConstant is a *constant* parameter in a pl/pgsql function.

In the latest versions (8.1 *or* 8.2), would you expect this to
successfully use the index on Table.IndexedCol and not have PG be
confused (into a sequential scan) by the (vConstant IS NULL) expression?

As I indicated, I'm currently running 8.0.x, and am wondering whether it
would be worth the effort to upgrade to 8.1 or 8.2 (Gentoo doesn't yet
have PG at 8.2, and I'm a bit lazy with installing things outside of
Portage) to solve this issue or whether I should just enable a
workaround for now and keep an eye on future releases for a better
solution to this problem.

Thanks again,

John

Re: Problem with index in OR'd expression

From
Ragnar
Date:
On mið, 2006-12-27 at 11:02 -0600, postgresql.org@tgice.com wrote:
>
> I was referring to the situation where one has this in a WHERE clause:
>
>    ((vConstant IS NULL) OR (Table.IndexedCol = vConstant))
>
> where vConstant is a *constant* parameter in a pl/pgsql function.

Reguardless of the issue whether pl/pgsql could be expected to
optimize this case, I find it difficult to imagine a scenario
where this kind of coding makes sense.

I understand that in some cases on would like to do this with
a *variable* to simplify logic, but what possible gain can
be had from doing this with a constant, other that obfuscation?

It does not make sense to spend CPU-time on  optimizing a case
that is not useful or common.

gnari



Re: Problem with index in OR'd expression

From
Tom Lane
Date:
postgresql.org@tgice.com writes:
> I was referring to the situation where one has this in a WHERE clause:
>    ((vConstant IS NULL) OR (Table.IndexedCol = vConstant))
> where vConstant is a *constant* parameter in a pl/pgsql function.

My mistake, I was thinking of Table.IndexedCol IS NULL.

> In the latest versions (8.1 *or* 8.2), would you expect this to
> successfully use the index on Table.IndexedCol and not have PG be
> confused (into a sequential scan) by the (vConstant IS NULL) expression?

No, and there's no "confusion" about it: with that WHERE clause, the
plan might have to return every row in the table.  The index is useless.

Since you know that the two OR'd conditions are mutually exclusive,
perhaps you could transform the query into a UNION operation, as in this
example:

regression=# prepare foo(int) as select * from tenk1 where $1 is null union all select * from tenk1 where unique2 = $1;
PREPARE
regression=# explain analyze execute foo(42);
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..561.02 rows=10001 width=244) (actual time=0.169..0.201 rows=1 loops=1)
   ->  Result  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.012..0.012 rows=0 loops=1)
         One-Time Filter: ($1 IS NULL)
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (never executed)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..3.01 rows=1 width=244) (actual time=0.142..0.166 rows=1
loops=1)
         Index Cond: (unique2 = $1)
 Total runtime: 1.092 ms
(7 rows)

Because of the one-time filter, the seqscan isn't executed unless
needed.

However I'm not sure that this sort of approach scales up if you have
more than one of these conditions in a query ...

            regards, tom lane

Re: Problem with index in OR'd expression

From
postgresql.org@tgice.com
Date:
Ragnar wrote:
> Reguardless of the issue whether pl/pgsql could be expected to
> optimize this case, I find it difficult to imagine a scenario
> where this kind of coding makes sense.
>
> I understand that in some cases on would like to do this with
> a *variable* to simplify logic, but what possible gain can
> be had from doing this with a constant, other that obfuscation?

Well, in one way it's a variable, but in another a constant.  It's a
variable in the context of general PG usage... e.g., my application code
may call the function with whatever parameters a user chooses, leaving
some parameters null and others not.

Within the context of the function (after calling), these variables are
constant and I'm attempting to use my OR syntax as shorthand to avoid
having to use a dynamic statement *only* because of this situation.

As I've mentioned, this approach seems to work with MSSQL 6.5+, which I
assume we consider as a valid competitor to PG... if this didn't work
anywhere else, I probably wouldn't even have brought it up.

I'll re-iterate another question I attempted to pose which was: what
have other PG application developers done in this situation?  Is it most
common to just use dynamic statements?

Thanks for your response.

jl

Re: Problem with index in OR'd expression

From
Ragnar
Date:
On mán, 2007-01-01 at 14:21 -0600, postgresql.org@tgice.com wrote:

> Within the context of the function (after calling), these variables are
> constant and I'm attempting to use my OR syntax as shorthand to avoid
> having to use a dynamic statement *only* because of this situation.
> As I've mentioned, this approach seems to work with MSSQL 6.5+, which I
> assume we consider as a valid competitor to PG... if this didn't work
> anywhere else, I probably wouldn't even have brought it up.
>
> I'll re-iterate another question I attempted to pose which was: what
> have other PG application developers done in this situation?  Is it most
> common to just use dynamic statements?

ps/pgsql will prepare the statement caching the plan for
subsequent calls, thus making this not easily optimized.

to avoid this in ps/pgsql , I believe it is common to to make
the statement dynamic by using EXECUTE, thus in effect making
sure the query is planned each time.

gnari