Thread: Optimizer & boolean syntax

Optimizer & boolean syntax

From
Daniele Orlandi
Date:
Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

Tested on RC1:

template1=# create table a (a boolean, b text);
CREATE TABLE


.... inserted ~18000 rows with just one true (just to make an index scan  meaningful)....

template1=# vacuum analyze a;
VACUUM
template1=# explain select * from a where a;                     QUERY PLAN
---------------------------------------------------- Seq Scan on a  (cost=0.00..802.64 rows=1 width=11)   Filter: a
(2 rows)

template1=# explain select * from a where a=true;                          QUERY PLAN
-------------------------------------------------------------- Index Scan using a_a on a  (cost=0.00..2.01 rows=1
width=11)  Index Cond: (a = true)
 
(2 rows)

Bye!

--  Daniele Orlandi Planet Srl



Re: Optimizer & boolean syntax

From
Robert Treat
Date:
Using the famous WAG tech, in your first query the optimizer has to
evaluate monitored for each record to determine its value.

Robert Treat

On Thu, 2002-11-21 at 13:39, Daniele Orlandi wrote:
> 
> Are those two syntaxes eqivalent ?
> 
> select * from users where monitored;
> select * from users where monitored=true;
> 
> If the answer is yes, the optimimer probably doesn't agree with you :)
> 
> Tested on RC1:
> 
> template1=# create table a (a boolean, b text);
> CREATE TABLE
> 
> 
> .... inserted ~18000 rows with just one true (just to make an index scan 
>   meaningful)....
> 
> template1=# vacuum analyze a;
> VACUUM
> template1=# explain select * from a where a;
>                       QUERY PLAN
> ----------------------------------------------------
>   Seq Scan on a  (cost=0.00..802.64 rows=1 width=11)
>     Filter: a
> (2 rows)
> 
> template1=# explain select * from a where a=true;
>                            QUERY PLAN
> --------------------------------------------------------------
>   Index Scan using a_a on a  (cost=0.00..2.01 rows=1 width=11)
>     Index Cond: (a = true)
> (2 rows)
> 
> Bye!
> 
> -- 
>   Daniele Orlandi
>   Planet Srl
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html





Re: Optimizer & boolean syntax

From
Stephan Szabo
Date:
On Thu, 21 Nov 2002, Daniele Orlandi wrote:

> Are those two syntaxes eqivalent ?
>
> select * from users where monitored;
> select * from users where monitored=true;
>
> If the answer is yes, the optimimer probably doesn't agree with you :)

That depends on the definition of equivalent.  They presumably give the
same answer (I'm assuming monitored is a boolean), but the latter has
something that's considered an indexable condition and I believe the
former does not (even with enable_seqscan=off the former syntax
appears to give a sequence scan, usually a good sign it's not considered
indexable).



Re: Optimizer & boolean syntax

From
"Christopher Kings-Lynne"
Date:
> > Are those two syntaxes eqivalent ?
> >
> > select * from users where monitored;
> > select * from users where monitored=true;
> >
> > If the answer is yes, the optimimer probably doesn't agree with you :)
>
> That depends on the definition of equivalent.  They presumably give the
> same answer (I'm assuming monitored is a boolean), but the latter has
> something that's considered an indexable condition and I believe the
> former does not (even with enable_seqscan=off the former syntax
> appears to give a sequence scan, usually a good sign it's not considered
> indexable).

I think his point is that they _should_ be equivalent.  Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a' should be
discarded?

Chris




Re: Optimizer & boolean syntax

From
Stephan Szabo
Date:
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

> > > Are those two syntaxes eqivalent ?
> > >
> > > select * from users where monitored;
> > > select * from users where monitored=true;
> > >
> > > If the answer is yes, the optimimer probably doesn't agree with you :)
> >
> > That depends on the definition of equivalent.  They presumably give the
> > same answer (I'm assuming monitored is a boolean), but the latter has
> > something that's considered an indexable condition and I believe the
> > former does not (even with enable_seqscan=off the former syntax
> > appears to give a sequence scan, usually a good sign it's not considered
> > indexable).
>
> I think his point is that they _should_ be equivalent.  Surely there's
> something in the optimiser that discards '=true' stuff, like 'a=a' should be
> discarded?

I figure that's what he meant, but it isn't what was said. ;)

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c.  I'm not sure what the best way to make it work would be given
that presumably we'd want to make col IS TRUE/FALSE use an index at the
same time (since that appears to not do so as well).




Re: Optimizer & boolean syntax

From
"Christopher Kings-Lynne"
Date:
> > I think his point is that they _should_ be equivalent.  Surely there's
> > something in the optimiser that discards '=true' stuff, like 'a=a'
should be
> > discarded?
>
> I figure that's what he meant, but it isn't what was said. ;)
>
> "col" isn't of the general form "indexkey op constant" or "constant op
> indexkey" which I presume it's looking for given the comments in
> indxpath.c.  I'm not sure what the best way to make it work would be given
> that presumably we'd want to make col IS TRUE/FALSE use an index at the
> same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

Chris




Re: Optimizer & boolean syntax

From
Alvaro Herrera
Date:
On Thu, Nov 21, 2002 at 02:45:34PM -0800, Christopher Kings-Lynne wrote:
> > > I think his point is that they _should_ be equivalent.  Surely there's
> > > something in the optimiser that discards '=true' stuff, like 'a=a'
> should be
> > > discarded?

> Not that I see the point of indexing booleans, but hey :)

If one of the values is much more infrequent than the other, you can
probably get a substantial win using a partial index, can't you?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."


Re: Optimizer & boolean syntax

From
"scott.marlowe"
Date:
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

> > > I think his point is that they _should_ be equivalent.  Surely there's
> > > something in the optimiser that discards '=true' stuff, like 'a=a'
> should be
> > > discarded?
> >
> > I figure that's what he meant, but it isn't what was said. ;)
> >
> > "col" isn't of the general form "indexkey op constant" or "constant op
> > indexkey" which I presume it's looking for given the comments in
> > indxpath.c.  I'm not sure what the best way to make it work would be given
> > that presumably we'd want to make col IS TRUE/FALSE use an index at the
> > same time (since that appears to not do so as well).
> 
> Not that I see the point of indexing booleans, but hey :)

While full indexes do seem futile, partial indexes can be quite useful.

select articles from forum where approved is false

if 99.9% of all articles are approved would be quite common.



Re: Optimizer & boolean syntax

From
"scott.marlowe"
Date:
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

> > > I think his point is that they _should_ be equivalent.  Surely there's
> > > something in the optimiser that discards '=true' stuff, like 'a=a'
> should be
> > > discarded?
> >
> > I figure that's what he meant, but it isn't what was said. ;)
> >
> > "col" isn't of the general form "indexkey op constant" or "constant op
> > indexkey" which I presume it's looking for given the comments in
> > indxpath.c.  I'm not sure what the best way to make it work would be given
> > that presumably we'd want to make col IS TRUE/FALSE use an index at the
> > same time (since that appears to not do so as well).
> 
> Not that I see the point of indexing booleans, but hey :)

also, in reference to my last message, even if the % was 50/50, if the 
table was such that the bool was in a table next to a text field with 20k 
or text in it, an index on the bool would be much faster to go through 
than to seq scan the table.



Re: Optimizer & boolean syntax

From
"Christopher Kings-Lynne"
Date:
> > Not that I see the point of indexing booleans, but hey :)
> 
> If one of the values is much more infrequent than the other, you can
> probably get a substantial win using a partial index, can't you?

Yes, I thought of the partial index after I wrote that email :)

Chris




Re: Optimizer & boolean syntax

From
"Christopher Kings-Lynne"
Date:
> > > "col" isn't of the general form "indexkey op constant" or "constant op
> > > indexkey" which I presume it's looking for given the comments in
> > > indxpath.c.  I'm not sure what the best way to make it work would be
given
> > > that presumably we'd want to make col IS TRUE/FALSE use an index at
the
> > > same time (since that appears to not do so as well).
> >
> > Not that I see the point of indexing booleans, but hey :)
>
> also, in reference to my last message, even if the % was 50/50, if the
> table was such that the bool was in a table next to a text field with 20k
> or text in it, an index on the bool would be much faster to go through
> than to seq scan the table.

Hmmm...I'm not sure about that.  Postgres's storage strategry with text will
be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and
it will only be retrieved if it's in the select parameters.

Chris



Re: Optimizer & boolean syntax

From
Stephan Szabo
Date:
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

> > > > "col" isn't of the general form "indexkey op constant" or "constant op
> > > > indexkey" which I presume it's looking for given the comments in
> > > > indxpath.c.  I'm not sure what the best way to make it work would be
> given
> > > > that presumably we'd want to make col IS TRUE/FALSE use an index at
> the
> > > > same time (since that appears to not do so as well).
> > >
> > > Not that I see the point of indexing booleans, but hey :)
> >
> > also, in reference to my last message, even if the % was 50/50, if the
> > table was such that the bool was in a table next to a text field with 20k
> > or text in it, an index on the bool would be much faster to go through
> > than to seq scan the table.
>
> Hmmm...I'm not sure about that.  Postgres's storage strategry with text will
> be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and
> it will only be retrieved if it's in the select parameters.

True, but replace that text with 1500 integers. :)

The only problem with the partial index solution is that it seems to still
only work for the same method of asking for the result, so if you make an
index where col=true, using col IS TRUE or col in a query doesn't seem to
use it.



Re: Optimizer & boolean syntax

From
Daniele Orlandi
Date:
Stephan Szabo wrote:
> On Thu, 21 Nov 2002, Daniele Orlandi wrote:
> 
> 
>>Are those two syntaxes eqivalent ?
>>
>>select * from users where monitored;
>>select * from users where monitored=true;
>>
>>If the answer is yes, the optimimer probably doesn't agree with you :)
> 
> 
> That depends on the definition of equivalent.

By equivalent I mean "means the same thing so, behaves in the same way".

I consider the former syntax to be cleaner and I would tend to use it 
most of times.

For what concerns partial indexes, I agree, it's a better approach for 
this kind of indexing and I did some test:

-------------------------
ctonet=# create index users_monitored on users (monitored) where monitored;
CREATE
ctonet=# explain select * from users where monitored;
NOTICE:  QUERY PLAN:

Index Scan using users_monitored on users  (cost=0.00..9.44 rows=6 
width=186)

EXPLAIN

Nice, it appears to use the index, but:


ctonet=# explain select * from users where monitored=true;
NOTICE:  QUERY PLAN:

Seq Scan on users  (cost=0.00..8298.84 rows=59 width=186)

EXPLAIN
-------------------------

The problem is the opposite... so, effectively, seems that the optimizer 
considers "monitored" and "monitored=true" as two different expressions...

The viceversa is analog and we also can see that the syntax "monitored 
is true" is considered different from the other two syntaxes:

-----------------------
ctonet=# drop index users_monitored;
DROP
ctonet=# create index users_monitored on users (monitored) where 
monitored=true;
CREATE
ctonet=# explain select * from users where monitored=true;
NOTICE:  QUERY PLAN:

Index Scan using users_monitored on users  (cost=0.00..9.45 rows=6 
width=186)

EXPLAIN
ctonet=# explain select * from users where monitored;
NOTICE:  QUERY PLAN:

Seq Scan on users  (cost=0.00..8077.07 rows=59 width=186)

EXPLAIN

ctonet=# create index users_monitored on users (monitored) where 
monitored=true;
CREATE
ctonet=# explain select * from users where monitored is true;
NOTICE:  QUERY PLAN:

Seq Scan on users  (cost=0.00..8077.07 rows=59 width=186)

EXPLAIN
-------------------------

What I propose is that all those syntaxes are made equivalent (by, for 
example, rewriting boolean comparisons to a common form) in order to 
have a more consistent index usage.

Bye!

--  Daniele Orlandi Planet Srl



Re: Optimizer & boolean syntax

From
Tom Lane
Date:
Daniele Orlandi <daniele@orlandi.com> writes:
> The problem is the opposite... so, effectively, seems that the optimizer 
> considers "monitored" and "monitored=true" as two different expressions...

Check.

> The viceversa is analog and we also can see that the syntax "monitored 
> is true" is considered different from the other two syntaxes:

As it should be.

> What I propose is that all those syntaxes are made equivalent

Only two of them are logically equivalent.  Consider NULL.

Even for the first two, assuming equivalence requires hard-wiring an
assumption about the behavior of the "bool = bool" operator; which is
a user-redefinable operator.  I'm not totally comfortable with the idea.
        regards, tom lane


Re: Optimizer & boolean syntax

From
"scott.marlowe"
Date:
On Thu, 21 Nov 2002, Stephan Szabo wrote:

> 
> On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:
> 
> > > > > "col" isn't of the general form "indexkey op constant" or "constant op
> > > > > indexkey" which I presume it's looking for given the comments in
> > > > > indxpath.c.  I'm not sure what the best way to make it work would be
> > given
> > > > > that presumably we'd want to make col IS TRUE/FALSE use an index at
> > the
> > > > > same time (since that appears to not do so as well).
> > > >
> > > > Not that I see the point of indexing booleans, but hey :)
> > >
> > > also, in reference to my last message, even if the % was 50/50, if the
> > > table was such that the bool was in a table next to a text field with 20k
> > > or text in it, an index on the bool would be much faster to go through
> > > than to seq scan the table.
> >
> > Hmmm...I'm not sure about that.  Postgres's storage strategry with text will
> > be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and
> > it will only be retrieved if it's in the select parameters.
> 
> True, but replace that text with 1500 integers. :)
> 
> The only problem with the partial index solution is that it seems to still
> only work for the same method of asking for the result, so if you make an
> index where col=true, using col IS TRUE or col in a query doesn't seem to
> use it.

True.  I always use the syntax:

select * from table where field IS TRUE

OR IS FALSE  for consistency.



Re: Optimizer & boolean syntax

From
Daniele Orlandi
Date:
Tom Lane wrote:
> 
> Only two of them are logically equivalent.  Consider NULL.

Ohhh IS NOT TRUE or IS NOT FALSE also match NULL, I never knew this :)

> Even for the first two, assuming equivalence requires hard-wiring an
> assumption about the behavior of the "bool = bool" operator; which is
> a user-redefinable operator.>  I'm not totally comfortable with the idea.

Ok, I see your point and the problems that may arise, but I hope wou 
will agree with me that from the point of view of the user, both clauses 
have the same meaning and the index usage should be consistant with it.

Unfortunatelly I don't know very well PostgreSQL internals, so I may be 
saying a load of bullshits, but wouldn't be possible to consider any 
evaluation of a bool expression in the form of bool=bool with true as 
the second 'bool'[1] ? At least as a TODO item ?


Thanks!
Bye!

[1] Eventually including the "var IS TRUE" and "var IS FALSE" (not var 
IS NOT ...) which already are special syntax cases if I am not wrong.

--  Daniele Orlandi Planet Srl