Thread: Planner question - "bit" data types

Planner question - "bit" data types

From
Karl Denninger
Date:
Does the planner know how to use indices to optimize these queries?

For reference, I was having SEVERE performance problems with the
following comparison in an SQL statement where "mask" was an integer:

"select ... from .... where ...... and (permission & mask = permission)"

This resulted in the planner deciding to run a nested loop and
extraordinarily poor performance.

I can probably recode the application to use a field of type "bit(32)"
and either cast to an integer or have the code do the conversion
internally (its just a shift eh?)

The question is whether the above statement will be reasonably planned
if "mask" is a bit type.


-- Karl Denninger

Attachment

Re: Planner question - "bit" data types

From
Josh Berkus
Date:
Karl,

> For reference, I was having SEVERE performance problems with the
> following comparison in an SQL statement where "mask" was an integer:
>
> "select ... from .... where ...... and (permission & mask = permission)"

AFAIK, the only way to use an index on these queries is through
expression indexes.  That's why a lot of folks use INTARRAY instead; it
comes with a GIN index type.

It would probably be possible to create a new index type using GiST or
GIN which indexed bitstrings automatically, but I don't know that anyone
has done it yet.

Changing your integer to a bitstring will not, to my knowledge, improve
this.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

Re: Planner question - "bit" data types

From
Merlin Moncure
Date:
On Fri, Sep 4, 2009 at 6:29 PM, Josh Berkus<josh@agliodbs.com> wrote:
> Karl,
>
>> For reference, I was having SEVERE performance problems with the
>> following comparison in an SQL statement where "mask" was an integer:
>>
>> "select ... from .... where ...... and (permission & mask = permission)"
>
> AFAIK, the only way to use an index on these queries is through
> expression indexes.  That's why a lot of folks use INTARRAY instead; it
> comes with a GIN index type.
>
> It would probably be possible to create a new index type using GiST or
> GIN which indexed bitstrings automatically, but I don't know that anyone
> has done it yet.
>
> Changing your integer to a bitstring will not, to my knowledge, improve
> this.

agreed.   also, gist/gin is no free lunch, maintaining these type of
indexes is fairly expensive.

If you are only interested in one or a very small number of cases of
'permission', you can use an expression index to target constant
values:

"select ... from .... where ...... and (permission & mask = permission)"

create index foo_permission_xyz_idx on foo((64 & mask = 64));
select * from foo where 64 & mask = 64; --indexed!

this optimizes a _particular_ case of permission into a boolean based
index.  this can be a big win if the # of matching cases is very small
or you want to use this in a multi-column index.

merlin

Re: Planner question - "bit" data types

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> If you are only interested in one or a very small number of cases of
> 'permission', you can use an expression index to target constant
> values:

> "select ... from .... where ...... and (permission & mask = permission)"

> create index foo_permission_xyz_idx on foo((64 & mask = 64));
> select * from foo where 64 & mask = 64; --indexed!

A possibly more useful variant is to treat the permission condition
as a partial index's WHERE condition.  The advantage of that is that
the index's actual content can be some other column, so that you can
combine the permission check with a second indexable test.  The index
is still available for queries that don't use the other column, but
it's more useful for those that do.

            regards, tom lane

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Tom Lane wrote:
Merlin Moncure <mmoncure@gmail.com> writes: 
If you are only interested in one or a very small number of cases of
'permission', you can use an expression index to target constant
values:   
 
"select ... from .... where ...... and (permission & mask = permission)"   
 
create index foo_permission_xyz_idx on foo((64 & mask = 64));
select * from foo where 64 & mask = 64; --indexed!   
A possibly more useful variant is to treat the permission condition
as a partial index's WHERE condition.  The advantage of that is that
the index's actual content can be some other column, so that you can
combine the permission check with a second indexable test.  The index
is still available for queries that don't use the other column, but
it's more useful for those that do.
		regards, tom lane
 
That doesn't help in this case as the returned set will typically be quite large, with the condition typically being valid on anywhere from 10-80% of the returned tuples.

What I am trying to avoid is creating a boolean column for EACH potential bit (and an index on each), as that makes the schema non-portable for others and quite messy as well - while there are a handful of "known masks" the system also has a number of "user defined" bit positions that vary from installation to installation.


-- Karl
Attachment

Re: Planner question - "bit" data types

From
Tom Lane
Date:
Karl Denninger <karl@denninger.net> writes:
> That doesn't help in this case as the returned set will typically be
> quite large, with the condition typically being valid on anywhere from
> 10-80% of the returned tuples.

In that case you'd be wasting your time to get it to use an index
for the condition anyway.  Maybe you need to take a step back and
look at the query as a whole rather than focus on this particular
condition.

            regards, tom lane

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Tom Lane wrote:
Karl Denninger <karl@denninger.net> writes: 
That doesn't help in this case as the returned set will typically be
quite large, with the condition typically being valid on anywhere from
10-80% of the returned tuples.   
In that case you'd be wasting your time to get it to use an index
for the condition anyway.  Maybe you need to take a step back and
look at the query as a whole rather than focus on this particular
condition.
		regards, tom lane
 
The query, sans this condition, is extremely fast and contains a LOT of other conditions (none of which cause trouble.)

It is only attempting to filter the returned tuples on the permission bit(s) involved that cause trouble.

-- Karl
Attachment

Re: Planner question - "bit" data types

From
Tom Lane
Date:
Karl Denninger <karl@denninger.net> writes:
> Tom Lane wrote:
>> In that case you'd be wasting your time to get it to use an index
>> for the condition anyway.  Maybe you need to take a step back and
>> look at the query as a whole rather than focus on this particular
>> condition.

> The query, sans this condition, is extremely fast and contains a LOT of
> other conditions (none of which cause trouble.)
> It is only attempting to filter the returned tuples on the permission
> bit(s) involved that cause trouble.

My comment stands: asking about how to use an index for this is the
wrong question.

You never showed us any EXPLAIN results, but I suspect what is happening
is that the planner thinks the "permission & mask = permission"
condition is fairly selective (offhand I think it'd default to
DEFAULT_EQ_SEL or 0.005) whereas the true selectivity per your prior
comment is only 0.1 to 0.8.  This is causing it to change to a plan that
would be good for a small number of rows, when it should stick to a plan
that is good for a large number of rows.

So the right question is "how do I fix the bad selectivity estimate?".
Unfortunately there's no non-kluge answer.  What I think I'd try is
wrapping the condition into a function, say

create function permission_match(perms int, mask int) returns bool
as $$begin return perms & mask = mask; end$$ language plpgsql
strict immutable;

The planner won't know what to make of "where permission_match(perms, 64)"
either, but the default selectivity estimate for a boolean function
is 0.333, much closer to what you need.

Or plan B, which I'd recommend, is to forget the mask business and go
over to a boolean column per permission flag.  Then the planner would
actually have decent statistics about the flag selectivities, and the
queries would be a lot more readable too.  Your objection that you'd
need an index per flag column is misguided --- at these selectivities
an index is really pointless.  And I entirely fail to understand the
complaint about it being unportable; you think "&" is more portable than
boolean?  Only one of those things is in the SQL standard.

            regards, tom lane

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Tom Lane wrote:
Karl Denninger <karl@denninger.net> writes: 
Tom Lane wrote:   
In that case you'd be wasting your time to get it to use an index
for the condition anyway.  Maybe you need to take a step back and
look at the query as a whole rather than focus on this particular
condition.     
The query, sans this condition, is extremely fast and contains a LOT of
other conditions (none of which cause trouble.)
It is only attempting to filter the returned tuples on the permission
bit(s) involved that cause trouble.   
My comment stands: asking about how to use an index for this is the
wrong question.

You never showed us any EXPLAIN results,
Yes I did.  Go back and look at the archives.  I provided full EXPLAIN and EXPLAIN ANALYZE results for the original query.  Sheesh.
Or plan B, which I'd recommend, is to forget the mask business and go
over to a boolean column per permission flag.  Then the planner would
actually have decent statistics about the flag selectivities, and the
queries would be a lot more readable too.  Your objection that you'd
need an index per flag column is misguided --- at these selectivities
an index is really pointless.  And I entirely fail to understand the
complaint about it being unportable; you think "&" is more portable than
boolean?  Only one of those things is in the SQL standard.
		regards, tom lane 
The point isn't portability to other SQL engines - it is to other people's installations.  The bitmask is (since it requires only changing the mask constants in the container file that makes the SQL calls by reference) where explicit columns is not by a long shot.

In any event it looks like that's the only reasonable way to do this, so thanks (I think)

-- Karl
Attachment

Re: Planner question - "bit" data types

From
Alvaro Herrera
Date:
Karl Denninger escribió:
> Tom Lane wrote:

> > You never showed us any EXPLAIN results,
> Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
> and EXPLAIN ANALYZE results for the original query.  Sheesh.

You did?  Where?  This is your first message in this thread:
http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
No EXPLAINs anywhere to be seen.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
There was a previous thread and I referenced it. I don't have the other one in my email system any more to follow up to it.

I give up; the attack-dog crowd has successfully driven me off.  Ciao.

Alvaro Herrera wrote:
Karl Denninger escribió: 
Tom Lane wrote:   
 
You never showed us any EXPLAIN results,     
Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
and EXPLAIN ANALYZE results for the original query.  Sheesh.   
You did?  Where?  This is your first message in this thread:
http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
No EXPLAINs anywhere to be seen.
 
Attachment

Re: Planner question - "bit" data types

From
Merlin Moncure
Date:
On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote:
> There was a previous thread and I referenced it. I don't have the other one
> in my email system any more to follow up to it.
>
> I give up; the attack-dog crowd has successfully driven me off.  Ciao.

Another more standard sql approach is to push the flags out to a
subordinate table.  This is less efficient of course but now you get
to use standard join tactics to match conditions...


merlin

Re: Planner question - "bit" data types

From
"Fernando Hevia"
Date:

> -----Mensaje original-----
> De: Karl Denninger
> Enviado el: Sábado, 05 de Septiembre de 2009 21:19
> Para: Alvaro Herrera
> CC: Tom Lane; Merlin Moncure; Josh Berkus;
> pgsql-performance@postgresql.org
> Asunto: Re: [PERFORM] Planner question - "bit" data types
>
> There was a previous thread and I referenced it. I don't have
> the other one in my email system any more to follow up to it.
>
> I give up; the attack-dog crowd has successfully driven me off.  Ciao.
>
> Alvaro Herrera wrote:
>
>     Karl Denninger escribió:
>
>
>         Tom Lane wrote:
>
>
>
>
>
>             You never showed us any EXPLAIN results,
>
>
>         Yes I did.  Go back and look at the archives.
> I provided full EXPLAIN
>         and EXPLAIN ANALYZE results for the original
> query.  Sheesh.
>
>
>
>     You did?  Where?  This is your first message in this thread:
>
> http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
>     No EXPLAINs anywhere to be seen.
>

I guess this is the post Karl refers to:

http://archives.postgresql.org/pgsql-sql/2009-08/msg00088.php

Still you can't hope that others will recall a post 2 weeks ago, with an
other subject and in an other list!



Re: Planner question - "bit" data types

From
Robert Haas
Date:
On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote:
> There was a previous thread and I referenced it. I don't have the other one
> in my email system any more to follow up to it.
>
> I give up; the attack-dog crowd has successfully driven me off.  Ciao.

Perhaps I'm biased by knowing some of the people involved, but I don't
think anyone on this thread has been anything but polite.  It would
certainly be great if PostgreSQL could properly estimate the
selectivity of expressions like this without resorting to nasty hacks,
but it can't, and unfortunately, there's really no possibility of that
changing any time soon.  Even if someone implements a fix today, the
soonest it will appear in a production release is June 2010.  So, any
suggestion for improvement is going to be in the form of suggesting
that you modify the schema in some way.  I know that's not really what
you're looking for, but unfortunately it's the best we can do.

As far as I can tell, it is not correct to say that you referenced the
previous thread.  I do not see any such reference.

...Robert

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Robert Haas wrote:
On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote: 
There was a previous thread and I referenced it. I don't have the other one
in my email system any more to follow up to it.

I give up; the attack-dog crowd has successfully driven me off.  Ciao.   
Perhaps I'm biased by knowing some of the people involved, but I don't
think anyone on this thread has been anything but polite.  It would
certainly be great if PostgreSQL could properly estimate the
selectivity of expressions like this without resorting to nasty hacks,
but it can't, and unfortunately, there's really no possibility of that
changing any time soon.  Even if someone implements a fix today, the
soonest it will appear in a production release is June 2010.  So, any
suggestion for improvement is going to be in the form of suggesting
that you modify the schema in some way.  I know that's not really what
you're looking for, but unfortunately it's the best we can do.

As far as I can tell, it is not correct to say that you referenced the
previous thread.  I do not see any such reference.

...Robert
 
I was asking about modifying the schema.

The current schema is an integer being used as a bitmask.  If the planner knows how to handle a type of "bit(X)" (and will at least FILTER rather than NESTED LOOP it on a select, as happens for an Integer used in this fashion), that change is easier than splitting it into individual boolean fields.

-- Karl


Attachment

Re: Planner question - "bit" data types

From
Robert Haas
Date:
On Mon, Sep 7, 2009 at 8:51 PM, Karl Denninger<karl@denninger.net> wrote:
> Robert Haas wrote:
>
> On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote:
>
>
> There was a previous thread and I referenced it. I don't have the other one
> in my email system any more to follow up to it.
>
> I give up; the attack-dog crowd has successfully driven me off.  Ciao.
>
>
> Perhaps I'm biased by knowing some of the people involved, but I don't
> think anyone on this thread has been anything but polite.  It would
> certainly be great if PostgreSQL could properly estimate the
> selectivity of expressions like this without resorting to nasty hacks,
> but it can't, and unfortunately, there's really no possibility of that
> changing any time soon.  Even if someone implements a fix today, the
> soonest it will appear in a production release is June 2010.  So, any
> suggestion for improvement is going to be in the form of suggesting
> that you modify the schema in some way.  I know that's not really what
> you're looking for, but unfortunately it's the best we can do.
>
> As far as I can tell, it is not correct to say that you referenced the
> previous thread.  I do not see any such reference.
>
> ...Robert
>
>
>
> I was asking about modifying the schema.
>
> The current schema is an integer being used as a bitmask.  If the planner
> knows how to handle a type of "bit(X)" (and will at least FILTER rather than
> NESTED LOOP it on a select, as happens for an Integer used in this fashion),
> that change is easier than splitting it into individual boolean fields.

Well, the first several replies seem to address that question - I
think we all agree that won't help.  I'm not sure what you mean by "at
least FILTER rather than NESTED LOOP it on a select".  However,
typically, the time when you get a nested loop is when the planner
believes that the loop will be executed very few times (in other
words, the outer side will return very few rows).  It probably isn't
the case that the planner COULDN'T choose to execute the query in some
other way; rather, the planner believes that the nested loop is faster
because of a (mistaken) belief about how many rows the
bitmap-criterion will actually match.  All the suggestions you've
gotten upthread are tricks to enable the planner to make a better
estimate, which will hopefully cause it to choose a better plan.

As a general statement, selectivity estimation problems are very
painful to work around and often involve substantial application
redesign. In all honesty, I think you've run across one of the easier
variants.  As painful as it is to hear the word easy applied to a
problem that's killing you, there actually IS a good solution to this
problem: use individual boolean fields.  I know that's not what you
want to do, but it's better than "sorry, you're hosed, no matter how
you do this it ain't gonna work".  And I do think there are a few in
the archives that fall into that category.

Good luck, and sorry for the bad news.

...Robert

Re: Planner question - "bit" data types

From
Brian Cox
Date:
"pgsql-performance-owner@postgresql.org" wrote:
> On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote:
>  > There was a previous thread and I referenced it. I don't have the
> other one
>  > in my email system any more to follow up to it.
>  >
>  > I give up; the attack-dog crowd has successfully driven me off.  Ciao.
>
> Perhaps I'm biased by knowing some of the people involved, but I don't
> think anyone on this thread has been anything but polite.
I use several online forums and this -- hands down -- is the best: not
only for politeness even when the information I provided was misleading
or the question I asked was, in retrospect, Duh? but also for 1) speed
of response, 2) breadth of ideas and 3) accuracy of information -- often
on complex issues with no simple solution from folk who probably have
more to do than sit around waiting for the next post. My thanks to the
knowledgeable people on this forum.

Brian


Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Robert Haas wrote:
On Mon, Sep 7, 2009 at 8:51 PM, Karl Denninger<karl@denninger.net> wrote: 
Robert Haas wrote:

On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote:


There was a previous thread and I referenced it. I don't have the other one
in my email system any more to follow up to it.

I give up; the attack-dog crowd has successfully driven me off.  Ciao.


Perhaps I'm biased by knowing some of the people involved, but I don't
think anyone on this thread has been anything but polite.  It would
certainly be great if PostgreSQL could properly estimate the
selectivity of expressions like this without resorting to nasty hacks,
but it can't, and unfortunately, there's really no possibility of that
changing any time soon.  Even if someone implements a fix today, the
soonest it will appear in a production release is June 2010.  So, any
suggestion for improvement is going to be in the form of suggesting
that you modify the schema in some way.  I know that's not really what
you're looking for, but unfortunately it's the best we can do.

As far as I can tell, it is not correct to say that you referenced the
previous thread.  I do not see any such reference.

...Robert



I was asking about modifying the schema.

The current schema is an integer being used as a bitmask.  If the planner
knows how to handle a type of "bit(X)" (and will at least FILTER rather than
NESTED LOOP it on a select, as happens for an Integer used in this fashion),
that change is easier than splitting it into individual boolean fields.   
Well, the first several replies seem to address that question - I
think we all agree that won't help.  I'm not sure what you mean by "at
least FILTER rather than NESTED LOOP it on a select".  However,
typically, the time when you get a nested loop is when the planner
believes that the loop will be executed very few times (in other
words, the outer side will return very few rows).  It probably isn't
the case that the planner COULDN'T choose to execute the query in some
other way; rather, the planner believes that the nested loop is faster
because of a (mistaken) belief about how many rows the
bitmap-criterion will actually match.  All the suggestions you've
gotten upthread are tricks to enable the planner to make a better
estimate, which will hopefully cause it to choose a better plan.

As a general statement, selectivity estimation problems are very
painful to work around and often involve substantial application
redesign. In all honesty, I think you've run across one of the easier
variants.  As painful as it is to hear the word easy applied to a
problem that's killing you, there actually IS a good solution to this
problem: use individual boolean fields.  I know that's not what you
want to do, but it's better than "sorry, you're hosed, no matter how
you do this it ain't gonna work".  And I do think there are a few in
the archives that fall into that category.

Good luck, and sorry for the bad news.

...Robert 
The individual boolean fields don't kill me and in terms of some of the application issues they're actually rather easy to code for.

The problem with re-coding for them is extensibility (by those who install and administer the package); a mask leaves open lots of extra bits for "site-specific" use, where hard-coding booleans does not, and since the executable is a binary it instantly becomes a huge problem for everyone but me.

It does appear, however, that a bitfield doesn't evaluate any differently than does an integer used with a mask, so there you have it..... it is what it is, and if I want this sort of selectivity in the search I have no choice.

-- Karl
Attachment

Re: Planner question - "bit" data types

From
Alvaro Herrera
Date:
Karl Denninger escribió:

> The individual boolean fields don't kill me and in terms of some of the
> application issues they're actually rather easy to code for.
>
> The problem with re-coding for them is extensibility (by those who
> install and administer the package); a mask leaves open lots of extra
> bits for "site-specific" use, where hard-coding booleans does not, and
> since the executable is a binary it instantly becomes a huge problem for
> everyone but me.

Did you try hiding the bitmask operations inside a function as Tom
suggested?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Planner question - "bit" data types

From
Robert Haas
Date:
On Mon, Sep 7, 2009 at 10:05 PM, Karl Denninger<karl@denninger.net> wrote:
> The individual boolean fields don't kill me and in terms of some of the
> application issues they're actually rather easy to code for.
>
> The problem with re-coding for them is extensibility (by those who install
> and administer the package); a mask leaves open lots of extra bits for
> "site-specific" use, where hard-coding booleans does not, and since the
> executable is a binary it instantly becomes a huge problem for everyone but
> me.
>
> It does appear, however, that a bitfield doesn't evaluate any differently
> than does an integer used with a mask, so there you have it..... it is what
> it is, and if I want this sort of selectivity in the search I have no
> choice.

You can always create 32 boolean fields and only use some of them,
leaving the others for site-specific use...

...Robert

Re: Planner question - "bit" data types

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 7, 2009 at 10:05 PM, Karl Denninger<karl@denninger.net> wrote:
>> The problem with re-coding for them is extensibility (by those who install
>> and administer the package); a mask leaves open lots of extra bits for
>> "site-specific" use, where hard-coding booleans does not,

> You can always create 32 boolean fields and only use some of them,
> leaving the others for site-specific use...

Indeed.  Why is "user_defined_flag_24" so much worse that "mask &
16777216" ?  Especially when the day comes that you need to add one more
system-defined flag bit?

            regards, tom lane

Re: Planner question - "bit" data types

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Karl Denninger escribi?:
>
> > The individual boolean fields don't kill me and in terms of some of the
> > application issues they're actually rather easy to code for.
> >
> > The problem with re-coding for them is extensibility (by those who
> > install and administer the package); a mask leaves open lots of extra
> > bits for "site-specific" use, where hard-coding booleans does not, and
> > since the executable is a binary it instantly becomes a huge problem for
> > everyone but me.
>
> Did you try hiding the bitmask operations inside a function as Tom
> suggested?

Yes.  In addition, functions that are part of expression indexes do get
their own optimizer statistics, so it does allow you to get optimizer
stats for your test without having to use booleans.

I see this documented in the 8.0 release notes:

     * "ANALYZE" now collects statistics for expression indexes (Tom)
       Expression indexes (also called functional indexes) allow users
       to index not just columns but the results of expressions and
       function calls. With this release, the optimizer can gather and
       use statistics about the contents of expression indexes. This will
       greatly improve the quality of planning for queries in which an
       expression index is relevant.

Is this in our main documentation somewhere?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Bruce Momjian wrote:
Alvaro Herrera wrote: 
Karl Denninger escribi?:
   
The individual boolean fields don't kill me and in terms of some of the
application issues they're actually rather easy to code for.

The problem with re-coding for them is extensibility (by those who
install and administer the package); a mask leaves open lots of extra
bits for "site-specific" use, where hard-coding booleans does not, and
since the executable is a binary it instantly becomes a huge problem for
everyone but me.     
Did you try hiding the bitmask operations inside a function as Tom
suggested?   
Yes.  In addition, functions that are part of expression indexes do get
their own optimizer statistics, so it does allow you to get optimizer
stats for your test without having to use booleans.

I see this documented in the 8.0 release notes:
    * "ANALYZE" now collects statistics for expression indexes (Tom)      Expression indexes (also called functional indexes) allow users      to index not just columns but the results of expressions and      function calls. With this release, the optimizer can gather and      use statistics about the contents of expression indexes. This will      greatly improve the quality of planning for queries in which an      expression index is relevant.

Is this in our main documentation somewhere?
 
Interesting... declaring this:

create function ispermitted(text, integer) returns boolean as $$
select permission & $2 = permission from forum where forum.name=$1;
$$ Language SQL STABLE;

then calling it with "ispermitted(post.forum, '4')" as one of the terms causes the query optimizer to treat it as a FILTER instead of a nested loop, and it works as expected.

However, I don't think I can index that - right - since there are two variables involved which are not part of the table being indexed.....

-- Karl

Attachment

Re: Planner question - "bit" data types

From
Bruce Momjian
Date:
Karl Denninger wrote:
> > Yes.  In addition, functions that are part of expression indexes do get
> > their own optimizer statistics, so it does allow you to get optimizer
> > stats for your test without having to use booleans.
> >
> > I see this documented in the 8.0 release notes:
> >
> >      * "ANALYZE" now collects statistics for expression indexes (Tom)
> >        Expression indexes (also called functional indexes) allow users
> >        to index not just columns but the results of expressions and
> >        function calls. With this release, the optimizer can gather and
> >        use statistics about the contents of expression indexes. This will
> >        greatly improve the quality of planning for queries in which an
> >        expression index is relevant.
> >
> > Is this in our main documentation somewhere?
> >
> >
> Interesting... declaring this:
>
> create function ispermitted(text, integer) returns boolean as $$
> select permission & $2 = permission from forum where forum.name=$1;
> $$ Language SQL STABLE;
>
> then calling it with "ispermitted(post.forum, '4')" as one of the terms
> causes the query optimizer to treat it as a FILTER instead of a nested
> loop, and it works as expected.
>
> However, I don't think I can index that - right - since there are two
> variables involved which are not part of the table being indexed.....

That should index fine.  It is an _expression_ index so it can be pretty
complicated.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Planner question - "bit" data types

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > Interesting... declaring this:
> >
> > create function ispermitted(text, integer) returns boolean as $$
> > select permission & $2 = permission from forum where forum.name=$1;
> > $$ Language SQL STABLE;
> >
> > then calling it with "ispermitted(post.forum, '4')" as one of the terms
> > causes the query optimizer to treat it as a FILTER instead of a nested
> > loop, and it works as expected.
> >
> > However, I don't think I can index that - right - since there are two
> > variables involved which are not part of the table being indexed.....
>
> That should index fine.  It is an _expression_ index so it can be pretty
> complicated.

Oh, you have to use the exact same syntax in there WHERE clause for the
expression index to be used, then use EXPLAIN to see if the index is
used.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Planner question - "bit" data types

From
Karl Denninger
Date:
Bruce Momjian wrote:
Karl Denninger wrote: 
Yes.  In addition, functions that are part of expression indexes do get
their own optimizer statistics, so it does allow you to get optimizer
stats for your test without having to use booleans.

I see this documented in the 8.0 release notes:
    * "ANALYZE" now collects statistics for expression indexes (Tom)      Expression indexes (also called functional indexes) allow users      to index not just columns but the results of expressions and      function calls. With this release, the optimizer can gather and      use statistics about the contents of expression indexes. This will      greatly improve the quality of planning for queries in which an      expression index is relevant.

Is this in our main documentation somewhere?
      
Interesting... declaring this:

create function ispermitted(text, integer) returns boolean as $$
select permission & $2 = permission from forum where forum.name=$1;
$$ Language SQL STABLE;

then calling it with "ispermitted(post.forum, '4')" as one of the terms
causes the query optimizer to treat it as a FILTER instead of a nested
loop, and it works as expected.

However, I don't think I can index that - right - since there are two
variables involved which are not part of the table being indexed.....   
That should index fine.  It is an _expression_ index so it can be pretty
complicated
It does not appear I can create an index on that (not that it appears to be necessary for decent performance)

create index forum_ispermitted on forum using btree(ispermitted(name, permission));
ERROR:  functions in index expression must be marked IMMUTABLE
ticker=#

The function is of course of class STATIC.

-- Karl


Attachment

Re: Planner question - "bit" data types

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Karl Denninger escribi?:
> >
> > > The individual boolean fields don't kill me and in terms of some of the
> > > application issues they're actually rather easy to code for.
> > >
> > > The problem with re-coding for them is extensibility (by those who
> > > install and administer the package); a mask leaves open lots of extra
> > > bits for "site-specific" use, where hard-coding booleans does not, and
> > > since the executable is a binary it instantly becomes a huge problem for
> > > everyone but me.
> >
> > Did you try hiding the bitmask operations inside a function as Tom
> > suggested?
>
> Yes.  In addition, functions that are part of expression indexes do get
> their own optimizer statistics, so it does allow you to get optimizer
> stats for your test without having to use booleans.
>
> I see this documented in the 8.0 release notes:
>
>      * "ANALYZE" now collects statistics for expression indexes (Tom)
>        Expression indexes (also called functional indexes) allow users
>        to index not just columns but the results of expressions and
>        function calls. With this release, the optimizer can gather and
>        use statistics about the contents of expression indexes. This will
>        greatly improve the quality of planning for queries in which an
>        expression index is relevant.
>
> Is this in our main documentation somewhere?

Added with attached, applied patch.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/maintenance.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v
retrieving revision 1.99
diff -c -c -r1.99 maintenance.sgml
*** doc/src/sgml/maintenance.sgml    8 Feb 2010 04:33:51 -0000    1.99
--- doc/src/sgml/maintenance.sgml    23 Feb 2010 02:46:21 -0000
***************
*** 318,323 ****
--- 318,331 ----
       SET STATISTICS</>, or change the database-wide default using the <xref
       linkend="guc-default-statistics-target"> configuration parameter.
      </para>
+
+     <para>
+      Also, by default there is limited information available about
+      the selectivity of functions.  However, if you create an expression
+      index that uses a function call, useful statistics will be
+      gathered about the function, which can greatly improve query
+      plans that use the expression index.
+     </para>
     </tip>
    </sect2>


Re: Planner question - "bit" data types

From
Scott Carey
Date:

On Sep 7, 2009, at 7:05 PM, Karl Denninger wrote:

The individual boolean fields don't kill me and in terms of some of the application issues they're actually rather easy to code for.

The problem with re-coding for them is extensibility (by those who install and administer the package); a mask leaves open lots of extra bits for "site-specific" use, where hard-coding booleans does not, and since the executable is a binary it instantly becomes a huge problem for everyone but me.

It does appear, however, that a bitfield doesn't evaluate any differently than does an integer used with a mask, so there you have it..... it is what it is, and if I want this sort of selectivity in the search I have no choice.

Perhaps, use a view to encapsulate the extensible bit fields?  Then custom installations just modify the view?  I haven't thought through that too far, but it might work.


-- Karl
<karl.vcf>
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance