Re: Planner question - "bit" data types - Mailing list pgsql-performance

From Karl Denninger
Subject Re: Planner question - "bit" data types
Date
Msg-id 4AA5BC07.7090806@denninger.net
Whole thread Raw
In response to Re: Planner question - "bit" data types  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Planner question - "bit" data types  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Planner question - "bit" data types  (Robert Haas <robertmhaas@gmail.com>)
Re: Planner question - "bit" data types  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
Robert Haas wrote:
On Mon, Sep 7, 2009 at 8:51 PM, Karl Denninger<karl@denninger.net> wrote: 
Robert Haas wrote:

On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@denninger.net> wrote:


There was a previous thread and I referenced it. I don't have the other one
in my email system any more to follow up to it.

I give up; the attack-dog crowd has successfully driven me off.  Ciao.


Perhaps I'm biased by knowing some of the people involved, but I don't
think anyone on this thread has been anything but polite.  It would
certainly be great if PostgreSQL could properly estimate the
selectivity of expressions like this without resorting to nasty hacks,
but it can't, and unfortunately, there's really no possibility of that
changing any time soon.  Even if someone implements a fix today, the
soonest it will appear in a production release is June 2010.  So, any
suggestion for improvement is going to be in the form of suggesting
that you modify the schema in some way.  I know that's not really what
you're looking for, but unfortunately it's the best we can do.

As far as I can tell, it is not correct to say that you referenced the
previous thread.  I do not see any such reference.

...Robert



I was asking about modifying the schema.

The current schema is an integer being used as a bitmask.  If the planner
knows how to handle a type of "bit(X)" (and will at least FILTER rather than
NESTED LOOP it on a select, as happens for an Integer used in this fashion),
that change is easier than splitting it into individual boolean fields.   
Well, the first several replies seem to address that question - I
think we all agree that won't help.  I'm not sure what you mean by "at
least FILTER rather than NESTED LOOP it on a select".  However,
typically, the time when you get a nested loop is when the planner
believes that the loop will be executed very few times (in other
words, the outer side will return very few rows).  It probably isn't
the case that the planner COULDN'T choose to execute the query in some
other way; rather, the planner believes that the nested loop is faster
because of a (mistaken) belief about how many rows the
bitmap-criterion will actually match.  All the suggestions you've
gotten upthread are tricks to enable the planner to make a better
estimate, which will hopefully cause it to choose a better plan.

As a general statement, selectivity estimation problems are very
painful to work around and often involve substantial application
redesign. In all honesty, I think you've run across one of the easier
variants.  As painful as it is to hear the word easy applied to a
problem that's killing you, there actually IS a good solution to this
problem: use individual boolean fields.  I know that's not what you
want to do, but it's better than "sorry, you're hosed, no matter how
you do this it ain't gonna work".  And I do think there are a few in
the archives that fall into that category.

Good luck, and sorry for the bad news.

...Robert 
The individual boolean fields don't kill me and in terms of some of the application issues they're actually rather easy to code for.

The problem with re-coding for them is extensibility (by those who install and administer the package); a mask leaves open lots of extra bits for "site-specific" use, where hard-coding booleans does not, and since the executable is a binary it instantly becomes a huge problem for everyone but me.

It does appear, however, that a bitfield doesn't evaluate any differently than does an integer used with a mask, so there you have it..... it is what it is, and if I want this sort of selectivity in the search I have no choice.

-- Karl
Attachment

pgsql-performance by date:

Previous
From: Brian Cox
Date:
Subject: Re: Planner question - "bit" data types
Next
From: Alvaro Herrera
Date:
Subject: Re: Planner question - "bit" data types