Thread: Bitmapset data type???

Bitmapset data type???

From
Martha Chronopoulou
Date:
Hi all,

I'm tring to understand a part of code of postgres and I saw a line like
this:
"bms_is_subset(restrictinfo->right_relids,outerrelids)"  (createplan.c,
function get_switched_clauses() at ~/src/backend/optimizer/plan).
I don't understand the data types of the the arguments
of that function. The datatype of those arguments is Bitmapset,
i.e.
typedef struct Bitmapset
{
int nwords;
bitmapword words[1];
}Bitmapset;             (bitmapset.h at ~/src/include/nodes)

and bitmapword is:
typedef unit32 bitmapword; (bitmapset.h at ~/src/include/nodes)

and the prototype of the function is:
extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
(bitmapset.h at ~/src/include/nodes)


The following is at the file bitmapset.h at dir  ~/src/include/nodes)
"A bitmap set can represent any set of nonnegative integers, although it is mainly intented for sets where the maximum
valueis not large, say
 
at most a few hundred.By convention, a NULL pointer is always accepted
by all operations to represent the empty set.(But beware that this is
not the only representation of the empty set. Use bms_is_empty() in
preference to testing for NULL.)"

Can someone explain to me please the datatype Bitmapset?
I'm confused.

Thanks in advance!!! 
- martha



Re: Bitmapset data type???

From
Alvaro Herrera
Date:
On Sun, Dec 19, 2004 at 05:39:43PM +0200, Martha Chronopoulou wrote:

Hi,

> "A bitmap set can represent any set of nonnegative integers, although it
> is mainly intented for sets where the maximum value is not large, say
> at most a few hundred."

Clearly this can only come from the contorted mind of a JPEG hacker.
(And indeed it does, as you can see from the CVS logs.)


> I'm tring to understand a part of code of postgres and I saw a line like
> this:
> "bms_is_subset(restrictinfo->right_relids,outerrelids)"  (createplan.c,
> function get_switched_clauses() at ~/src/backend/optimizer/plan).

What that is doing is check whether restrictingo->right_relids is a
subset of outerrelids.

> I don't understand the data types of the the arguments
> of that function.

The only thing you need to know is that it represents a set of integers.
Anything beyond that is going to drive you crazy (and is irrelevant
anyway.)

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)


Re: Bitmapset data type???

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sun, Dec 19, 2004 at 05:39:43PM +0200, Martha Chronopoulou wrote:
>> I'm tring to understand a part of code of postgres and I saw a line like
>> this:
>> "bms_is_subset(restrictinfo->right_relids,outerrelids)"  (createplan.c,
>> function get_switched_clauses() at ~/src/backend/optimizer/plan).

> What that is doing is check whether restrictingo->right_relids is a
> subset of outerrelids.

In particular, it's looking for the case where we are using a WHERE
clause like "a.f1 = b.f2" as a merge or hash join clause, but we decided
that b ought to be the outer (left-side) relation in the join.  The
executor expects the outer relation's variable to be on the left in the
constructed plan's join clause, so we've got to commute the clause
in such cases.

The reason it's a subset test is that in three-or-more-relation queries,
we might be doing this in an upper join level, where the selected
joining sequence is (b join c) join a.  So the relations mentioned on
the right side of the clause might be just a subset of those we have so
far collected into a single joined relation.

In recent releases (perhaps only 7.4 and later, not sure offhand) we can
handle expressions as arguments of merge/hash join clauses, so the
expression might even be likea.f1 = (b.f2 + c.f3)
This is why all the relation-membership fields have to be sets and
not just single relations.

We know by construction that there is no overlap between the sets of
relations being joined (ie we'd never do b join b).  We also know
there's no overlap between the relation membership of the sets of
variables on the two sides of the clause (else it wouldn't have been
selected as a usable merge or hash join clause).  So this subset test
is sufficient to distinguish the two cases we have to distinguish.
There are several other ways it could have been written, of course;
eg we could have tested left_relids for not being a subset of
outerrelids, or we could have passed in innerrelids and made the
tests against that.

If what's bugging you is just the physical representation: when we have
to reason about sets of relations in the planner, we represent them as
sets of integers where the individual integers are rangetable indexes
for the relations.  Bitmapset is just an abstract data type that happens
to provide operations that are convenient for the reasoning we need to
do.  (Of course that's not by chance ;-) ... but the internal
representation of Bitmapset is not at all the planner's concern.  The
planner just sees it as a set of integers.)
        regards, tom lane