Add bms_offset_members() function for bitshifting Bitmapsets - Mailing list pgsql-hackers

From David Rowley
Subject Add bms_offset_members() function for bitshifting Bitmapsets
Date
Msg-id CAApHDvq=eEdw2Qp+aSzSOtTSe+h0fnVQ55CcTNqBkLDYiRZmxw@mail.gmail.com
Whole thread Raw
List pgsql-hackers
(v20 material)

While working on some new code which required offsetting the members
of a Bitmapset, I decided to go and write a function to do this rather
than copy the various other places where we manually construct a new
set with a bms_next_member() -> bms_add_member() loop. The new use
case I have is from pulling varattnos from a scan's targetlist, which
there could be several hundred Vars in. I considered this might be
noticeably expensive.

The current manual way we have of doing this is a bit laborious since
we're only doing 1 member per bms_next_member() loop, and also, if the
set has multiple words, we may end up doing several repallocs to
expand the set, perhaps as little as 1 word at a time. That's not to
mention the rather repetitive code that we have to do this in various
places that might be nice to consolidate.

I've attached a patch which adds bms_offset_members(), which does
bitshifting to move the members up or down by the given offset. While
working on this I made a few choices which might be worth a revisit:

1) The function modifies the given set in-place rather than making a new set.
2) The function will ERROR if any member would go above INT_MAX. These
would be inaccessible, and that seems weird/wrong.
3) When offsetting by a negative value, members that would go below
zero fall out of the set silently.

For #1; my original use-case that made me write this didn't need a
copy, so I wrote the function to modify the set in-place. After
hunting down and replacing the relevant existing bms_next_member()
loops with the new function, I noticed all these seem to need a copy.
Currently, I have coded the patch to do
bms_offset_members(bms_copy(set), ...), but that's a little
inefficient as it *could* result in a palloc for the copy then a
repalloc in the offset. If bms_offset_members() just created a new
set, then it could palloc() a set to the exact required size.  The
counterargument to that is that I don't really want to copy the set
for my intended use case. I thought about two versions, but thought
that might be overkill. There could be a boolean parameter to define
that, but we don't do that anywhere else in bitmapset.c, or we could
avoid a copy-paste of the code with an always-inlined helper function.
I couldn't decide, so left it as-is.

For #2, I could have equally made these fall off the top of the set,
but I thought we might want to know about it in the unlikely event
that this ever happens.

#3 We commonly want to get rid of system columns from a
pull_varattnos() set which is offset by
FirstLowInvalidHeapAttributeNumber, so those disappearing silently is
what most use cases seem to want. I expect that's not for revisiting,
but I listed this one anyway as it flies in the face of #2.

Patch attached. Comments welcome.

David

Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Support EXCEPT for ALL SEQUENCES publications
Next
From: Pavel Stehule
Date:
Subject: proposal - queryid can be used as filter for auto_explain