Thread: Microoptimization of Bitmapset usage in postgres_fdw
There are a couple of places in postgres_fdw where we check if the Bitmapset has multiple members using bms_num_members(), without storing the returned count. The attached patch instead use bms_membership() which is optimized for just that usecase, and (IMO) makes for clearer code. cheers ./daniel
Attachment
On Tue, May 29, 2018 at 9:10 PM, Daniel Gustafsson <daniel@yesql.se> wrote: > There are a couple of places in postgres_fdw where we check if the Bitmapset > has multiple members using bms_num_members(), without storing the returned > count. The attached patch instead use bms_membership() which is optimized for > just that usecase, and (IMO) makes for clearer code. > +1 for that change. Some of those usages of bms_num_members() were added by me. Sorry for that. It was mostly because I wasn't aware of bms_membership() when I wrote that code. May be we should add a comment in the prologue of bms_num_members() like "Note: if the callers is interested only knowing whether the bitmapset has 0, 1 or more members, it should call bms_membership().". I understand that bms_membership() resides just below bms_num_members(), but 1. bms_membership doesn't sound like it would tell me that 2. bms_membership's prologue refers to bms_num_members, which should have been the other way; we want the developers to use bms_membership instead of bms_num_members(), when they land on bms_num_members. It's less likely that somebody landing on bms_membership would want to use bms_num_members(). I am not sure if this can b e squeezed into HEAD right now. It looks safe to do so. But in case not, please add this to the next commitfest so that it's not forgotten. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
> On 30 May 2018, at 09:36, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > > On Tue, May 29, 2018 at 9:10 PM, Daniel Gustafsson <daniel@yesql.se> wrote: >> There are a couple of places in postgres_fdw where we check if the Bitmapset >> has multiple members using bms_num_members(), without storing the returned >> count. The attached patch instead use bms_membership() which is optimized for >> just that usecase, and (IMO) makes for clearer code. > > +1 for that change. Some of those usages of bms_num_members() were > added by me. Sorry for that. It was mostly because I wasn't aware of > bms_membership() when I wrote that code. May be we should add a > comment in the prologue of bms_num_members() like "Note: if the > callers is interested only knowing whether the bitmapset has 0, 1 or > more members, it should call bms_membership().". I understand that > bms_membership() resides just below bms_num_members(), but 1. > bms_membership doesn't sound like it would tell me that 2. > bms_membership's prologue refers to bms_num_members, which should have > been the other way; we want the developers to use bms_membership > instead of bms_num_members(), when they land on bms_num_members. It's > less likely that somebody landing on bms_membership would want to use > bms_num_members(). That makes sense, I’ve added a second patch to this thread which expands the comment on bms_num_members to make it clearer. > I am not sure if this can b e squeezed into HEAD right now. It looks > safe to do so. But in case not, please add this to the next commitfest > so that it's not forgotten. Will do, thanks for reviewing. cheers ./daniel
Attachment
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested Hello, The v2 patches look good to me. However, I found a couple other places where we might be able to use this micro-optimization. 1) dependencies_clauselist_selectivity() in dependencies.c /* * If there's not at least two distinct attnums then reject the whole list * of clauses. We must return 1.0 so the calling function's selectivity is * unaffected. */ if (bms_num_members(clauses_attnums) < 2) { pfree(list_attnums); return 1.0; } 2) BuildRelationExtStatistics() in extended_stats.c. /* check allowed number of dimensions */ Assert(bms_num_members(stat->columns) >= 2 && bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS); Nathan The new status of this patch is: Waiting on Author
> On 14 Jun 2018, at 16:56, Nathan Bossart <bossartn@amazon.com> wrote: > The v2 patches look good to me. However, I found a couple other > places where we might be able to use this micro-optimization. Thanks a lot for your review! > 1) dependencies_clauselist_selectivity() in dependencies.c > > /* > * If there's not at least two distinct attnums then reject the whole list > * of clauses. We must return 1.0 so the calling function's selectivity is > * unaffected. > */ > if (bms_num_members(clauses_attnums) < 2) > { > pfree(list_attnums); > return 1.0; > } I agree with this one, not sure why I missed that when grep’ing around while writing the patch. Fixed in the attached v3 patchset (which are now awkwardly named but I didn’t change that). > 2) BuildRelationExtStatistics() in extended_stats.c. > > /* check allowed number of dimensions */ > Assert(bms_num_members(stat->columns) >= 2 && > bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS); Since this usage is in an assertion I don’t see the value in changing it as the current programming is more optimized for readability. cheers ./daniel
Attachment
Thanks for the updated patch set. On 6/14/18, 2:34 PM, "Daniel Gustafsson" <daniel@yesql.se> wrote: >> 2) BuildRelationExtStatistics() in extended_stats.c. >> >> /* check allowed number of dimensions */ >> Assert(bms_num_members(stat->columns) >= 2 && >> bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS); > > Since this usage is in an assertion I don’t see the value in changing it as the > current programming is more optimized for readability. Agreed. I hesitated to even point this one out. I'll go ahead and mark this as Ready for Committer. Nathan
On Thu, Jun 14, 2018 at 08:14:54PM +0000, Bossart, Nathan wrote: > I'll go ahead and mark this as Ready for Committer. Another thing not mentioned on this thread is that bms_membership is faster than bms_num_members by design with many members, so this change makes sense to shave a couple of cycles. /* * bms_num_members - count members of set + * + * In situations where the exact count isn't required, bms_membership can be + * used to test if the set has 0, 1 or multiple members. */ bms_membership is a couple of lines down, I am not sure I see much point in duplicating what's already present. - if (bms_num_members(clauses_attnums) < 2) + if (bms_membership(clauses_attnums) != BMS_MULTIPLE) For this one, the comment above directly mentions that at least two attnums need to be present, so it seems to me that the current coding is easier to understand and intentional... So I would be incline to not change it. I think that this should not go into the tree until REL_11_STABLE gets created though, so this will have to wait a bit. -- Michael
Attachment
> On 17 Jun 2018, at 14:47, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jun 14, 2018 at 08:14:54PM +0000, Bossart, Nathan wrote: >> I'll go ahead and mark this as Ready for Committer. > > Another thing not mentioned on this thread is that bms_membership is > faster than bms_num_members by design with many members, so this change > makes sense to shave a couple of cycles. > > /* > * bms_num_members - count members of set > + * > + * In situations where the exact count isn't required, bms_membership can be > + * used to test if the set has 0, 1 or multiple members. > */ > bms_membership is a couple of lines down, I am not sure I see much point > in duplicating what's already present. It is indeed a bit of a duplication, but the only reason this patch came to be was that the original author of the instances in postgres_fdw had missed said comment on bms_membership a few lines down. > - if (bms_num_members(clauses_attnums) < 2) > + if (bms_membership(clauses_attnums) != BMS_MULTIPLE) > For this one, the comment above directly mentions that at least two > attnums need to be present, so it seems to me that the current coding is > easier to understand and intentional... So I would be incline to not > change it. I don’t have any strong feelings either way, and will leave that call to the committer who picks this up. I agree that the current coding is easy to understand but I don’t see this being much harder. > I think that this should not go into the tree until REL_11_STABLE gets > created though, so this will have to wait a bit. 100% agree. cheers ./daniel
On Sun, Jun 17, 2018 at 07:23:19PM +0200, Daniel Gustafsson wrote: > On 17 Jun 2018, at 14:47, Michael Paquier <michael@paquier.xyz> wrote: >> - if (bms_num_members(clauses_attnums) < 2) >> + if (bms_membership(clauses_attnums) != BMS_MULTIPLE) >> For this one, the comment above directly mentions that at least two >> attnums need to be present, so it seems to me that the current coding is >> easier to understand and intentional... So I would be incline to not >> change it. > > I don’t have any strong feelings either way, and will leave that call to the > committer who picks this up. I agree that the current coding is easy to > understand but I don’t see this being much harder. I have looked at that again, and pushed the portion for postgres_fdw as the intention is clear, while leaving out the part from the statistics per the comment close by. Thanks! -- Michael