Thread: Microoptimization of Bitmapset usage in postgres_fdw

Microoptimization of Bitmapset usage in postgres_fdw

From
Daniel Gustafsson
Date:
There are a couple of places in postgres_fdw where we check if the Bitmapset
has multiple members using bms_num_members(), without storing the returned
count.  The attached patch instead use bms_membership() which is optimized for
just that usecase, and (IMO) makes for clearer code.

cheers ./daniel


Attachment

Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Ashutosh Bapat
Date:
On Tue, May 29, 2018 at 9:10 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
> There are a couple of places in postgres_fdw where we check if the Bitmapset
> has multiple members using bms_num_members(), without storing the returned
> count.  The attached patch instead use bms_membership() which is optimized for
> just that usecase, and (IMO) makes for clearer code.
>

+1 for that change. Some of those usages of bms_num_members() were
added by me. Sorry for that. It was mostly because I wasn't aware of
bms_membership() when I wrote that code. May be we should add a
comment in the prologue of bms_num_members() like "Note: if the
callers is interested only knowing whether the bitmapset has 0, 1 or
more members, it should call bms_membership().". I understand that
bms_membership() resides just below bms_num_members(), but 1.
bms_membership doesn't sound like it would tell me that 2.
bms_membership's prologue refers to bms_num_members, which should have
been the other way; we want the developers to use bms_membership
instead of bms_num_members(), when they land on bms_num_members. It's
less likely that somebody landing on bms_membership would want to use
bms_num_members().


I am not sure if this can b e squeezed into HEAD right now. It looks
safe to do so. But in case not, please add this to the next commitfest
so that it's not forgotten.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Daniel Gustafsson
Date:
> On 30 May 2018, at 09:36, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
> On Tue, May 29, 2018 at 9:10 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>> There are a couple of places in postgres_fdw where we check if the Bitmapset
>> has multiple members using bms_num_members(), without storing the returned
>> count.  The attached patch instead use bms_membership() which is optimized for
>> just that usecase, and (IMO) makes for clearer code.
>
> +1 for that change. Some of those usages of bms_num_members() were
> added by me. Sorry for that. It was mostly because I wasn't aware of
> bms_membership() when I wrote that code. May be we should add a
> comment in the prologue of bms_num_members() like "Note: if the
> callers is interested only knowing whether the bitmapset has 0, 1 or
> more members, it should call bms_membership().". I understand that
> bms_membership() resides just below bms_num_members(), but 1.
> bms_membership doesn't sound like it would tell me that 2.
> bms_membership's prologue refers to bms_num_members, which should have
> been the other way; we want the developers to use bms_membership
> instead of bms_num_members(), when they land on bms_num_members. It's
> less likely that somebody landing on bms_membership would want to use
> bms_num_members().

That makes sense, I’ve added a second patch to this thread which expands the
comment on bms_num_members to make it clearer.

> I am not sure if this can b e squeezed into HEAD right now. It looks
> safe to do so. But in case not, please add this to the next commitfest
> so that it's not forgotten.

Will do, thanks for reviewing.

cheers ./daniel


Attachment

Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Nathan Bossart
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hello,

The v2 patches look good to me.  However, I found a couple other
places where we might be able to use this micro-optimization.

1) dependencies_clauselist_selectivity() in dependencies.c

    /*
     * If there's not at least two distinct attnums then reject the whole list
     * of clauses. We must return 1.0 so the calling function's selectivity is
     * unaffected.
     */
    if (bms_num_members(clauses_attnums) < 2)
    {
        pfree(list_attnums);
        return 1.0;
    }

2) BuildRelationExtStatistics() in extended_stats.c.

    /* check allowed number of dimensions */
    Assert(bms_num_members(stat->columns) >= 2 &&
           bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);

Nathan

The new status of this patch is: Waiting on Author

Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Daniel Gustafsson
Date:
> On 14 Jun 2018, at 16:56, Nathan Bossart <bossartn@amazon.com> wrote:

> The v2 patches look good to me.  However, I found a couple other
> places where we might be able to use this micro-optimization.

Thanks a lot for your review!

> 1) dependencies_clauselist_selectivity() in dependencies.c
>
>     /*
>      * If there's not at least two distinct attnums then reject the whole list
>      * of clauses. We must return 1.0 so the calling function's selectivity is
>      * unaffected.
>      */
>     if (bms_num_members(clauses_attnums) < 2)
>     {
>         pfree(list_attnums);
>         return 1.0;
>     }

I agree with this one, not sure why I missed that when grep’ing around while
writing the patch.  Fixed in the attached v3 patchset (which are now awkwardly
named but I didn’t change that).

> 2) BuildRelationExtStatistics() in extended_stats.c.
>
>     /* check allowed number of dimensions */
>     Assert(bms_num_members(stat->columns) >= 2 &&
>            bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);

Since this usage is in an assertion I don’t see the value in changing it as the
current programming is more optimized for readability.

cheers ./daniel


Attachment

Re: Microoptimization of Bitmapset usage in postgres_fdw

From
"Bossart, Nathan"
Date:
Thanks for the updated patch set.

On 6/14/18, 2:34 PM, "Daniel Gustafsson" <daniel@yesql.se> wrote:
>> 2) BuildRelationExtStatistics() in extended_stats.c.
>> 
>>     /* check allowed number of dimensions */
>>     Assert(bms_num_members(stat->columns) >= 2 &&
>>            bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);
>
> Since this usage is in an assertion I don’t see the value in changing it as the
> current programming is more optimized for readability.

Agreed.  I hesitated to even point this one out.

I'll go ahead and mark this as Ready for Committer.

Nathan


Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Michael Paquier
Date:
On Thu, Jun 14, 2018 at 08:14:54PM +0000, Bossart, Nathan wrote:
> I'll go ahead and mark this as Ready for Committer.

Another thing not mentioned on this thread is that bms_membership is
faster than bms_num_members by design with many members, so this change
makes sense to shave a couple of cycles.

 /*
  * bms_num_members - count members of set
+ *
+ * In situations where the exact count isn't required, bms_membership can be
+ * used to test if the set has 0, 1 or multiple members.
   */
bms_membership is a couple of lines down, I am not sure I see much point
in duplicating what's already present.

-   if (bms_num_members(clauses_attnums) < 2)
+   if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
For this one, the comment above directly mentions that at least two
attnums need to be present, so it seems to me that the current coding is
easier to understand and intentional...  So I would be incline to not
change it.

I think that this should not go into the tree until REL_11_STABLE gets
created though, so this will have to wait a bit.
--
Michael

Attachment

Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Daniel Gustafsson
Date:
> On 17 Jun 2018, at 14:47, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 14, 2018 at 08:14:54PM +0000, Bossart, Nathan wrote:
>> I'll go ahead and mark this as Ready for Committer.
>
> Another thing not mentioned on this thread is that bms_membership is
> faster than bms_num_members by design with many members, so this change
> makes sense to shave a couple of cycles.
>
> /*
>  * bms_num_members - count members of set
> + *
> + * In situations where the exact count isn't required, bms_membership can be
> + * used to test if the set has 0, 1 or multiple members.
>   */
> bms_membership is a couple of lines down, I am not sure I see much point
> in duplicating what's already present.

It is indeed a bit of a duplication, but the only reason this patch came to be
was that the original author of the instances in postgres_fdw had missed said
comment on bms_membership a few lines down.

> -   if (bms_num_members(clauses_attnums) < 2)
> +   if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
> For this one, the comment above directly mentions that at least two
> attnums need to be present, so it seems to me that the current coding is
> easier to understand and intentional...  So I would be incline to not
> change it.

I don’t have any strong feelings either way, and will leave that call to the
committer who picks this up.  I agree that the current coding is easy to
understand but I don’t see this being much harder.

> I think that this should not go into the tree until REL_11_STABLE gets
> created though, so this will have to wait a bit.

100% agree.

cheers ./daniel

Re: Microoptimization of Bitmapset usage in postgres_fdw

From
Michael Paquier
Date:
On Sun, Jun 17, 2018 at 07:23:19PM +0200, Daniel Gustafsson wrote:
> On 17 Jun 2018, at 14:47, Michael Paquier <michael@paquier.xyz> wrote:
>> -   if (bms_num_members(clauses_attnums) < 2)
>> +   if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
>> For this one, the comment above directly mentions that at least two
>> attnums need to be present, so it seems to me that the current coding is
>> easier to understand and intentional...  So I would be incline to not
>> change it.
>
> I don’t have any strong feelings either way, and will leave that call to the
> committer who picks this up.  I agree that the current coding is easy to
> understand but I don’t see this being much harder.

I have looked at that again, and pushed the portion for postgres_fdw as
the intention is clear, while leaving out the part from the statistics
per the comment close by.  Thanks!
--
Michael

Attachment