Re: Making empty Bitmapsets always be NULL - Mailing list pgsql-hackers

From Yuya Watari
Subject Re: Making empty Bitmapsets always be NULL
Date
Msg-id CAJ2pMkYmPti=8V46+tmQtGcqGXAeoaiRBLS=ekF8nC6OQO42sQ@mail.gmail.com
Whole thread Raw
In response to Re: Making empty Bitmapsets always be NULL  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Making empty Bitmapsets always be NULL
List pgsql-hackers
Hello,

On Tue, Mar 7, 2023 at 1:07 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I adjusted the patch to remove the invariant checks and fixed up a
> couple of things I'd missed.  The 0002 patch changes the for loops
> into do while loops. I wanted to see if we could see any performance
> gains from doing this.

I noticed that these patches caused significant degradation while
working on improving planning performance in another thread [1].

In the experiment, I used the query attached to this email. This
workload consists of eight tables, each of which is split into n
partitions. The "query.sql" measures the planning time of a query that
joins these tables. You can quickly reproduce my experiment using the
following commands.

=====
psql -f create-tables.sql
psql -f query.sql
=====

I show the result in the following tables. I refer to David's patches
in [2] as the "trailing-zero" patch. When n was large, the
trailing-zero patch showed significant degradation. This is due to too
many calls of repalloc(). With this patch, we cannot reuse spaces
after the last non-zero bitmapword, so we need to call repalloc() more
frequently than before. When n is 256, repalloc() was called only 670
times without the patch, while it was called 5694 times with the
patch.

Table 1: Planning time (ms)
-----------------------------------------------------------------
   n |   Master | Patched (trailing-zero) | Patched (bitwise-OR)
-----------------------------------------------------------------
   1 |   37.639 |                  37.330 |               36.979
   2 |   36.066 |                  35.646 |               36.044
   4 |   37.958 |                  37.349 |               37.842
   8 |   42.397 |                  42.994 |               39.779
  16 |   54.565 |                  67.713 |               44.186
  32 |   89.220 |                 100.828 |               65.542
  64 |  227.854 |                 269.059 |              150.398
 128 |  896.513 |                1279.965 |              577.671
 256 | 4241.994 |                8220.508 |             2538.681
-----------------------------------------------------------------

Table 2: Planning time speedup (higher is better)
------------------------------------------------------
   n | Patched (trailing-zero) | Patched (bitwise-OR)
------------------------------------------------------
   1 |                  100.8% |               101.8%
   2 |                  101.2% |               100.1%
   4 |                  101.6% |               100.3%
   8 |                   98.6% |               106.6%
  16 |                   80.6% |               123.5%
  32 |                   88.5% |               136.1%
  64 |                   84.7% |               151.5%
 128 |                   70.0% |               155.2%
 256 |                   51.6% |               167.1%
------------------------------------------------------

On Fri, Mar 3, 2023 at 10:52 AM David Rowley <dgrowleyml@gmail.com> wrote:
> The patch also optimizes sub-optimal newly added code which calls
> bms_is_empty_internal() when we have other more optimal means to
> determine if the set is empty or not.

However, I agree with David's opinion regarding the
bms_is_empty_internal() calls, which is quoted above. I have
implemented this optimization in a slightly different way than David.
My patch is attached to this email. The difference between my patch
and David's is in the determination method of whether the result is
empty: David's patch records the last index of non-zero bitmapword to
minimize the Bitmapset. If the index is -1, we can conclude that the
result is empty. In contrast, my patch uses a more lightweight
operation. I show my changes as follows.

=====
@@ -263,6 +261,7 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
     const Bitmapset *other;
     int            resultlen;
     int            i;
+    bitmapword    bitwise_or = 0;

     /* Handle cases where either input is NULL */
     if (a == NULL || b == NULL)
@@ -281,9 +280,17 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
     /* And intersect the longer input with the result */
     resultlen = result->nwords;
     for (i = 0; i < resultlen; i++)
-        result->words[i] &= other->words[i];
+    {
+        bitmapword    w = (result->words[i] &= other->words[i]);
+
+        /*
+         * Compute bitwise OR of all bitmapwords to determine if the result
+         * is empty
+         */
+        bitwise_or |= w;
+    }
     /* If we computed an empty result, we must return NULL */
-    if (bms_is_empty_internal(result))
+    if (bitwise_or == 0)
     {
         pfree(result);
         return NULL;
@@ -711,30 +718,6 @@ bms_membership(const Bitmapset *a)
     return result;
 }
=====

My idea is to compute the bitwise OR of all bitmapwords of the result
Bitmapset. The bitwise OR can be represented as a single operation in
the machine code and does not require any conditional branches. If the
bitwise ORed value is zero, we can conclude the result Bitmapset is
empty. The costs related to this operation can be almost negligible;
it is significantly cheaper than calling bms_is_empty_internal() and
less expensive than using a conditional branch such as 'if.'

In the tables above, I called my patch the "bitwise-OR" patch. The
patch is much faster than the master when n is large. Its speed up
reached 167.1%. I think just adopting this optimization is worth
considering.

[1] https://www.postgresql.org/message-id/CAJ2pMkY10J_PA2jpH5M-VoOo6BvJnTOO_-t_znu_pOaP0q10pA@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvq9eq0W_aFUGrb6ba28ieuQN4zM5Uwqxy7+LMZjJc+VGg@mail.gmail.com

-- 
Best regards,
Yuya Watari

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: recovery modules
Next
From: Peter Smith
Date:
Subject: Re: Add macros for ReorderBufferTXN toptxn