Re: Making empty Bitmapsets always be NULL - Mailing list pgsql-hackers
From | Yuya Watari |
---|---|
Subject | Re: Making empty Bitmapsets always be NULL |
Date | |
Msg-id | CAJ2pMkYmPti=8V46+tmQtGcqGXAeoaiRBLS=ekF8nC6OQO42sQ@mail.gmail.com Whole thread Raw |
In response to | Re: Making empty Bitmapsets always be NULL (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Making empty Bitmapsets always be NULL
|
List | pgsql-hackers |
Hello, On Tue, Mar 7, 2023 at 1:07 PM David Rowley <dgrowleyml@gmail.com> wrote: > I adjusted the patch to remove the invariant checks and fixed up a > couple of things I'd missed. The 0002 patch changes the for loops > into do while loops. I wanted to see if we could see any performance > gains from doing this. I noticed that these patches caused significant degradation while working on improving planning performance in another thread [1]. In the experiment, I used the query attached to this email. This workload consists of eight tables, each of which is split into n partitions. The "query.sql" measures the planning time of a query that joins these tables. You can quickly reproduce my experiment using the following commands. ===== psql -f create-tables.sql psql -f query.sql ===== I show the result in the following tables. I refer to David's patches in [2] as the "trailing-zero" patch. When n was large, the trailing-zero patch showed significant degradation. This is due to too many calls of repalloc(). With this patch, we cannot reuse spaces after the last non-zero bitmapword, so we need to call repalloc() more frequently than before. When n is 256, repalloc() was called only 670 times without the patch, while it was called 5694 times with the patch. Table 1: Planning time (ms) ----------------------------------------------------------------- n | Master | Patched (trailing-zero) | Patched (bitwise-OR) ----------------------------------------------------------------- 1 | 37.639 | 37.330 | 36.979 2 | 36.066 | 35.646 | 36.044 4 | 37.958 | 37.349 | 37.842 8 | 42.397 | 42.994 | 39.779 16 | 54.565 | 67.713 | 44.186 32 | 89.220 | 100.828 | 65.542 64 | 227.854 | 269.059 | 150.398 128 | 896.513 | 1279.965 | 577.671 256 | 4241.994 | 8220.508 | 2538.681 ----------------------------------------------------------------- Table 2: Planning time speedup (higher is better) ------------------------------------------------------ n | Patched (trailing-zero) | Patched (bitwise-OR) ------------------------------------------------------ 1 | 100.8% | 101.8% 2 | 101.2% | 100.1% 4 | 101.6% | 100.3% 8 | 98.6% | 106.6% 16 | 80.6% | 123.5% 32 | 88.5% | 136.1% 64 | 84.7% | 151.5% 128 | 70.0% | 155.2% 256 | 51.6% | 167.1% ------------------------------------------------------ On Fri, Mar 3, 2023 at 10:52 AM David Rowley <dgrowleyml@gmail.com> wrote: > The patch also optimizes sub-optimal newly added code which calls > bms_is_empty_internal() when we have other more optimal means to > determine if the set is empty or not. However, I agree with David's opinion regarding the bms_is_empty_internal() calls, which is quoted above. I have implemented this optimization in a slightly different way than David. My patch is attached to this email. The difference between my patch and David's is in the determination method of whether the result is empty: David's patch records the last index of non-zero bitmapword to minimize the Bitmapset. If the index is -1, we can conclude that the result is empty. In contrast, my patch uses a more lightweight operation. I show my changes as follows. ===== @@ -263,6 +261,7 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b) const Bitmapset *other; int resultlen; int i; + bitmapword bitwise_or = 0; /* Handle cases where either input is NULL */ if (a == NULL || b == NULL) @@ -281,9 +280,17 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b) /* And intersect the longer input with the result */ resultlen = result->nwords; for (i = 0; i < resultlen; i++) - result->words[i] &= other->words[i]; + { + bitmapword w = (result->words[i] &= other->words[i]); + + /* + * Compute bitwise OR of all bitmapwords to determine if the result + * is empty + */ + bitwise_or |= w; + } /* If we computed an empty result, we must return NULL */ - if (bms_is_empty_internal(result)) + if (bitwise_or == 0) { pfree(result); return NULL; @@ -711,30 +718,6 @@ bms_membership(const Bitmapset *a) return result; } ===== My idea is to compute the bitwise OR of all bitmapwords of the result Bitmapset. The bitwise OR can be represented as a single operation in the machine code and does not require any conditional branches. If the bitwise ORed value is zero, we can conclude the result Bitmapset is empty. The costs related to this operation can be almost negligible; it is significantly cheaper than calling bms_is_empty_internal() and less expensive than using a conditional branch such as 'if.' In the tables above, I called my patch the "bitwise-OR" patch. The patch is much faster than the master when n is large. Its speed up reached 167.1%. I think just adopting this optimization is worth considering. [1] https://www.postgresql.org/message-id/CAJ2pMkY10J_PA2jpH5M-VoOo6BvJnTOO_-t_znu_pOaP0q10pA@mail.gmail.com [2] https://www.postgresql.org/message-id/CAApHDvq9eq0W_aFUGrb6ba28ieuQN4zM5Uwqxy7+LMZjJc+VGg@mail.gmail.com -- Best regards, Yuya Watari
Attachment
pgsql-hackers by date: