Re: Use compiler intrinsics for bit ops in hash - Mailing list pgsql-hackers

From Jesse Zhang
Subject Re: Use compiler intrinsics for bit ops in hash
Date
Msg-id CAGf+fX79VCypUUuJTbf8o1no3Sqay1tHKVsnYAEN8wvDp2_eqA@mail.gmail.com
Whole thread Raw
In response to Use compiler intrinsics for bit ops in hash  (David Fetter <david@fetter.org>)
Responses Re: Use compiler intrinsics for bit ops in hash  (David Fetter <david@fetter.org>)
List pgsql-hackers
Hi David,

On Tue, Jan 14, 2020 at 9:36 AM David Fetter <david@fetter.org> wrote:
>
> Folks,
>
> The recent patch for distinct windowing aggregates contained a partial
> fix of the FIXME that didn't seem entirely right, so I extracted that
> part, changed it to use compiler intrinsics, and submit it here.

The changes in hash AM and SIMPLEHASH do look like a net positive
improvement. My biggest cringe might be in pg_bitutils:

> diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> index 498e532308..cc9338da25 100644
> --- a/src/include/port/pg_bitutils.h
> +++ b/src/include/port/pg_bitutils.h
> @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
>  return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
>  }
>
> +/* ceil(lg2(num)) */
> +static inline uint32
> +ceil_log2_32(uint32 num)
> +{
> + return pg_leftmost_one_pos32(num-1) + 1;
> +}
> +
> +static inline uint64
> +ceil_log2_64(uint64 num)
> +{
> + return pg_leftmost_one_pos64(num-1) + 1;
> +}
> +
> +/* calculate first power of 2 >= num
> + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> + * using BSR where available */
> +static inline uint32
> +next_power_of_2_32(uint32 num)
> +{
> + return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> +}
> +
> +static inline uint64
> +next_power_of_2_64(uint64 num)
> +{
> + return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> +}
> +
>  #endif /* PG_BITUTILS_H */
>

1. Is ceil_log2_64 dead code?

2. The new utilities added here (ceil_log2_32 and company,
next_power_of_2_32 and company) all require num > 1, but don't clearly
Assert (or at the very least document) so.

3. A couple of the callers can actively pass in an argument of 1, e.g.
from _hash_spareindex in hashutil.c, while some other callers are iffy
at best (simplehash.h maybe?)

4. It seems like you *really* would like an operation like LZCNT in x86
(first appearing in Haswell) that is well defined on zero input. ISTM
the alternatives are:

   a) Special case 1. That seems straightforward, but the branching cost
   on a seemingly unlikely condition seems to be a lot of performance
   loss

   b) Use architecture specific intrinsic (and possibly with CPUID
   shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
   intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
   instructions that are well defined on zero in most ISA's other than
   x86, so maybe we can get away with special-casing x86?

Cheers,
Jesse



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pause recovery if pitr target not reached
Next
From: Tom Lane
Date:
Subject: Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)