On Sun, Jan 22, 2023, at 11:06, Dean Rasheed wrote:
> Seems like a reasonable idea, with some pretty decent gains.
>
> Note, however, that for a divisor having fewer than 5 or 6 digits,
> it's now significantly slower because it's forced to go through
> div_var_int64() instead of div_var_int() for all small divisors. So
> the var2ndigits <= 2 case needs to come first.
Can you give a measurable example, on some platform, where the patch
as written is significantly slower for a divisor with fewer than
5 or 6 digits?

I can't detect any difference at all on my MacBook Pro M1 Max for this example:
EXPLAIN ANALYZE SELECT count(numeric_div_volatile(1,3333)) FROM generate_series(1,1e8);
I did write the code the way you suggest at first, but changed it
when I realised the extra "else if" could be eliminated. I assumed
div_var_int64() wouldn't be slower than div_var_int(), since I thought
64-bit instructions are in general as fast as 32-bit instructions
on 64-bit platforms.
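
To isolate just the integer-width question, here is a minimal sketch of
base-NBASE short division with a 32-bit and a 64-bit accumulator. The
function names short_div_u32() and short_div_u64() are hypothetical and
this is not the code from the patch; the real div_var_int() and
div_var_int64() do more work (sign, weight, rounding), so timing these two
loops would only show whether the accumulator width by itself matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBASE 10000				/* numeric digits are base 10000 */

/*
 * Hypothetical stand-in for the 32-bit path: divide a base-NBASE digit
 * array by a small divisor, keeping the running remainder in 32 bits.
 * Returns the final remainder; quot receives ndigits quotient digits.
 */
uint32_t
short_div_u32(const int16_t *digits, size_t ndigits,
			  uint32_t divisor, int16_t *quot)
{
	uint32_t	carry = 0;

	/* carry * NBASE + digit must not overflow 32 bits */
	assert(divisor > 0 && divisor <= UINT32_MAX / NBASE);

	for (size_t i = 0; i < ndigits; i++)
	{
		uint32_t	cur = carry * NBASE + (uint32_t) digits[i];

		quot[i] = (int16_t) (cur / divisor);
		carry = cur % divisor;
	}
	return carry;
}

/*
 * Same loop with a 64-bit accumulator, which allows larger divisors
 * at the cost of 64-bit divide instructions.
 */
uint64_t
short_div_u64(const int16_t *digits, size_t ndigits,
			  uint64_t divisor, int16_t *quot)
{
	uint64_t	carry = 0;

	/* carry * NBASE + digit must not overflow 64 bits */
	assert(divisor > 0 && divisor <= UINT64_MAX / NBASE);

	for (size_t i = 0; i < ndigits; i++)
	{
		uint64_t	cur = carry * NBASE + (uint64_t) digits[i];

		quot[i] = (int16_t) (cur / divisor);
		carry = cur % divisor;
	}
	return carry;
}
```

Running both over the same digit array in a tight loop, with a divisor
such as 3333, would be one way to check my assumption on a given platform.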

I'm not suggesting your claim is incorrect; I'm just trying to understand
and verify it experimentally.
> The implementation of div_var_int64() should be in an #ifdef HAVE_INT128 block.
>
> In div_var_int64(), s/ULONG_MAX/PG_UINT64_MAX/
OK, thanks, I'll fix both, but I'll wait for your feedback on the above first.
/Joel