Thread: Re: Shave a few cycles off our ilog10 implementation

Re: Shave a few cycles off our ilog10 implementation

From: Heikki Linnakangas
Date: 31 October, 01:54:20

On 30/10/2024 21:27, David Fetter wrote:
> Please find attached a patch to $Subject
> 
> I've done some preliminary testing, and it appears to shave somewhere
> between 25-50% off the operations themselves, and these cascade into
> things like formatting result sets and COPY OUT.

Impressive! What did you use to performance test it, to get those results?

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Shave a few cycles off our ilog10 implementation

From: David Fetter
Date: 31 October, 02:02:21

On Wed, Oct 30, 2024 at 09:54:20PM +0200, Heikki Linnakangas wrote:
> On 30/10/2024 21:27, David Fetter wrote:
> > Please find attached a patch to $Subject
> > 
> > I've done some preliminary testing, and it appears to shave somewhere
> > between 25-50% off the operations themselves, and these cascade into
> > things like formatting result sets and COPY OUT.
> 
> Impressive! What did you use to performance test it, to get those results?

In case that wasn't clear, what I've tested so far was the ilog10
implementations, not the general effects on the things they underlie.

The testing basically filled a previously created array with a bunch of
appropriately sized pseudo-random uints, ran it through a tight loop
that called the ilog10s, and took average execution times.
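
A minimal sketch of that kind of microbenchmark, for anyone who wants to
reproduce it. This is not the harness actually used: ilog10_naive,
xorshift32 and bench_ilog10 are hypothetical names, the division loop is
only a stand-in for the implementations under test, and the inputs are
uniform over 32 bits, which may not match "appropriately sized".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in implementation (hypothetical): floor(log10(v)) by repeated
 * division, with 0 mapped to 0. Swap in the function to be measured. */
static int
ilog10_naive(uint32_t v)
{
    int n = 0;

    while (v >= 10)
    {
        v /= 10;
        n++;
    }
    return n;
}

/* Small xorshift PRNG so that input generation is cheap and repeatable. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Fill an array with pseudo-random values first, then time only the
 * tight loop over it. Returns the average CPU time per call in
 * nanoseconds, or -1.0 on bad input or allocation failure. The sum of
 * the results is stored in *checksum so the compiler cannot discard
 * the loop.
 */
static double
bench_ilog10(int (*fn) (uint32_t), size_t n, uint32_t seed,
             uint64_t *checksum)
{
    uint32_t   *vals;
    uint32_t    state = seed ? seed : 1;    /* xorshift needs nonzero state */
    uint64_t    sum = 0;
    clock_t     start, stop;

    if (n == 0)
        return -1.0;
    vals = malloc(n * sizeof(*vals));
    if (vals == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        vals[i] = xorshift32(&state);

    start = clock();
    for (size_t i = 0; i < n; i++)
        sum += (uint64_t) fn(vals[i]);
    stop = clock();

    free(vals);
    *checksum = sum;
    return (double) (stop - start) / CLOCKS_PER_SEC * 1e9 / (double) n;
}
```

Calling bench_ilog10(ilog10_naive, 10000000, 42, &sum) once per
implementation with the same seed gives comparable averages. The call
through a function pointer blocks inlining, so the numbers include call
overhead.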

Any suggestions for more thorough testing would be welcome.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Re: Shave a few cycles off our ilog10 implementation

From: David Rowley
Date: 31 October, 02:14:12

On Thu, 31 Oct 2024 at 09:02, David Fetter <david@fetter.org> wrote:
> The testing basically filled a previously created array with a bunch of
> appropriately sized pseudo-random uints, ran it through a tight loop
> that called the ilog10s, and took average execution times.
>
> Any suggestions for more thorough testing would be welcome.

Maybe something similar to what I did in [1].

David

[1] https://postgr.es/m/CAApHDvopR=yPgNr4AbbN4HMOztuyVa+iFYRTvu49pxg9YO_tKw@mail.gmail.com

Re: Shave a few cycles off our ilog10 implementation

From: John Naylor
Date: 18 December, 16:42:07

On Thu, Oct 31, 2024 at 3:14 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 31 Oct 2024 at 09:02, David Fetter <david@fetter.org> wrote:
> > The testing basically filled a previously created array with a bunch of
> > appropriately sized pseudo-random uints, ran it through a tight loop
> > that called the ilog10s, and took average execution times.
> >
> > Any suggestions for more thorough testing would be welcome.
>
> Maybe something similar to what I did in [1].
>
> David
>
> [1] https://postgr.es/m/CAApHDvopR=yPgNr4AbbN4HMOztuyVa+iFYRTvu49pxg9YO_tKw@mail.gmail.com

I tried this test as-is and saw no difference. I bumped it up to 10
columns and got (no turbo, about 30 transactions):

master:
6831.308 ms
patch:
6580.506 ms

The difference is small enough that normally I'd say it's likely
unrelated to the patch, but on the other hand it's consistent with
saving (3 * 10 * 10 million) cycles because of one fewer multiplication
each, which is not nothing, but for shoving bytes into /dev/null it's
not exciting either. The lookup for the 64-bit case has grown to 1024
bytes, which will compete for cache space. I don't have a strong
reason to be either for or against this patch. Anyone else want to
test?
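
For readers who have not opened the patch, the trade-off between a
multiplication and a larger lookup table can be illustrated roughly as
below. This is a 32-bit sketch under my own assumptions, not the patch's
code: digits_mul uses a common multiply-and-shift estimate, digits_tab
replaces that estimate with a hypothetical 32-entry table indexed by the
highest set bit, and msb_pos is a portable placeholder for a
count-leading-zeros instruction.

```c
#include <assert.h>
#include <stdint.h>

static const uint32_t pow10_tab[10] = {
    1, 10, 100, 1000, 10000, 100000,
    1000000, 10000000, 100000000, 1000000000
};

/* guess_tab[b] == (b + 1) * 1233 / 4096 for b = 0..31 (hypothetical table) */
static const uint8_t guess_tab[32] = {
    0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
    5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9
};

/* 0-based position of the highest set bit; v must be nonzero. */
static int
msb_pos(uint32_t v)
{
    int p = 0;

    while (v >>= 1)
        p++;
    return p;
}

/* Decimal digit count, estimating log10 with a multiply and a shift. */
static int
digits_mul(uint32_t v)
{
    int t;

    if (v == 0)
        return 1;
    t = (msb_pos(v) + 1) * 1233 / 4096;     /* 1233/4096 ~ log10(2) */
    return t + (v >= pow10_tab[t]);
}

/* Same result, with the multiply replaced by a table lookup. */
static int
digits_tab(uint32_t v)
{
    int t;

    if (v == 0)
        return 1;
    t = guess_tab[msb_pos(v)];
    return t + (v >= pow10_tab[t]);
}
```

Both functions return the same digit counts; the second spends extra
table bytes to avoid the multiply, which is the kind of cache-space
versus cycle-count trade being weighed here.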

create table bi (a bigint, b bigint, c bigint, d bigint, e bigint, f
bigint, g bigint, h bigint, i bigint, j bigint);
insert into bi select i,i,i,i,i,i,i,i,i,i from generate_series(1,10_000_000) i;
vacuum freeze analyze bi;

pgbench -n -T 180 -f bench.sql
--
John Naylor
Amazon Web Services