Thread: Re: Shave a few cycles off our ilog10 implementation
On 30/10/2024 21:27, David Fetter wrote:
> Please find attached a patch to $Subject
>
> I've done some preliminary testing, and it appears to shave somewhere
> between 25-50% off the operations themselves, and these cascade into
> things like formatting result sets and COPY OUT.

Impressive! What did you use to performance test it, to get those results?

--
Heikki Linnakangas
Neon (https://neon.tech)

On Wed, Oct 30, 2024 at 09:54:20PM +0200, Heikki Linnakangas wrote:
> On 30/10/2024 21:27, David Fetter wrote:
> > Please find attached a patch to $Subject
> >
> > I've done some preliminary testing, and it appears to shave somewhere
> > between 25-50% off the operations themselves, and these cascade into
> > things like formatting result sets and COPY OUT.
>
> Impressive! What did you use to performance test it, to get those results?

In case that wasn't clear, what I've tested so far was the ilog10
implementations, not the general effects on the things they underlie.

This testing was basically just sending a bunch of appropriately sized
pseudo-random uints in a previously created array sent through a tight
loop that called the ilog10s and getting average execution times.

Any suggestions for more thorough testing would be welcome.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

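The kind of microbenchmark described in the message above could look roughly like the sketch below. It is not the harness or the code from the patch: both ilog10 variants (`ilog10_div`, `ilog10_bits`) are hypothetical stand-ins, and the helper names, the xorshift32 generator and the use of `clock()` are assumptions made for illustration. `__builtin_clz` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in A (hypothetical baseline): floor(log10(v)) by repeated division.
 * Returns 0 for v == 0 by convention. */
static int ilog10_div(uint32_t v)
{
    int n = 0;

    while (v >= 10)
    {
        v /= 10;
        n++;
    }
    return n;
}

static const uint32_t powers_of_ten[] = {
    1, 10, 100, 1000, 10000, 100000,
    1000000, 10000000, 100000000, 1000000000
};

/* Stand-in B (hypothetical candidate): estimate from the bit length using
 * 1233/4096 ~= log10(2), then correct with one table comparison.
 * __builtin_clz is a GCC/Clang builtin and is undefined for 0. */
static int ilog10_bits(uint32_t v)
{
    int bits, t;

    if (v == 0)
        return 0;
    bits = 32 - __builtin_clz(v);
    t = bits * 1233 / 4096;
    return t - (v < powers_of_ten[t]);
}

typedef int (*ilog10_fn)(uint32_t);

/* Fill vals with xorshift32 output, shifted right by a varying amount so
 * that small and large magnitudes both occur. */
static void fill_random(uint32_t *vals, size_t n, uint32_t seed)
{
    uint32_t x = seed ? seed : 1;

    for (size_t i = 0; i < n; i++)
    {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        vals[i] = x >> (x & 31);
    }
}

/* Average CPU time per call in nanoseconds. The indirect call blocks
 * inlining, so both variants pay the same call overhead. */
static double avg_ns_per_call(ilog10_fn fn, const uint32_t *vals,
                              size_t n, int rounds)
{
    volatile long sink = 0;     /* keeps the loop from being optimized away */
    clock_t start = clock();

    for (int r = 0; r < rounds; r++)
        for (size_t i = 0; i < n; i++)
            sink += fn(vals[i]);

    clock_t end = clock();
    return (double) (end - start) / CLOCKS_PER_SEC * 1e9
        / ((double) n * rounds);
}

/* Returns 0 on success, 1 if the array could not be allocated. */
static int run_benchmark(size_t n, int rounds)
{
    uint32_t *vals = malloc(n * sizeof(uint32_t));

    if (vals == NULL)
        return 1;
    fill_random(vals, n, 42);

    /* Both variants must agree before their timings mean anything. */
    for (size_t i = 0; i < n; i++)
        assert(ilog10_div(vals[i]) == ilog10_bits(vals[i]));

    printf("div:  %.2f ns/call\n", avg_ns_per_call(ilog10_div, vals, n, rounds));
    printf("bits: %.2f ns/call\n", avg_ns_per_call(ilog10_bits, vals, n, rounds));
    free(vals);
    return 0;
}
```

Calling `run_benchmark(1 << 20, 100)` from a `main` function would print one average per variant. A loop like this only measures the functions in isolation, which is the limitation the message itself points out.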
On Thu, 31 Oct 2024 at 09:02, David Fetter <david@fetter.org> wrote:
> This testing was basically just sending a bunch of appropriately sized
> pseudo-random uints in a previously created array sent through a tight
> loop that called the ilog10s and getting average execution times.
>
> Any suggestions for more thorough testing would be welcome.

Maybe something similar to what I did in [1].

David

[1] https://postgr.es/m/CAApHDvopR=yPgNr4AbbN4HMOztuyVa+iFYRTvu49pxg9YO_tKw@mail.gmail.com

On Thu, Oct 31, 2024 at 3:14 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 31 Oct 2024 at 09:02, David Fetter <david@fetter.org> wrote:
> > This testing was basically just sending a bunch of appropriately sized
> > pseudo-random uints in a previously created array sent through a tight
> > loop that called the ilog10s and getting average execution times.
> >
> > Any suggestions for more thorough testing would be welcome.
>
> Maybe something similar to what I did in [1].
>
> David
>
> [1] https://postgr.es/m/CAApHDvopR=yPgNr4AbbN4HMOztuyVa+iFYRTvu49pxg9YO_tKw@mail.gmail.com

I tried this test as-is and saw no difference. I bumped it up to 10
columns and got (no turbo, about 30 transactions):

master: 6831.308 ms
patch:  6580.506 ms

The difference is small enough that normally I'd say it's likely
unrelated to the patch, but on the other hand it's consistent with
saving (3 * 10 * 10 million) cycles because of 1 less multiplication
each, which is not nothing, but for shoving bytes into /dev/null it's
not exciting either. The lookup for the 64-bit case has grown to 1024
bytes, which will compete for cache space. I don't have a strong
reason to be either for or against this patch. Anyone else want to
test?

create table bi (a bigint, b bigint, c bigint, d bigint, e bigint, f
bigint, g bigint, h bigint, i bigint, j bigint);
insert into bi select i,i,i,i,i,i,i,i,i,i from generate_Series(1,10_000_000) i;
vacuum freeze analyze bi;

pgbench -n -T 180 -f bench.sql

--
John Naylor
Amazon Web Services
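As a rough check of the arithmetic in the message above: the reading of the factors (3 as cycles per saved multiplication, 10 as columns, 10 million as rows) is an interpretation and is not spelled out in the message.

```latex
\begin{align*}
\text{cycles saved per transaction} &\approx 3 \times 10 \times 10^{7} = 3 \times 10^{8} \\
\Delta t &= 6831.308\,\text{ms} - 6580.506\,\text{ms} = 250.802\,\text{ms} \\
\Delta t / t_{\text{master}} &= 250.802 / 6831.308 \approx 3.7\%
\end{align*}
```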