On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>> Thanks for running that benchmark! Would you mind sharing a reproducer
>> for the regression you observed?
>
> Of course, I attached the SQL to generate the text and CSV test files.
> If having special characters make up 1/3 of the line length is an
> exaggeration, something lower might still reproduce some regressions,
> of course, for the same reason.
Thank you so much!
I am able to reproduce the regression you mentioned, but both regressions are 20% on my end. I found by experimenting that SIMD causes a regression when it advances fewer than 5 characters.
So, I implemented a small heuristic. It works like this:
- If advance < 5, insert a sleep penalty (n cycles).
- Each time advance < 5, n is doubled.
- Each time advance ≥ 5, n is halved.
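To make the heuristic above concrete, here is a minimal Python model of it. It assumes that the "sleep penalty" means skipping the SIMD path (falling back to scalar parsing) for the next n loop iterations. The names `SimdBackoff`, `simulate`, `MIN_ADVANCE`, `INITIAL_PENALTY` and `MAX_PENALTY` are hypothetical, and the starting value of n and its cap are assumptions, not values taken from the POC patch.

```python
from dataclasses import dataclass

# Hypothetical constants. MIN_ADVANCE = 5 mirrors the threshold in this
# mail; INITIAL_PENALTY and MAX_PENALTY are assumptions, not from the patch.
MIN_ADVANCE = 5
INITIAL_PENALTY = 1
MAX_PENALTY = 64


@dataclass
class SimdBackoff:
    """Adaptive penalty for a SIMD fast path (hypothetical sketch)."""
    penalty: int = INITIAL_PENALTY  # n: cycles to skip after a short advance
    skip_left: int = 0              # cycles still to skip before retrying SIMD

    def should_try_simd(self) -> bool:
        # While a penalty is active, consume one cycle on the scalar path.
        if self.skip_left > 0:
            self.skip_left -= 1
            return False
        return True

    def record_advance(self, advance: int) -> None:
        if advance < MIN_ADVANCE:
            # Short advance: skip the next n cycles, then double n (capped).
            self.skip_left = self.penalty
            self.penalty = min(self.penalty * 2, MAX_PENALTY)
        else:
            # Long advance: halve n, but never below INITIAL_PENALTY.
            self.penalty = max(self.penalty // 2, INITIAL_PENALTY)


def simulate(advances):
    """For each cycle's would-be SIMD advance, report whether SIMD was used."""
    backoff = SimdBackoff()
    used = []
    for advance in advances:
        if backoff.should_try_simd():
            backoff.record_advance(advance)
            used.append(True)
        else:
            used.append(False)  # scalar fallback; this advance is not measured
    return used
```

For example, `simulate([1] * 11)` returns `[True, False, True, False, False, True, False, False, False, False, True]`: every short advance doubles the gap before SIMD is retried, while input with long advances keeps SIMD on every cycle. In this sketch, the cap bounds how long SIMD stays disabled after the data changes, and the floor of 1 keeps doubling from getting stuck at zero.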
I am sharing a POC patch to show the heuristic; it can be applied on top of v1-0001. The heuristic version has the same performance improvements as v1-0001, but the regression is 5% instead of 20% compared to master.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Yes, this is good; I'm also getting only about a 5% regression now.