On Mon, Dec 8, 2025 at 9:52 AM Chao Li <li.evan.chao@gmail.com> wrote:
> First, I changed my direction and implemented the in-place switching
> in the other way, where I did a way like chained-switching. Say
> starting from item0, for example, switching item0 to item5, then check
> where item5 should be switched to, and makes the switch, till an item
> is switch to position 0. See my implementation in
> other-implemation.diff if you are interested in it. This time, I
> eyeball checked the sort result and confirmed the correction. But my
> implementation is slightly slower than your implementation, based on
> the same test procedure I described in my previous email, my
> implementation is roughly ~3% slower your implementation. So I think
> that at least proves your current implementation in v5 has been
> perfectly fine tuned.

That shouldn't be surprising, since the way you describe is basically
"American flag sort", which is much older, and the innovation of
ska_byte_sort was to recognize that this is bad for CPU pipelining.
That was explained in detail in the blog post I linked to in my first
email.
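For readers without the blog post at hand, here is a minimal sketch of
one American flag sort pass using the chained switching described above.
It is illustrative only and is not taken from either patch; the function
name flag_sort_pass, the uint32_t element type, and the shift parameter
are all assumptions. Each step of the inner loop needs the value
displaced by the previous step, and that serial dependency is presumably
what hurts pipelining.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one in-place American flag sort pass that
 * partitions a[0..n) into 256 buckets keyed on the byte at 'shift'.
 * The pass is not stable.
 */
static void
flag_sort_pass(uint32_t *a, size_t n, int shift)
{
    size_t count[256] = {0};
    size_t next[256];   /* next unfilled slot in each bucket */
    size_t end[256];    /* one past the last slot of each bucket */
    size_t pos = 0;

    /* Count how many elements fall into each bucket. */
    for (size_t i = 0; i < n; i++)
        count[(a[i] >> shift) & 0xFF]++;

    /* Turn the counts into bucket boundaries. */
    for (int b = 0; b < 256; b++)
    {
        next[b] = pos;
        pos += count[b];
        end[b] = pos;
    }

    for (int b = 0; b < 256; b++)
    {
        while (next[b] < end[b])
        {
            uint32_t v = a[next[b]];
            int d = (v >> shift) & 0xFF;

            /*
             * Follow the chain: drop v into its destination bucket,
             * pick up the element that was sitting there, and repeat
             * until we hold one that belongs in bucket b. Each
             * iteration depends on the value loaded by the last one.
             */
            while (d != b)
            {
                uint32_t tmp = a[next[d]];

                a[next[d]++] = v;
                v = tmp;
                d = (v >> shift) & 0xFF;
            }
            a[next[b]++] = v;
        }
    }
}
```

A full sort would call this with the most significant byte first and
then recurse into each bucket on the next byte.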
Also note that because you attached a .diff, the CF bot tried and
failed to apply it to master, and has been complaining that my patch
needs a rebase. Please don't do that again.
--
John Naylor
Amazon Web Services