Hi,
Thanks for the review!
On Thu, 12 Feb 2026 at 01:39, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> > I am sharing a v6 which implements (1). My benchmark results show
> > almost no difference for the special-character cases and a nice
> > improvement for the no-special-character cases.
>
> Thanks!
>
> > + /* Initialize SIMD variables */
> > + cstate->simd_enabled = false;
> > + cstate->simd_initialized = false;
>
> > + /* Initialize SIMD on the first read */
> > + if (unlikely(!cstate->simd_initialized))
> > + {
> > + cstate->simd_initialized = true;
> > + cstate->simd_enabled = true;
> > + }
>
> Why do we do this initialization in CopyReadLine() as opposed to setting
> simd_enabled to true when initializing cstate in BeginCopyFrom()? If we
> can initialize it in BeginCopyFrom, we could probably remove
> simd_initialized.
Correct, I guess this is left over from the earlier versions.
> > + if (cstate->simd_enabled)
> > + result = CopyReadLineText(cstate, is_csv, true);
> > + else
> > + result = CopyReadLineText(cstate, is_csv, false);
>
> I know we discussed this upthread, but I'd like to take a closer look at
> this to see whether/why it makes such a big difference. It's a bit awkward
> that CopyReadLineText() needs to manage both its local simd_enabled and
> cstate->simd_enabled.
I extensively benchmarked this with the new v6 version. If I change
this to either of:
CopyReadLineText(cstate, is_csv);
or
CopyReadLineText(cstate, is_csv, cstate->simd_enabled);
then there is %5-%10 regression for the scalar path. I ran my
benchmarks with both "meson --buildtype=debugoptimized" and "meson
--buildtype=release" but the result is the same.
Also, if I change this code to:
if (cstate->simd_enabled)
{
if (is_csv)
result = CopyReadLineText(cstate, true, true);
else
result = CopyReadLineText(cstate, false, true);
}
else
{
if (is_csv)
result = CopyReadLineText(cstate, true, false);
else
result = CopyReadLineText(cstate, false, false);
}
then I see ~%5 performance improvement in scalar path compared to master.
> + /* Load a chunk of data into a vector register */
> + vector8_load(&chunk, (const uint8 *) ©_input_buf[input_buf_ptr]);
>
> As mentioned upthread [0], I think it's worth testing whether processing
> multiple vectors worth of data in each loop iteration is worthwhile.
>
> [0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan
There are multiple keys in CopyReadLineText() compared to
pg_lfind32(). I am not sure if I correctly used multiple vectors but I
attached what I did as 0002, could you please look at it? I didn't see
any performance benefit in my benchmarks, though.
--
Regards,
Nazir Bilal Yavuz
Microsoft