Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From Manni Wood
Subject Re: Speed up COPY FROM text/CSV parsing using SIMD
Date
Msg-id CAKWEB6r0CrN-a2P=2ey3EK7p1MxsbQx2C8=hpNGfxLxnRaX66Q@mail.gmail.com
In response to Re: Speed up COPY FROM text/CSV parsing using SIMD  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Speed up COPY FROM text/CSV parsing using SIMD
List pgsql-hackers
Hello.

I ran Nazir's v11 patch on my x86 tower PC and my Arm Raspberry Pi, using the same build setup as in my earlier runs: meson with "debugoptimized", which translates to the gcc flags "-g -O2".

x86 NARROW old master (18bcdb75)
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW v10
TXT :                 26416.331500 ms  -1.957890% regression
CSV :                 25318.727500 ms  10.018142% improvement
TXT with 1/3 escapes: 28608.007500 ms  -2.928061% regression
CSV with 1/3 quotes:  32805.627750 ms  5.026032% improvement

x86 NARROW v11
TXT :                 27212.945750 ms  -5.032545% regression
CSV :                 26985.971250 ms  4.092817% improvement
TXT with 1/3 escapes: 27216.510000 ms  2.078374% improvement
CSV with 1/3 quotes:  32817.267500 ms  4.992334% improvement


x86 WIDE old master (18bcdb75)
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE v10
TXT :                 23067.046750 ms  19.846046% improvement
CSV :                 23259.092250 ms  34.797174% improvement
TXT with 1/3 escapes: 31796.098250 ms  1.989583% improvement
CSV with 1/3 quotes:  42925.792250 ms  8.715948% improvement

x86 WIDE v11
TXT :                 22571.305750 ms  21.568659% improvement
CSV :                 22711.524750 ms  36.332184% improvement
TXT with 1/3 escapes: 29236.453000 ms  9.879604% improvement
CSV with 1/3 quotes:  40022.110750 ms  14.890786% improvement



arm NARROW old master (18bcdb75)
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW v10
TXT :                 10467.816750 ms  4.816988% improvement
CSV :                 9986.288000 ms  7.513381% improvement
TXT with 1/3 escapes: 10323.173750 ms  -0.234262% regression
CSV with 1/3 quotes:  11843.611750 ms  5.699116% improvement

arm NARROW v11
TXT :                 10340.966250 ms  5.970429% improvement
CSV :                 10224.399500 ms  5.308144% improvement
TXT with 1/3 escapes: 10438.216750 ms  -1.351288% regression
CSV with 1/3 quotes:  11865.934000 ms  5.521383% improvement


arm WIDE old master (18bcdb75)
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE v10
TXT :                 9064.959000 ms  23.345727% improvement
CSV :                 9019.553250 ms  35.144134% improvement
TXT with 1/3 escapes: 12344.497250 ms  8.087402% improvement
CSV with 1/3 quotes:  15495.863750 ms  11.744482% improvement

arm WIDE v11
TXT :                 9001.442250 ms  23.882831% improvement
CSV :                 8940.928750 ms  35.709490% improvement
TXT with 1/3 escapes: 12049.668500 ms  10.282589% improvement
CSV with 1/3 quotes:  15277.843250 ms  12.986201% improvement
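
The percentages above are consistent with (baseline - patched) / baseline * 100, taking the old-master time for the same machine and row width as the baseline. A minimal sketch of that calculation follows; the function name pct_improvement is hypothetical and is not taken from any benchmark script.

```c
/*
 * Percentage change of a patched run relative to a baseline run.
 * A positive result means the patched run was faster (improvement);
 * a negative result means it was slower (regression).
 */
static double
pct_improvement(double baseline_ms, double patched_ms)
{
    return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

For example, the x86 WIDE v11 TXT row uses baseline 28778.426500 ms and patched 22571.305750 ms.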

Best,

-Manni

On Thu, Mar 5, 2026 at 3:25 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2026-03-04 We 10:15 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
>>> If anyone has any suggestions/ideas, please let me know!
> I was able to fix the problem. My first assumption was that the
> branching in the SIMD code caused it, so I moved the SIMD code into
> the CopyReadLineTextSIMDHelper() function. Then I moved the call to
> CopyReadLineTextSIMDHelper() to the top of CopyReadLineText(), so
> that there is no branching in the non-SIMD (scalar) code path.
> This didn't solve the problem. Then I realized that even when I
> disable the SIMD code path with 'if (false)', there is still a
> regression, but if I comment out the whole
> 'if (cstate->simd_enabled)' branch, there is no regression at all.
>
> To find out more, I compared the assembly output of both and found
> the likely reason. As far as I understand, the compiler can't
> promote a variable to a register; instead the variable lives on the
> stack, which is slower. Please see the two assembly outputs:
>
> Slow code:
>
>          c = copy_input_buf[input_buf_ptr++];
>       db0:    48 8b 55 b8              mov    -0x48(%rbp),%rdx
>       db4:    48 63 c6                 movslq %esi,%rax
>       db7:    44 8d 66 01              lea    0x1(%rsi),%r12d
>       dbb:    44 89 65 cc              mov    %r12d,-0x34(%rbp)
>       dbf:    0f be 14 02              movsbl (%rdx,%rax,1),%edx
>
> Fast code:
>
>          c = copy_input_buf[input_buf_ptr++];
>       d80:    49 63 c4                 movslq %r12d,%rax
>       d83:    45 8d 5c 24 01           lea    0x1(%r12),%r11d
>       d88:    41 0f be 04 06           movsbl (%r14,%rax,1),%eax
>
> The reason is that the address of input_buf_ptr is passed to
> CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I change it to
> this:
>
> int            temp_input_buf_ptr = input_buf_ptr;
> CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);
>
> Then there is no regression. However, I am still not completely sure
> that this is the same problem as in v10; I am planning to spend more
> time debugging this.
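
The temporary-copy workaround described above can be sketched in isolation. This is only an illustration under stated assumptions: skip_to_newline() and count_lines() are hypothetical stand-ins, not the patch's functions, and whether the compiler actually keeps the loop index in a register depends on the compiler and flags.

```c
/*
 * Hypothetical stand-in for the SIMD helper: advances *pos until it
 * reaches a newline or the end of the buffer.
 */
static void
skip_to_newline(const char *buf, int len, int *pos)
{
    while (*pos < len && buf[*pos] != '\n')
        (*pos)++;
}

/*
 * Counts newline characters in buf.  Only the address of the temporary
 * is passed to the helper, so the address of input_buf_ptr is never
 * taken and the scalar code below works on a plain local variable.
 */
static int
count_lines(const char *buf, int len)
{
    int         input_buf_ptr = 0;
    int         lines = 0;

    while (input_buf_ptr < len)
    {
        int         temp_input_buf_ptr = input_buf_ptr;

        skip_to_newline(buf, len, &temp_input_buf_ptr);
        input_buf_ptr = temp_input_buf_ptr;

        /* scalar path: consume the newline the helper stopped at */
        if (input_buf_ptr < len && buf[input_buf_ptr++] == '\n')
            lines++;
    }
    return lines;
}
```

Calling count_lines("a\nb\n", 4) walks the buffer twice through the helper and consumes two newlines in the scalar part.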
>
>> A couple of random ideas:
>>
>> * Additional inlining for callers.  I looked around a little bit and didn't
>> see any great candidates, so I don't have much faith in this, but maybe
>> you'll see something I don't.
> I agree with you. CopyReadLineText() is already quite a big function.
>
>> * Disable SIMD if we are consistently getting small rows.  That won't help
>> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
>> the regression for narrow rows described elsewhere.
> I implemented this; two consecutive small rows disable SIMD.
>
>> * Surround the variable initializations with "if (simd_enabled)".
>> Presumably compilers are smart enough to remove those in the non-SIMD paths
>> already, but it could be worth a try.
> Done.
>
>> * Add simd_enabled function parameter to CopyReadLine(),
>> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
>> bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
>> compiler to do some additional optimizations to reduce branching.
> I think we don't need this. At least the implementation with
> CopyReadLineTextSIMDHelper() doesn't need it, since the branch is
> at the top and is taken once per line.
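
The "two consecutive small rows disable SIMD" rule mentioned above can be sketched as a small state machine. Everything here is an assumption for illustration: the SimdGate struct, the function names, and the 64-byte threshold are hypothetical, and this sketch never re-enables SIMD once it is off, which may differ from the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SMALL_ROW_BYTES 64      /* hypothetical "small row" threshold */
#define SMALL_ROW_LIMIT 2       /* two consecutive small rows, as described */

/* Hypothetical per-COPY state; not the patch's actual struct. */
typedef struct SimdGate
{
    bool        simd_enabled;
    int         consecutive_small_rows;
} SimdGate;

static void
simd_gate_init(SimdGate *gate)
{
    gate->simd_enabled = true;
    gate->consecutive_small_rows = 0;
}

/* Call once per parsed row with that row's length in bytes. */
static void
simd_gate_observe_row(SimdGate *gate, size_t row_len)
{
    if (row_len < SMALL_ROW_BYTES)
    {
        /* count the run of small rows; disable SIMD at the limit */
        if (++gate->consecutive_small_rows >= SMALL_ROW_LIMIT)
            gate->simd_enabled = false;
    }
    else
        gate->consecutive_small_rows = 0;   /* a large row resets the run */
}
```

With this shape the check costs one comparison per row, so it stays outside the per-byte loop.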
>
> I think v11 looks better compared to v10. I liked the
> CopyReadLineTextSIMDHelper() helper function. I also liked it being at
> the top of CopyReadLineText(), not being in the scalar path. This
> gives us more optimization options without affecting the scalar path.
>
> Here are the new benchmark results. I benchmarked the changes with
> both -O2 and -O3, and both with and without the 'changing
> default_toast_compression to lz4' commit (65def42b1d5). The
> results show that there is no regression, and the performance
> improvement is much bigger with 65def42b1d5: close to 2x for the
> text format and more than 2x for the CSV format.


I spent some time exploring different ideas for improving this, but
found none that didn't cause regression in some cases, so good to go
from my POV.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com



--
Manni Wood
EDB: https://www.enterprisedb.com
