Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers
| From | KAZAR Ayoub |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | CA+K2Run1VdLnmp-5_Qv2Fax0KgT7LLJMH-uzjaaf-NZD1oU-=w@mail.gmail.com |
| In response to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
| Responses | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| List | pgsql-hackers |
Hello,
On Fri, Feb 6, 2026 at 2:51 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
Hi,
On Sat, 31 Jan 2026 at 19:21, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>
> On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <neil.conway@gmail.com> wrote:
>>
>> * I'm curious if we'll see better performance on large inputs if we flush to `line_buf` periodically (e.g., at least every few thousand bytes or so). Otherwise we might see poor data cache behavior if large inputs with no control characters get evicted before we've copied them over. See the approach taken in escape_json_with_len() in utils/adt/json.c
>>
> So I gave this a try. Attached is a small patch that has v3 + the suggestion added; here are the results with different thresholds for the line_buf refill:
>
> Execution time compared to master:
> | Workload | v3 | v3.1 (2k) | v3.1 (4k) | v3.1 (8k) | v3.1 (16k) | v3.1 (20k) | v3.1 (28k) |
> |---|---|---|---|---|---|---|---|
> | text/none | -16.5% | -17.4% | -14.3% | -12.6% | -13.6% | -10.5% | -16.3% |
> | text/esc | +5.6% | +11.1% | +3.1% | +7.6% | +3.0% | +4.9% | +4.2% |
> | csv/none | -31.0% | -29.9% | -26.7% | -30.1% | -27.9% | -30.2% | -29.6% |
> | csv/quote | +0.2% | -0.6% | -0.4% | -1.0% | +0.1% | +2.5% | -1.0% |
>
> L1d cache miss rates:
> | Workload | Master | v3 | v3.1 (2k) | v3.1 (4k) | v3.1 (8k) | v3.1 (16k) | v3.1 (20k) | v3.1 (28k) |
> |---|---|---|---|---|---|---|---|---|
> | text/none | 0.20% | 0.23% | 0.21% | 0.22% | 0.21% | 0.21% | 0.21% | 0.22% |
> | text/esc | 0.21% | 0.22% | 0.22% | 0.22% | 0.22% | 0.21% | 0.22% | 0.22% |
> | csv/none | 0.17% | 0.22% | 0.21% | 0.22% | 0.21% | 0.21% | 0.22% | 0.22% |
> | csv/quote | 0.18% | 0.22% | 0.19% | 0.20% | 0.20% | 0.19% | 0.20% | 0.20% |
>
> On my laptop I have 32KB L1 cache per core.
> The results are very close; the difference is hard to see in the cache-miss numbers, but the execution times tell a different story, so the periodic filling of line_buf seems worth doing.
> If Manni can rerun the benchmarks on these too, it would be nice to confirm this.
I looked at this change and had a couple of points.
We already have REFILL_LINEBUF at the start of the for loop in the
CopyReadLineText() function (let’s call this refill #1). This refills
when the input_buf_ptr >= copy_buf_len check is true. On my end,
copy_buf_len stays at 8191 until the end of the input, and then it
becomes the remaining amount. So when I set LINE_BUF_FLUSH_AFTER to
8192, the REFILL_LINEBUF you added shouldn’t be called; instead,
refill #1 should be triggered.
I verified this manually by adding some logging, and the results seem
to confirm this behavior. Based on that, there shouldn’t be a
performance difference when LINE_BUF_FLUSH_AFTER >= 8k.
Could you please take a look and confirm whether you see the same behavior?
So just to make sure I understand this correctly: line_buf holds the processed bytes of ONE line, so for the periodic flush that I added in:
input_buf_ptr - cstate->input_buf_index >= LINE_BUF_FLUSH_AFTER
if a line in the file is shorter than LINE_BUF_FLUSH_AFTER, the #2 REFILL_LINEBUF is never reached within a single CopyReadLine() call, as line_buf is reset for each line:
```c
CopyReadLine(CopyFromState cstate, bool is_csv)
{
    bool result;

    resetStringInfo(&cstate->line_buf);
    ....
}
```
So my previous benchmarks (ones that have LINE_BUF_FLUSH_AFTER > 4096) are wrong since I was working with lines of 4096 bytes.
If line length < LINE_BUF_FLUSH_AFTER < INPUT_BUF_SIZE, this hits neither the #1 REFILL_LINEBUF nor the #2 periodic REFILL_LINEBUF; it only reaches the #3 REFILL_LINEBUF before CopyReadLineText() returns.
So flushing here only mitigates L1d cache misses for long lines (lines that occupy, say, more than 70% of the input_buf size?).
Does this make sense for your case with 8k too?
So I propose removing the #1 REFILL_LINEBUF, as it no longer makes sense: the periodic REFILL_LINEBUF already does the job for thresholds smaller than INPUT_BUF_SIZE (in line with most L1d cache sizes).
So basically it becomes:
- if (line length < LINE_BUF_FLUSH_AFTER): flush once, at the very end
- if (line length > LINE_BUF_FLUSH_AFTER): flush (line length / LINE_BUF_FLUSH_AFTER) times periodically, plus once for the leftover
Also, I noticed that json.c uses ESCAPE_JSON_FLUSH_AFTER set to 512,
so it might be worth trying smaller values here as well.
If I'm correct about the usage of LINE_BUF_FLUSH_AFTER above, I think smaller values would imply too many unnecessary memory loads; at 512 we aren't battling L1d cache misses anymore. Still, I'll try it in the next benchmark.
If this sounds right, I'll be re-benchmarking for multiple row sizes with different LINE_BUF_FLUSH_AFTER sizes.
Regards,
Ayoub