Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From	KAZAR Ayoub
Subject	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date	August 21, 2025 22:36:42
Msg-id	CA+K2RukEWfNAp821Fy1LYWCoE_fOKMU8efsP2VLb5ZM8OEETWA@mail.gmail.com
In response to	Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>)
List	pgsql-hackers

On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>> Thanks for running that benchmark! Would you mind sharing a reproducer
>> for the regression you observed?
>
> Of course, I attached the sql to generate the text and csv test files.
> If special characters making up 1/3 of the line length is an exaggeration, a lower proportion might still reproduce some regression, by the same idea.

Thank you so much!

I am able to reproduce the regression you mentioned, but both
regressions are 20% on my end. By experimenting, I found that SIMD
causes a regression if it advances fewer than 5 characters.

So, I implemented a small heuristic. It works like this:

- If advance < 5 -> insert a sleep penalty (n cycles).
- Each time advance < 5, n is doubled.
- Each time advance ≥ 5, n is halved.
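The three rules above can be sketched roughly as follows. This is only an illustration, not the code from the POC patch: the names (SimdBackoff, simd_backoff_should_try, simd_backoff_record), the cap on n, and the reading of "sleep penalty" as "skip the SIMD path for n iterations" are all assumptions made for the example.

```c
#include <stddef.h>

#define SIMD_MIN_ADVANCE 5      /* threshold mentioned in the message */
#define SIMD_MAX_PENALTY 1024   /* assumed cap, not from the message */

/* Hypothetical state for the backoff heuristic. */
typedef struct SimdBackoff
{
    size_t penalty;     /* n: current penalty length */
    size_t remaining;   /* penalty iterations still to be served */
} SimdBackoff;

/* Returns 1 if the SIMD path should be tried now, 0 while a penalty is active. */
static int
simd_backoff_should_try(SimdBackoff *b)
{
    if (b->remaining > 0)
    {
        b->remaining--;
        return 0;
    }
    return 1;
}

/* Record how many characters the last SIMD attempt advanced. */
static void
simd_backoff_record(SimdBackoff *b, size_t advance)
{
    if (advance < SIMD_MIN_ADVANCE)
    {
        /* Short advance: double n (starting from 1) and serve the penalty. */
        b->penalty = (b->penalty == 0) ? 1 : b->penalty * 2;
        if (b->penalty > SIMD_MAX_PENALTY)
            b->penalty = SIMD_MAX_PENALTY;
        b->remaining = b->penalty;
    }
    else
    {
        /* Long enough advance: halve n so SIMD is retried sooner. */
        b->penalty /= 2;
    }
}
```

In a parsing loop, one would presumably call simd_backoff_should_try() before each SIMD attempt, fall back to the scalar path when it returns 0, and call simd_backoff_record() with the number of characters the SIMD step consumed.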

I am sharing a POC patch to show the heuristic; it can be applied on
top of v1-0001. The heuristic version has the same performance
improvements as v1-0001, but the regression is 5% instead of 20%
compared to master.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Yes, this is good; I'm also getting only about a 5% regression now.

Regards,

Ayoub Kazar
