Re: Speed up COPY TO text/CSV parsing using SIMD - Mailing list pgsql-hackers

From KAZAR Ayoub
Subject Re: Speed up COPY TO text/CSV parsing using SIMD
Date
Msg-id CA+K2Ru=JK5NUEaxA77pCEer40QnV1TMxeg68Et9RL0zMZw_Jyw@mail.gmail.com
In response to Re: Speed up COPY TO text/CSV parsing using SIMD  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> I added a prescan loop inside the SIMD helpers that tries to catch special
> characters within the first sizeof(Vector8) characters. I measured how well
> this reduces the overhead of starting SIMD and exiting at the first vector:
> the scalar loop beats SIMD for one vector if it finds a special
> character before the 6th character; the worst case is not a clean vector, where the
> scalar loop needs 20 more cycles compared to SIMD.
> This helps mitigate the case of JSON(B) in CSV format, which is why I
> added this for the CSV case only.

Interesting.

> In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> 3% regression is gone.

While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.

Thanks for the review.
Fair enough, I'll try many more cases in the upcoming weeks to make sure we're not missing anything.
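
For readers following the thread, the prescan idea described in the quoted text can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: CHUNK stands in for sizeof(Vector8), chunk_has_special() is a plain loop standing in for a real vector compare, and the function names and the set of "special" characters (delimiter, quote, newline, carriage return) are hypothetical choices made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: stand-in for sizeof(Vector8); the real width is platform-dependent. */
#define CHUNK 16

/* Hypothetical helper: which bytes force CSV quoting in this sketch. */
static bool
is_special(unsigned char c, char delim, char quote)
{
    return c == (unsigned char) delim || c == (unsigned char) quote ||
           c == '\n' || c == '\r';
}

/*
 * Hypothetical stand-in for a vector compare over one chunk.  It always
 * looks at all CHUNK bytes, as a SIMD compare would, and reports only
 * whether any of them is special.
 */
static bool
chunk_has_special(const char *p, char delim, char quote)
{
    bool found = false;

    for (size_t i = 0; i < CHUNK; i++)
        found |= is_special((unsigned char) p[i], delim, quote);
    return found;
}

/* Returns the index of the first special byte in buf, or len if none. */
size_t
find_first_special(const char *buf, size_t len, char delim, char quote)
{
    size_t i = 0;
    size_t prescan_end = (len < CHUNK) ? len : CHUNK;

    /* Prescan: scalar check of the first chunk, so an early hit exits cheaply. */
    for (; i < prescan_end; i++)
        if (is_special((unsigned char) buf[i], delim, quote))
            return i;

    /* Chunked phase: skip whole chunks that contain no special byte. */
    while (len - i >= CHUNK && !chunk_has_special(buf + i, delim, quote))
        i += CHUNK;

    /* Scalar tail: locate the exact position, or run off the end. */
    for (; i < len; i++)
        if (is_special((unsigned char) buf[i], delim, quote))
            return i;
    return len;
}
```

With input such as a JSON value, where a quote appears within the first few bytes, the function returns from the prescan loop without ever entering the chunked phase, which is the early-exit case discussed above.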

--
nathan
Regards,
Ayoub
