Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date	November 21 17:48:53
Msg-id	fbf4c7f6-974c-4984-bf8a-3eff3a91684a@dunslane.net
In response to	Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>)
List	pgsql-hackers

On 2025-11-20 Th 7:55 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> Thank you for looking into this!
>
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
>> IMHO we should be looking for ways to simplify this should-we-use-SIMD
>> code.  For example, perhaps we could just disable the SIMD path for 10K or
>> 100K lines any time a special character is found.  I'm dubious that a lot
>> of complexity is warranted.
> I think this is a bit too harsh, since SIMD is still worth it if it
> can advance more than ~5 characters on average. I am trying to use SIMD
> as much as possible when it is worth it, but what you said can remove
> the regression completely, so perhaps that is the correct way.
>

Perhaps a very small regression (say under 1%) in the worst case would 
be OK. But the closer you can get that to zero the more acceptable this 
will be. Very large loads of sparse data, which will often have lots of 
special characters AIUI, are very common, so we should not dismiss the 
worst case as an outlier. I still like the idea of testing, say, a 
thousand lines every million, or something like that.
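
To make the sampling idea concrete, here is a rough sketch of one way it
might look. This is not code from the patch: the struct, the function
names, and the constants are all hypothetical, and the "more than ~5
bytes per special character" cutoff is simply borrowed from the estimate
quoted above. The first 1000 lines of every 1,000,000-line window always
take the SIMD path and are measured; the result decides whether the rest
of the window uses SIMD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and constants below are hypothetical, for illustration only. */
#define SAMPLE_LINES    1000u     /* lines measured at the start of each window */
#define WINDOW_LINES    1000000u  /* window length, in lines */
#define MIN_AVG_ADVANCE 5u        /* assumed break-even: bytes per special char */

typedef struct SimdSampler
{
    uint64_t line_in_window;   /* lines already processed in this window */
    uint64_t sample_bytes;     /* bytes seen during the sampling phase */
    uint64_t sample_specials;  /* special characters seen during sampling */
    bool     use_simd;         /* decision for the rest of the window */
} SimdSampler;

static void
sampler_init(SimdSampler *s)
{
    s->line_in_window = 0;
    s->sample_bytes = 0;
    s->sample_specials = 0;
    s->use_simd = true;
}

/* Should the next line be parsed with the SIMD path? */
static bool
sampler_use_simd(const SimdSampler *s)
{
    /* Sampled lines always use SIMD; afterwards follow the decision. */
    return s->line_in_window < SAMPLE_LINES || s->use_simd;
}

/* Report one parsed line: its length and how many special chars it held. */
static void
sampler_record_line(SimdSampler *s, size_t bytes, size_t specials)
{
    if (s->line_in_window < SAMPLE_LINES)
    {
        s->sample_bytes += bytes;
        s->sample_specials += specials;

        /* Last sampled line: keep SIMD only if the average advance
         * (bytes per special character) exceeds the threshold. */
        if (s->line_in_window + 1 == SAMPLE_LINES)
            s->use_simd = s->sample_bytes >
                (uint64_t) MIN_AVG_ADVANCE * s->sample_specials;
    }

    /* Start a fresh window, and therefore a fresh sample, when done. */
    if (++s->line_in_window == WINDOW_LINES)
    {
        s->line_in_window = 0;
        s->sample_bytes = 0;
        s->sample_specials = 0;
    }
}
```

With something like this, the worst case pays the SIMD overhead on only
0.1% of lines, and a load whose character mix changes partway through
gets re-evaluated at the next window.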

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
