Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date	August 21, 2025 18:47:30
Msg-id	8615c983-1662-43b4-b0c9-49d194ac33aa@dunslane.net
In response to	Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>)
Responses	Re: Speed up COPY FROM text/CSV parsing using SIMD
List	pgsql-hackers

On 2025-08-19 Tu 10:14 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>> I am able to reproduce the regression you mentioned but both
>> regressions are %20 on my end. I found that (by experimenting) SIMD
>> causes a regression if it advances less than 5 characters.
>>
>> So, I implemented a small heuristic. It works like that:
>>
>> - If advance < 5 -> insert a sleep penalty (n cycles).
> 'sleep' might be a poor word choice here. I meant skipping SIMD for n
> number of times.
>

I was thinking a bit about that this morning. I wonder if, instead of
having a constantly applied heuristic like this, it might be better to
do a little extra accounting in the first, say, 1000 lines of an input
file, and switch to the SIMD code only if less than some portion of the
input is found to be special characters. What that portion should be
would need to be determined by some experimentation with a variety of
typical workloads, but given your findings (SIMD loses when it advances
fewer than 5 characters, i.e. about one special character in five) 20%
seems like a good starting point.
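
A minimal sketch of that accounting, to make the idea concrete. Every
name here (CopySampleState, sample_line, SAMPLE_LINES, SPECIAL_MAX_PCT)
is hypothetical and not taken from any posted patch, and the set of
"special" characters is an assumption for CSV mode. The 1000-line window
and 20% threshold are only the starting points suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning knobs; real values would come from benchmarking. */
#define SAMPLE_LINES    1000    /* lines to inspect before deciding */
#define SPECIAL_MAX_PCT 20      /* use SIMD only below this percentage */

typedef struct CopySampleState
{
    size_t  lines_seen;     /* lines accounted for so far */
    size_t  total_chars;    /* all bytes seen in the sampled lines */
    size_t  special_chars;  /* bytes that would stop a SIMD scan */
    bool    decided;        /* has the one-time decision been made? */
    bool    use_simd;       /* outcome of that decision */
} CopySampleState;

/* Assumed special set: delimiter, quote, escape and line terminators. */
static bool
is_special(char c, char delim, char quote, char escape)
{
    return c == delim || c == quote || c == escape ||
           c == '\n' || c == '\r';
}

/*
 * Account for one input line. Once SAMPLE_LINES lines have been seen the
 * decision is made exactly once and later calls return immediately.
 */
static void
sample_line(CopySampleState *s, const char *line, size_t len,
            char delim, char quote, char escape)
{
    assert(s != NULL);

    if (s->decided)
        return;

    for (size_t i = 0; i < len; i++)
    {
        if (is_special(line[i], delim, quote, escape))
            s->special_chars++;
    }
    s->total_chars += len;

    if (++s->lines_seen >= SAMPLE_LINES)
    {
        /* Integer form of special/total < SPECIAL_MAX_PCT/100. */
        s->use_simd = s->special_chars * 100 <
                      s->total_chars * SPECIAL_MAX_PCT;
        s->decided = true;
    }
}
```

The caller would run the scalar path while decided is false and consult
use_simd afterwards, so the cost of the accounting is confined to the
sampled prefix of the file.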

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
