Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From	Nathan Bossart
Subject	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date	October 21, 2025 21:40:09
Msg-id	aPfTiX0HwV42R6Od@nathan
In response to	Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>)
Responses	Re: Speed up COPY FROM text/CSV parsing using SIMD
List	pgsql-hackers

On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> I think the problem is deciding how many lines to process before
> deciding for the rest. 1000 lines could work for the small sized data
> but it might not work for the big sized data. Also, it might cause
> worse regressions for the small sized data.

IMHO we have some leeway with smaller amounts of data.  If COPY FROM for
1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
unlikely users would be inconvenienced all that much.  (Those numbers are
completely made up in order to illustrate my point.)

> Because of this reason, I
> tried to implement a heuristic that will work regardless of the size
> of the data. The last heuristic I suggested will run SIMD for
> approximately (#number_of_lines / 1024 [1024 is the max number of
> lines to sleep before running SIMD again]) lines if all characters in
> the data are special characters.

I wonder if we could mitigate the regression further by spacing out the
checks a bit more.  It could be worth comparing a variety of values to
identify what works best with the test data.
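
To make the trade-off concrete, here is a rough Python model of a backoff
scheme like the one quoted above. It is only a sketch, not the patch's code.
It assumes the worst case (every SIMD attempt turns out to be unprofitable)
and assumes the sleep interval doubles after each such attempt until it
reaches a cap. The doubling rule, the name simd_attempts, and its parameters
are illustrative assumptions, not taken from the patch.

```python
def simd_attempts(num_lines, max_sleep=1024, initial_sleep=1):
    """Count the lines on which SIMD is attempted in the worst case.

    Hypothetical model: every attempt is unprofitable, so after each
    attempt the parser skips `sleep` lines on the scalar path and then
    doubles `sleep`, capped at `max_sleep`.
    """
    attempts = 0
    sleep = initial_sleep
    line = 0
    while line < num_lines:
        attempts += 1                      # SIMD tried on this line
        line += 1
        line += sleep                      # these lines use the scalar path
        sleep = min(sleep * 2, max_sleep)  # back off, up to the cap
    return attempts


if __name__ == "__main__":
    # Compare a few caps on the same input size.
    for cap in (256, 1024, 4096, 16384):
        print(cap, simd_attempts(1_000_000, max_sleep=cap))
```

Under these assumptions the number of wasted attempts settles at roughly
num_lines / max_sleep once the cap is reached, so a larger cap spaces the
checks out further at the cost of reacting more slowly when the data changes
character partway through the file.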

-- 
nathan
