Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From KAZAR Ayoub
Subject Re: Speed up COPY FROM text/CSV parsing using SIMD
Date
Msg-id CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com
In response to Re: Speed up COPY FROM text/CSV parsing using SIMD  (Manni Wood <manni.wood@enterprisedb.com>)
Responses Re: Speed up COPY FROM text/CSV parsing using SIMD
List pgsql-hackers
On Tue, Nov 11, 2025 at 11:23 PM Manni Wood <manni.wood@enterprisedb.com> wrote:
Hello!

I wanted to reproduce the results using the files attached by Shinya Kato and Ayoub Kazar. I installed a postgres compiled from master, and then a postgres built from master with Nazir Bilal Yavuz's v3 patches applied.

The master+v3patches postgres naturally performed better when copying into the database: anywhere from 11% better for the t.csv file produced by Shinya's test.sql, to 35% better for the t_4096_none.csv file created by Ayoub Kazar's simd-copy-from-bench.sql.

But here's where it gets weird. The two files created by Ayoub Kazar's simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.txt, and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5% respectively.

This seems impossible.

A few things I should note:

I timed the commands using the Unix time command, like so:

time psql -X -U mwood -h localhost -d postgres -c '\copy t from /tmp/t_4096_escape.txt'

For each file, I timed the copy 6 times and took the average.

This was done on my work Linux machine while also running Chrome and an Open Office spreadsheet; not a dedicated machine only running postgres.
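
The timing procedure described above (run the same command several times and average) could be scripted roughly as follows. This is only a sketch: the helper names time_command and summarize are made up for illustration, and it also reports the standard deviation, min, and max, which the original measurement did not include but which help judge noise on a non-dedicated machine.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=6):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the mean, sample standard deviation, min, and max of durations."""
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
    }


# Example (not executed here), using the command quoted in the message:
# cmd = ["psql", "-X", "-U", "mwood", "-h", "localhost", "-d", "postgres",
#        "-c", r"\copy t from /tmp/t_4096_escape.txt"]
# print(summarize(time_command(cmd, runs=6)))
```

Note that this sketch does not empty the table t between runs, so later runs copy into a larger table; whether that matters for the comparison is an open question.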
Hello,
I think that if you run a perf benchmark (assuming this still reproduces), it should be possible to explain this behavior by looking at the CPI and other metrics and comparing them to my findings.
I also suggest making the data even closer to the worst case, i.e. adding more special characters where they hurt most by forcing switches between SIMD and scalar processing (in the simd-copy-from-bench.sql file). If it still does a good job, then there's something to look at.
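
As a rough illustration of "closer to the worst case", the sketch below generates a single-column text-format file in which an escaped backslash appears at a fixed, tunable interval. This is not what simd-copy-from-bench.sql does (its contents are not reproduced here); the function names, the filler character, and the special_every parameter are all assumptions made for this example.

```python
def make_line(row_len=4096, special_every=64):
    """Build one COPY text-format line of about row_len characters.

    Every special_every characters end with an escaped backslash (two
    backslash characters), which COPY text format reads as one literal
    backslash. Smaller special_every means more special characters.
    """
    if special_every < 2:
        raise ValueError("special_every must be at least 2")
    # Filler characters followed by the two-character escape sequence.
    chunk = "a" * (special_every - 2) + "\\\\"
    return chunk * (row_len // special_every)


def write_file(path, n_rows, row_len=4096, special_every=64):
    """Write n_rows identical lines to path, one row per line."""
    line = make_line(row_len, special_every) + "\n"
    with open(path, "w", encoding="ascii", newline="\n") as f:
        for _ in range(n_rows):
            f.write(line)


# Example (not executed here): a denser variant of the escape test file.
# write_file("/tmp/t_4096_escape_dense.txt", n_rows=100_000, special_every=8)
```

Sweeping special_every from large to small values would show at what density of special characters any benefit disappears. The file size is roughly n_rows * (row_len + 1) bytes, so n_rows can also be raised to produce larger inputs.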
 

All of the copy runs took between 4.5 seconds (Shinya's t.csv copied into postgres compiled from master) and 2 seconds (Ayoub Kazar's t_4096_none.csv copied into postgres compiled from master plus Nazir's v3 patches).

Perhaps I need to fiddle with the provided SQL to produce larger files to get longer run times? Maybe sub-second differences won't tell as interesting a story as minutes-long copy commands?
I did try it on a few GB (only around 2-5 GB), and the differences were not that large. If you can run this on more data (at least 10 GB), it would be good to look at, although I don't expect anything interesting, since the shape of the data is the same throughout the whole COPY.

Thanks for reading this.
--
-- Manni Wood EDB: https://www.enterprisedb.com
Thanks for the info.


Regards,
Ayoub Kazar. 
