Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers
| From | Manni Wood |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | CAKWEB6pq7C0Wv1wT9Y1_c_1fn-+cR8pb210Pj3w2FcEOmNGxbQ@mail.gmail.com |
| In response to | Re: Speed up COPY FROM text/CSV parsing using SIMD (KAZAR Ayoub <ma_kazar@esi.dz>) |
| Responses | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| List | pgsql-hackers |
On Thu, Feb 19, 2026 at 4:37 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Hello,
I ran some long benchmarks on this, and I got stable results across multiple runs (few milliseconds difference).
This is on an Intel I7-1255U CPU with:
sudo cpupower frequency-set --governor=performance
sudo cpupower idle-set -D 0
echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
WIDE (500k rows)
TXT | none
Master avg: 22,183 ms
New avg: 20,435 ms
Improvement: -7.88%
CSV | none
Master avg: 26,737 ms
New avg: 24,625 ms
Improvement: -7.90%
TXT | escape
Master avg: 26,720 ms
New avg: 23,658 ms
Improvement: -11.46%
CSV | quote
Master avg: 35,961 ms
New avg: 33,317 ms
Improvement: -7.35%
--------------------------------------
NARROW (1.5M rows)
TXT | none
Master avg: 2,220 ms
New avg: 2,125 ms
Improvement: -4.28%
CSV | none
Master avg: 2,330 ms
New avg: 2,145 ms
Improvement: -7.92%
TXT | escape
Master avg: 2,425 ms
New avg: 2,187 ms
Improvement: -9.79%
CSV | quote
Master avg: 2,272 ms
New avg: 2,253 ms
Improvement: -0.85%
No regressions as expected, overall this looks good.
Regards,
Ayoub
On Thu, Feb 19, 2026 at 10:01 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
Hi,
On Thu, 19 Feb 2026 at 07:02, Manni Wood <manni.wood@enterprisedb.com> wrote:
>
> I took some time tonight to apply v8 to the latest master (759b03b2) on my x86 tower and arm raspberry pi 5.
>
> Here are the results, using both narrow columns and the wider columns we've been using throughout:
>
> x86 master NARROW
> TXT : 2587.642000 ms
> CSV : 2621.759000 ms
> TXT with 1/3 escapes: 2707.933500 ms
> CSV with 1/3 quotes: 3254.896500 ms
>
> x86 v8 NARROW
> TXT : 2488.655250 ms 3.825365% improvement
> CSV : 2628.818000 ms -0.269247% regression
> TXT with 1/3 escapes: 2615.522000 ms 3.412621% improvement
> CSV with 1/3 quotes: 3446.368000 ms -5.882568% regression
>
> x86 master WIDE
> TXT : 30583.229500 ms
> CSV : 35054.533500 ms
> TXT with 1/3 escapes: 32767.421500 ms
> CSV with 1/3 quotes: 44214.163500 ms
>
> x86 v8 WIDE
> TXT : 26527.494250 ms 13.261305% improvement
> CSV : 33364.443750 ms 4.821316% improvement
> TXT with 1/3 escapes: 29320.648000 ms 10.518904% improvement
> CSV with 1/3 quotes: 42334.074750 ms 4.252232% improvement
>
>
>
> arm master NARROW
> TXT : 1999.401000 ms
> CSV : 2081.610750 ms
> TXT with 1/3 escapes: 2053.230250 ms
> CSV with 1/3 quotes: 2431.608750 ms
>
> arm v8 NARROW
> TXT : 1981.663750 ms 0.887128% improvement
> CSV : 2023.892500 ms 2.772769% improvement
> TXT with 1/3 escapes: 2004.215250 ms 2.387214% improvement
> CSV with 1/3 quotes: 2616.872750 ms -7.618989% regression
>
> arm master WIDE
> TXT : 9120.731750 ms
> CSV : 11114.478250 ms
> TXT with 1/3 escapes: 10338.124500 ms
> CSV with 1/3 quotes: 13404.430250 ms
>
> arm v8 WIDE
> TXT : 8430.090750 ms 7.572210% improvement
> CSV : 10115.135500 ms 8.991360% improvement
> TXT with 1/3 escapes: 9624.383500 ms 6.903970% improvement
> CSV with 1/3 quotes: 12331.714000 ms 8.002699% improvement
Thank you for the results, they are interesting. I didn't expect to
see any regression for this benchmark. Also, I would expect the
non-special character cases and the 1/3 special character cases to
perform similarly, since we are not using SIMD for this benchmark.
I noticed that the timings in your narrow benchmark (both x86 and ARM)
are quite short. Would it be possible to extend the test so that the
total runtime is closer to ~10,000 ms? That might give us more stable
results.
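One rough way to pick a row count for such a longer run is to scale from an existing measurement. The sketch below assumes COPY time grows roughly linearly with the number of rows, which is only an approximation. The function name and the sample inputs are illustrative and are not taken from the benchmark script.

```python
import math


def rows_for_target(rows_measured: int, ms_measured: float, ms_target: float) -> int:
    """Estimate how many rows are needed to reach ms_target.

    Assumes runtime scales linearly with row count (an approximation).
    """
    if rows_measured <= 0 or ms_measured <= 0:
        raise ValueError("measurements must be positive")
    return math.ceil(rows_measured * ms_target / ms_measured)


# Hypothetical input: 1,500,000 narrow rows loaded in about 2,500 ms.
print(rows_for_target(1_500_000, 2500.0, 10_000.0))
```

The estimate is only a starting point: a short trial run at the new size shows whether the scaling assumption holds on a given machine.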
Here is my benchmark using your script:
WIDE: Total 500000 lines and each line is 4096 bytes.
NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and `A\\A`).
+---------+---------------+---------------+---------------+----------------+
| WIDE | TXT None | TXT 1/3 | CSV None | CSV 1/3 |
+---------+---------------+---------------+---------------+----------------+
| master | 10512 | 11133 | 12241 | 14321 |
+---------+---------------+---------------+---------------+----------------+
| patched | 10000 (-%4.8) | 10804 (-%2.9) | 11571 (-%5.4) | 14008 (-%2.18) |
+---------+---------------+---------------+---------------+----------------+
| | | | | |
+---------+---------------+---------------+---------------+----------------+
| NARROW | | | | |
+---------+---------------+---------------+---------------+----------------+
| master | 9702 | 9745 | 9784 | 10149 |
+---------+---------------+---------------+---------------+----------------+
| patched | 9344 (-%3.6) | 9477 (-%2.7) | 9439 (-%3.5) | 9751 (-%3.9) |
+---------+---------------+---------------+---------------+----------------+
The results look promising to me.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Hello!
Thanks for running benchmarks, Ayoub.
Nazir, I ran my benchmarks with more rows this time: as many rows as would fit on my test computers without exhausting their RAM disks. That seems to have brought things more into line with what Ayoub saw. I did get some small regressions, but I suspect those are not a big deal. (For instance, on both machines I also noticed the occasional "truncate table" would take longer than the others, despite my scripts' best efforts to steady a CPU core and pin postmaster and children to that core.)
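An occasional slow run of that kind moves an average more than it moves a median. As a minimal sketch, assuming each configuration is timed several times, reporting both figures makes such outliers visible. The timings below are made up for illustration and are not measurements from these machines.

```python
from statistics import mean, median


def summarize(runs_ms):
    """Return (mean, median) of per-run timings in milliseconds."""
    return mean(runs_ms), median(runs_ms)


# Hypothetical timings: three steady runs and one slowed by an outlier.
runs = [10000.0, 10010.0, 9990.0, 10400.0]
avg, med = summarize(runs)
print(f"mean={avg:.1f} ms  median={med:.1f} ms")
```

A large gap between the two numbers suggests that one run was disturbed and that the comparison may be worth repeating.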
x86 WIDE master 500,000 rows
TXT : 30602.244000 ms
CSV : 35062.451250 ms
TXT with 1/3 escapes: 32704.250250 ms
CSV with 1/3 quotes: 44128.072500 ms
x86 WIDE v8 500,000 rows
TXT : 26611.953250 ms 13.039210% improvement
CSV : 33366.184000 ms 4.837846% improvement
TXT with 1/3 escapes: 29251.310000 ms 10.558078% improvement
CSV with 1/3 quotes: 42368.421000 ms 3.987601% improvement
x86 NARROW master 50mil rows
TXT : 25898.004000 ms
CSV : 27212.684500 ms
TXT with 1/3 escapes: 29189.518250 ms
CSV with 1/3 quotes: 33222.510250 ms
x86 NARROW v8 50mil rows
TXT : 26368.765000 ms -1.817750% regression
CSV : 26711.122250 ms 1.843119% improvement
TXT with 1/3 escapes: 28081.150750 ms 3.797142% improvement
CSV with 1/3 quotes: 32851.963500 ms 1.115348% improvement
arm WIDE master 250,000 rows
TXT : 11392.462750 ms
CSV : 13887.576500 ms
TXT with 1/3 escapes: 12908.560750 ms
CSV with 1/3 quotes: 16699.337000 ms
arm WIDE v8 250,000 rows
TXT : 10524.567750 ms 7.618151% improvement
CSV : 12621.211250 ms 9.118691% improvement
TXT with 1/3 escapes: 12017.030250 ms 6.906506% improvement
CSV with 1/3 quotes: 15428.020500 ms 7.612976% improvement
arm NARROW master 25mil rows
TXT : 10030.274000 ms
CSV : 10245.238750 ms
TXT with 1/3 escapes: 10345.224500 ms
CSV with 1/3 quotes: 12186.313250 ms
arm NARROW v8 25mil rows
TXT : 10197.386500 ms -1.666081% regression
CSV : 10257.918750 ms -0.123765% regression
TXT with 1/3 escapes: 10084.978500 ms 2.515615% improvement
CSV with 1/3 quotes: 12064.215000 ms 1.001929% improvement
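For reference, the percentages above are consistent with (master - v8) / master, so a positive value means v8 is faster. The sketch below recomputes two of them from the timings listed above. The function name is illustrative and is not part of the benchmark script.

```python
def pct_improvement(master_ms: float, patched_ms: float) -> float:
    """Percent change relative to master; positive means patched is faster."""
    if master_ms <= 0:
        raise ValueError("master_ms must be positive")
    return (master_ms - patched_ms) / master_ms * 100.0


# x86 WIDE TXT: master vs v8, timings copied from the results above.
print(f"{pct_improvement(30602.244000, 26611.953250):.6f}%")

# x86 NARROW TXT: a negative value is a regression.
print(f"{pct_improvement(25898.004000, 26368.765000):.6f}%")
```

Note that the results quoted from Ayoub use the opposite sign, (new - master) / master, so a negative percentage there is an improvement.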
--
Manni Wood
EDB: https://www.enterprisedb.com