Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers
| From | KAZAR Ayoub |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | CA+K2Ru=fFTUVgEDr-fBed5aOMeDbH9vrOEhapXzHEpBeOxkucg@mail.gmail.com |
| In response to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
| Responses | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| List | pgsql-hackers |
Hello,
I ran some long benchmarks on this and got stable results across multiple runs (only a few milliseconds of difference), with CPU idle states and turbo boost disabled:
sudo cpupower idle-set -D 0
echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
WIDE (500k rows)
TXT | none
Master avg: 22,183 ms
New avg: 20,435 ms
Improvement: -7.88%
CSV | none
Master avg: 26,737 ms
New avg: 24,625 ms
Improvement: -7.90%
TXT | escape
Master avg: 26,720 ms
New avg: 23,658 ms
Improvement: -11.46%
CSV | quote
Master avg: 35,961 ms
New avg: 33,317 ms
Improvement: -7.35%
--------------------------------------
NARROW (1.5M rows)
TXT | none
Master avg: 2,220 ms
New avg: 2,125 ms
Improvement: -4.28%
CSV | none
Master avg: 2,330 ms
New avg: 2,145 ms
Improvement: -7.92%
TXT | escape
Master avg: 2,425 ms
New avg: 2,187 ms
Improvement: -9.79%
CSV | quote
Master avg: 2,272 ms
New avg: 2,253 ms
Improvement: -0.85%
No regressions, as expected; overall this looks good.
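For reference, the improvement figures above are consistent with the relative change of the patched average against master, where a negative value means the patched build is faster. A minimal Python sketch of that arithmetic, using the WIDE averages from this message (the name `pct_change` is only illustrative, and since the averages shown are rounded, the last digit may differ slightly for some rows):

```python
def pct_change(master_ms: float, new_ms: float) -> float:
    """Relative change of new vs. master in percent; negative means faster."""
    return (new_ms - master_ms) / master_ms * 100.0


# WIDE (500k rows) averages in ms, copied from the results above.
wide = {
    "TXT | none": (22183, 20435),
    "CSV | none": (26737, 24625),
    "TXT | escape": (26720, 23658),
    "CSV | quote": (35961, 33317),
}

for name, (master_ms, new_ms) in wide.items():
    print(f"{name}: {pct_change(master_ms, new_ms):+.2f}%")
```

Dividing by the master timing keeps the percentage comparable across the WIDE and NARROW sets, whose absolute runtimes differ by roughly an order of magnitude.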
Regards,
Ayoub
On Thu, Feb 19, 2026 at 10:01 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
Hi,
On Thu, 19 Feb 2026 at 07:02, Manni Wood <manni.wood@enterprisedb.com> wrote:
>
> I took some time tonight to apply v8 to the latest master (759b03b2) on my x86 tower and arm raspberry pi 5.
>
> Here are the results, using both narrow columns and the wider columns we've been using throughout:
>
> x86 master NARROW
> TXT : 2587.642000 ms
> CSV : 2621.759000 ms
> TXT with 1/3 escapes: 2707.933500 ms
> CSV with 1/3 quotes: 3254.896500 ms
>
> x86 v8 NARROW
> TXT : 2488.655250 ms 3.825365% improvement
> CSV : 2628.818000 ms -0.269247% regression
> TXT with 1/3 escapes: 2615.522000 ms 3.412621% improvement
> CSV with 1/3 quotes: 3446.368000 ms -5.882568% regression
>
> x86 master WIDE
> TXT : 30583.229500 ms
> CSV : 35054.533500 ms
> TXT with 1/3 escapes: 32767.421500 ms
> CSV with 1/3 quotes: 44214.163500 ms
>
> x86 v8 WIDE
> TXT : 26527.494250 ms 13.261305% improvement
> CSV : 33364.443750 ms 4.821316% improvement
> TXT with 1/3 escapes: 29320.648000 ms 10.518904% improvement
> CSV with 1/3 quotes: 42334.074750 ms 4.252232% improvement
>
>
>
> arm master NARROW
> TXT : 1999.401000 ms
> CSV : 2081.610750 ms
> TXT with 1/3 escapes: 2053.230250 ms
> CSV with 1/3 quotes: 2431.608750 ms
>
> arm v8 NARROW
> TXT : 1981.663750 ms 0.887128% improvement
> CSV : 2023.892500 ms 2.772769% improvement
> TXT with 1/3 escapes: 2004.215250 ms 2.387214% improvement
> CSV with 1/3 quotes: 2616.872750 ms -7.618989% regression
>
> arm master WIDE
> TXT : 9120.731750 ms
> CSV : 11114.478250 ms
> TXT with 1/3 escapes: 10338.124500 ms
> CSV with 1/3 quotes: 13404.430250 ms
>
> arm v8 WIDE
> TXT : 8430.090750 ms 7.572210% improvement
> CSV : 10115.135500 ms 8.991360% improvement
> TXT with 1/3 escapes: 9624.383500 ms 6.903970% improvement
> CSV with 1/3 quotes: 12331.714000 ms 8.002699% improvement
Thank you for the results, they are interesting. I didn't expect to
see any regression for this benchmark. Also, I would expect the
non-special character cases and the 1/3 special character cases to
perform similarly, since we are not using SIMD for this benchmark.
I noticed that the timings in your narrow benchmark (both x86 and ARM)
are quite short. Would it be possible to extend the test so that the
total runtime is closer to ~10,000 ms? That might give us more stable
results.
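As a rough guide for sizing such a run, and assuming COPY time grows roughly linearly with the number of rows (an assumption, not something measured here), the row count needed to reach a target runtime can be estimated from an existing measurement. In the sketch below, `rows_for_target` is a hypothetical helper, and the row count of the quoted run is not stated in the results above, so 1,500,000 is only a placeholder:

```python
import math


def rows_for_target(current_rows: int, current_ms: float, target_ms: float) -> int:
    """Estimate the rows needed to reach target_ms, assuming linear scaling."""
    if current_rows <= 0 or current_ms <= 0:
        raise ValueError("current_rows and current_ms must be positive")
    return math.ceil(current_rows * target_ms / current_ms)


# 2587.642 ms is the x86 master NARROW TXT timing quoted above.
# 1_500_000 rows is a placeholder; substitute the real row count of that run.
print(rows_for_target(1_500_000, 2587.642, 10_000))
```

Fixed per-run overhead (connection setup, file open) would make the real requirement somewhat lower than this estimate, so it is best treated as a starting point.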
Here are my benchmark results using your script:
WIDE: Total 500000 lines and each line is 4096 bytes.
NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and `A\\A`).
+---------+---------------+---------------+---------------+----------------+
| WIDE    | TXT None      | TXT 1/3       | CSV None      | CSV 1/3        |
+---------+---------------+---------------+---------------+----------------+
| master  | 10512         | 11133         | 12241         | 14321          |
+---------+---------------+---------------+---------------+----------------+
| patched | 10000 (-4.8%) | 10804 (-2.9%) | 11571 (-5.4%) | 14008 (-2.18%) |
+---------+---------------+---------------+---------------+----------------+
|         |               |               |               |                |
+---------+---------------+---------------+---------------+----------------+
| NARROW  |               |               |               |                |
+---------+---------------+---------------+---------------+----------------+
| master  | 9702          | 9745          | 9784          | 10149          |
+---------+---------------+---------------+---------------+----------------+
| patched | 9344 (-3.6%)  | 9477 (-2.7%)  | 9439 (-3.5%)  | 9751 (-3.9%)   |
+---------+---------------+---------------+---------------+----------------+
The results look promising to me.
--
Regards,
Nazir Bilal Yavuz
Microsoft