Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From Manni Wood
Subject Re: Speed up COPY FROM text/CSV parsing using SIMD
Msg-id CAKWEB6oE_1aNRV-utrAkUbYhM-0z1fZRvYLrDC4SxfW-UXOmpA@mail.gmail.com
In response to Re: Speed up COPY FROM text/CSV parsing using SIMD  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Speed up COPY FROM text/CSV parsing using SIMD
List pgsql-hackers


On Mon, Mar 9, 2026 at 1:25 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> +#ifndef USE_NO_SIMD
> +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> +                                                                        bool *temp_hit_eof, int *temp_input_buf_ptr);
> +#endif

Should we inline this, too?

> +                             /*
> +                              * Do not disable SIMD when we hit EOL or EOF characters. In
> +                              * practice, it does not matter for EOF because parsing ends
> +                              * there, but we keep the behavior consistent.
> +                              */
> +                             if (!(simd_hit_eof || simd_hit_eol))
> +                                     cstate->simd_enabled = false;

nitpick: I would personally avoid disabling it for EOF.  It probably
doesn't amount to much, but I don't see any point in the extra
complexity/work solely for consistency.

> +                             /*
> +                              * We encountered a EOL or EOF on the first vector. This means
> +                              * lines are not long enough to skip fully sized vector. If
> +                              * this happens two times consecutively, then disable the
> +                              * SIMD.
> +                              */
> +                             if (first_vector)
> +                             {
> +                                     if (cstate->simd_failed_first_vector)
> +                                             cstate->simd_enabled = false;
> +
> +                                     cstate->simd_failed_first_vector = true;
> +                             }

The first time I saw this, my mind immediately went to the extreme case
where this likely regresses: alternating long and short lines.  We might
just want to disable it the first time we see a short line, like we do for
special characters.  This is another thing that we can improve
independently later on.
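
To make the alternating-lines concern concrete, here is a minimal sketch of the two-strikes rule from the quoted hunk. Only simd_enabled and simd_failed_first_vector come from the patch; SimdState, observe_line(), count_short_line_attempts(), and VECTOR_WIDTH are hypothetical names used for illustration. The reset of the flag after a long line is an assumption, since the quoted hunk does not show where the flag is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VECTOR_WIDTH 16			/* hypothetical vector size in bytes */

typedef struct SimdState
{
	bool		simd_enabled;
	bool		simd_failed_first_vector;
} SimdState;

/* Apply the "two consecutive short lines" rule to one line length. */
static void
observe_line(SimdState *st, size_t line_len)
{
	if (!st->simd_enabled)
		return;

	if (line_len < VECTOR_WIDTH)
	{
		/* EOL was found inside the first vector. */
		if (st->simd_failed_first_vector)
			st->simd_enabled = false;
		st->simd_failed_first_vector = true;
	}
	else
	{
		/* ASSUMPTION: a sufficiently long line clears the strike. */
		st->simd_failed_first_vector = false;
	}
}

/*
 * Count the short lines that were still attempted with SIMD enabled,
 * i.e. the attempts that could not skip a full vector.
 */
static int
count_short_line_attempts(const size_t *lens, size_t n)
{
	SimdState	st = {true, false};
	int			attempts = 0;

	assert(lens != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (st.simd_enabled && lens[i] < VECTOR_WIDTH)
			attempts++;
		observe_line(&st, lens[i]);
	}
	return attempts;
}
```

Under these assumptions, an input whose line lengths alternate between 200 and 4 bytes never disables SIMD, so every short line pays for a SIMD attempt, whereas a run of 4-byte lines disables it after the second one. Disabling on the first short line would cap the count at one in both cases.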

> +     /* First try to run SIMD, then continue with the scalar path */
> +     if (cstate->simd_enabled)
> +     {
> +             int                     temp_input_buf_ptr = input_buf_ptr;
> +             bool            temp_hit_eof = false;
> +
> +             result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> +                                                                                     &temp_input_buf_ptr);
> +             input_buf_ptr = temp_input_buf_ptr;
> +             hit_eof = temp_hit_eof;

Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
moving the SIMD stuff to its own function is nice.  The temp variables seem
a bit too magical to me, though.  If those really make a difference, IMHO
there ought to be a big comment explaining why.
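
For reference, the pattern in question looks roughly like the sketch below. One plausible motivation, which is an assumption and not something confirmed in the quoted text, is that passing &input_buf_ptr directly would make the hot-loop variable address-taken, which can keep the compiler from holding it in a register. scan_helper() and find_eol() are hypothetical stand-ins, not functions from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the SIMD helper: advances *pos past
 * non-newline bytes and reports whether the buffer was exhausted.
 */
static bool
scan_helper(const char *buf, int len, bool *hit_eof, int *pos)
{
	while (*pos < len && buf[*pos] != '\n')
		(*pos)++;
	*hit_eof = (*pos >= len);
	return !*hit_eof;			/* true when a newline was found */
}

/* Returns the index of the first newline, or len if there is none. */
static int
find_eol(const char *buf, int len)
{
	int			input_buf_ptr = 0;	/* hot-loop variable, address never taken */
	bool		hit_eof = false;

	{
		/* Only these temporaries escape to the helper. */
		int			temp_input_buf_ptr = input_buf_ptr;
		bool		temp_hit_eof = false;

		(void) scan_helper(buf, len, &temp_hit_eof, &temp_input_buf_ptr);
		input_buf_ptr = temp_input_buf_ptr;
		hit_eof = temp_hit_eof;
	}

	/* A scalar loop placed here is free to keep input_buf_ptr in a register. */
	while (!hit_eof && input_buf_ptr < len && buf[input_buf_ptr] != '\n')
		input_buf_ptr++;

	return input_buf_ptr;
}
```

Whether this actually changes the generated code depends on the compiler and on inlining, which is why a comment next to the temporaries stating the measured effect would help.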

--
nathan

Here are some benchmarks showing what performance will look like for users who continue to use default_toast_compression = pglz.

All builds were compiled with meson using debugoptimized (-g -O2).

arm NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 10055.141000 ms
CSV :                 10549.174500 ms
TXT with 1/3 escapes: 10213.864750 ms
CSV with 1/3 quotes:  12188.039000 ms

arm NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 10070.153750 ms  -0.149304% regression
CSV :                 10161.348750 ms   3.676361% improvement
TXT with 1/3 escapes: 10618.005000 ms  -3.956781% regression
CSV with 1/3 quotes:  12279.366250 ms  -0.749319% regression

arm WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 11355.602750 ms
CSV :                 13893.110500 ms
TXT with 1/3 escapes: 12872.690500 ms
CSV with 1/3 quotes:  16722.262500 ms

arm WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                  9001.007250 ms  20.735099% improvement
CSV :                  8988.679750 ms  35.301171% improvement
TXT with 1/3 escapes: 12191.137000 ms   5.294569% improvement
CSV with 1/3 quotes:  16297.541500 ms   2.539854% improvement


x86 NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 26243.084500 ms
CSV :                 27719.564000 ms
TXT with 1/3 escapes: 29578.192750 ms
CSV with 1/3 quotes:  34467.571250 ms

x86 NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 26371.996750 ms  -0.491224% regression
CSV :                 26137.186500 ms   5.708522% improvement
TXT with 1/3 escapes: 28080.201000 ms   5.064514% improvement
CSV with 1/3 quotes:  32557.377500 ms   5.542003% improvement

x86 WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 28734.774750 ms
CSV :                 35700.485000 ms
TXT with 1/3 escapes: 32376.878250 ms
CSV with 1/3 quotes:  47024.985750 ms

x86 WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                 22753.755750 ms  20.814567% improvement
CSV :                 22977.195500 ms  35.638982% improvement
TXT with 1/3 escapes: 29526.887000 ms   8.802551% improvement
CSV with 1/3 quotes:  40298.196750 ms  14.304712% improvement
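
The percentages above are consistent with (baseline - patched) / baseline * 100, where the baseline is the matching "master without inline" run. This formula is inferred from the numbers, not taken from the benchmark script, and percent_change() is a hypothetical helper name. A minimal sketch:

```c
#include <assert.h>

/*
 * Relative change versus the baseline, in percent.
 * Positive means the patched build was faster (improvement),
 * negative means it was slower (regression).
 */
static double
percent_change(double baseline_ms, double patched_ms)
{
	assert(baseline_ms > 0.0);
	return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

For example, the arm WIDE CSV row gives percent_change(13893.110500, 8988.679750), about 35.30, and the arm NARROW TXT row gives percent_change(10055.141000, 10070.153750), about -0.149.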
--
-- Manni Wood EDB: https://www.enterprisedb.com
