Re: Speed up COPY TO text/CSV parsing using SIMD - Mailing list pgsql-hackers

From KAZAR Ayoub
Subject Re: Speed up COPY TO text/CSV parsing using SIMD
Date
Msg-id CA+K2Rum-TB_iNzDWoXOJspf=jq0gd-wees8+9tBTJNyhy9cK5g@mail.gmail.com
In response to Re: Speed up COPY TO text/CSV parsing using SIMD  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?

I'll respond to this separately in a different email.
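
To make the concern concrete, here is a minimal standalone sketch, not the patch's code. CHUNK stands in for sizeof(Vector8) and is assumed to be 16 bytes; worth_trying_simd, simd_loop_runs, and the scale parameter are hypothetical names, and "scale" is only one possible reading of the "* 2 or * 4" idea (require the binary size to exceed scale * CHUNK).

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for sizeof(Vector8); 16 bytes is an assumption for this sketch. */
#define CHUNK 16

/*
 * Hypothetical gate: the binary (varlena payload) size is used as a cheap
 * proxy for the length of the text representation, which is not known yet.
 * scale = 1 mirrors the shape of the v3 check; scale = 2 or 4 is one
 * reading of the proposed stricter threshold.
 */
static bool
worth_trying_simd(size_t binary_size, size_t scale)
{
	return binary_size > scale * CHUNK;
}

/*
 * Whether a chunked loop could process at least one full chunk of the
 * text representation.  If this is false after the gate said yes, the
 * gate only cost extra branches.
 */
static bool
simd_loop_runs(size_t text_len)
{
	return text_len > CHUNK;
}
```

Under these assumptions, a binary size of 32 bytes passes the gate at scale 1 but not at scale 2, while any text representation shorter than CHUNK never reaches the chunked loop.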

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +                                                                                                                     bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +                                                                                                                bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too?  We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.

I tested inlining alone and saw an improvement of about 1% to 4% across all configurations.
The inlining is only meaningful in combination with the SIMD work, for the reason described below.

>                       if (is_csv)
> -                             CopyAttributeOutCSV(cstate, string,
> -                                                                     cstate->opts.force_quote_flags[attnum - 1]);
> +                     {
> +                             if (use_simd)
> +                                     CopyAttributeOutCSV(cstate, string,
> +                                                                             cstate->opts.force_quote_flags[attnum - 1],
> +                                                                             true, len);
> +                             else
> +                                     CopyAttributeOutCSV(cstate, string,
> +                                                                             cstate->opts.force_quote_flags[attnum - 1],
> +                                                                             false, len);

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it.

I compiled three variants:

v3: use_simd passed as a compile-time constant, CopyAttribute functions inlined.
v3_variable: use_simd passed as a runtime variable, CopyAttribute functions inlined.
v3_variable_noinline: use_simd passed as a runtime variable, CopyAttribute functions not inlined.

None of the helpers are explicitly inlined by us.

The assembly reveals two things:
1) The CSV SIMD helpers (CopyCheckCSVQuoteNeedSIMD, CopySkipCSVEscapeSIMD) are inlined by the compiler naturally in all
three variants, whereas CopySkipTextSIMD is never inlined by the compiler in any variant.

2) The constant-passing approach (v3) does matter, though apparently only a little, specifically for CopySkipTextSIMD.
It's the same story as the first commit of the COPY FROM patch: the compiler just emits code without the use_simd branch:
     jbe  ...   ; len > sizeof(Vector8)
     je   ...   ; need_transcoding
     call CopySkipTextSIMD
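
The difference between passing a constant and passing a variable can be illustrated with a small standalone sketch. This is not the patch's code: CHUNK stands in for sizeof(Vector8) (assumed 16 bytes), and skip_clean_chunks, count_backslashes, count_const, and count_variable are hypothetical names. A plain byte loop stands in for the real vector instructions.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

#define CHUNK 16				/* stand-in for sizeof(Vector8); an assumption */

/*
 * Hypothetical stand-in for a SIMD skip helper: advance over whole chunks
 * that contain no backslash and return the offset where the scalar loop
 * must take over.
 */
static size_t
skip_clean_chunks(const char *s, size_t len)
{
	size_t		i = 0;

	while (len - i >= CHUNK)
	{
		for (size_t j = 0; j < CHUNK; j++)
		{
			if (s[i + j] == '\\')
				return i;		/* this chunk needs scalar handling */
		}
		i += CHUNK;
	}
	return i;
}

/* Inlined worker; use_simd is a constant or a variable depending on the caller. */
static ALWAYS_INLINE size_t
count_backslashes(const char *s, size_t len, bool use_simd)
{
	size_t		start = 0;
	size_t		n = 0;

	if (use_simd && len > CHUNK)
		start = skip_clean_chunks(s, len);

	for (size_t i = start; i < len; i++)
	{
		if (s[i] == '\\')
			n++;
	}
	return n;
}

/* v3 style: each call site passes a literal, so the flag test can fold away. */
static size_t
count_const(const char *s, size_t len, bool use_simd)
{
	if (use_simd)
		return count_backslashes(s, len, true);
	else
		return count_backslashes(s, len, false);
}

/* v3_variable style: the flag stays a runtime value inside the inlined body. */
static size_t
count_variable(const char *s, size_t len, bool use_simd)
{
	return count_backslashes(s, len, use_simd);
}
```

Both wrappers compute the same result; the only intended difference is that in count_const the compiler is free to drop the use_simd test from each inlined copy, at the cost of one branch in the caller. Whether a given compiler actually does so has to be checked in the generated assembly.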

Whether the extra branching at the call site needed for constant passing is worth it is shown by the benchmark:


  Test                   Master    v3       v3_var   v3_var_noinl
  TEXT clean             1504ms   -24.1%   -23.0%   -21.5%
  CSV clean              1760ms   -34.9%   -32.7%   -33.0%
  TEXT 1/3 backslashes   3763ms    +4.6%    +6.9%    +4.1%
  CSV 1/3 quotes         3885ms    +3.1%    +2.7%    -0.8%

Wide table TEXT (integer columns):

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2083ms   -0.7%    -0.6%    +3.5%
  100     4094ms   -0.1%    -0.5%    +4.5%
  200     1560ms   +0.6%    -2.3%    +3.2%
  500     1905ms   -1.0%    -1.3%    +4.7%
  1000    1455ms   +1.8%    +0.4%    +4.3%

Wide table CSV:

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2421ms   +4.0%    +6.7%    +5.8%
  100     4980ms   +0.1%    +2.0%    +0.1%
  200     1901ms   +1.4%    +3.5%    +1.4%
  500     2328ms   +1.8%    +2.7%    +2.2%
  1000    1815ms   +2.0%    +2.8%    +2.5%

I'm not sure there's a practical difference between v3 and v3_var. What do you think?


Regards,
Ayoub
