Re: Speed up COPY TO text/CSV parsing using SIMD - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Speed up COPY TO text/CSV parsing using SIMD
Msg-id abmiNPQOqBrRlf_m@nathan
In response to Re: Speed up COPY TO text/CSV parsing using SIMD  (KAZAR Ayoub <ma_kazar@esi.dz>)
List pgsql-hackers
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about cases where some varlenas have a larger binary
> size than their text representation, e.g.:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
> 
> Its text representation is smaller than sizeof(Vector8), so currently v3
> would enter the SIMD path and exit right at the beginning (two extra
> branches) because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
> 
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?
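
For anyone following along, here is a minimal standalone sketch of the gate under discussion, with the "* 2 or * 4" idea read as a multiplier on the threshold (that reading is an assumption, as are the names should_try_simd, vector_width, and multiplier; none of them are from the patch). It only models the comparison, not the varlena macros themselves.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the v3 check quoted above.
 *
 * attlen       - pg_attribute.attlen; -1 marks a varlena type
 * binary_len   - stand-in for VARSIZE_ANY_EXHDR(), the binary payload size
 * vector_width - stand-in for sizeof(Vector8); 16 is assumed in the examples
 * multiplier   - 1 reproduces the quoted check; 2 or 4 is one reading of
 *                the proposed stricter threshold
 */
bool
should_try_simd(int attlen, size_t binary_len,
                size_t vector_width, size_t multiplier)
{
    /* Only varlenas carry a cheap length hint we can test here. */
    if (attlen != -1)
        return false;

    /*
     * The binary size is only a proxy for the text length, so a larger
     * multiplier trades missed SIMD opportunities for fewer false entries.
     */
    return binary_len > vector_width * multiplier;
}
```

With an assumed 16-byte vector, a 28-byte payload passes the check at multiplier 1 but not at multiplier 2, which is the kind of case the measurement would need to cover.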

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +                                                            bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +                                                           bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too?  We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.

>              if (is_csv)
> -                CopyAttributeOutCSV(cstate, string,
> -                                    cstate->opts.force_quote_flags[attnum - 1]);
> +            {
> +                if (use_simd)
> +                    CopyAttributeOutCSV(cstate, string,
> +                                        cstate->opts.force_quote_flags[attnum - 1],
> +                                        true, len);
> +                else
> +                    CopyAttributeOutCSV(cstate, string,
> +                                        cstate->opts.force_quote_flags[attnum - 1],
> +                                        false, len);
> +            }
>              else
> -                CopyAttributeOutText(cstate, string);
> +            {
> +                if (use_simd)
> +                    CopyAttributeOutText(cstate, string, true, len);
> +                else
> +                    CopyAttributeOutText(cstate, string, false, len);
> +            }

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it.
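
To make the pattern in the hunk above concrete, here is a small standalone sketch of what passing a literal true/false into an always-inline function buys: the compiler can emit two specialized bodies with the use_simd test folded away. Everything here is hypothetical (count_special, is_special, the 8-byte chunk loop standing in for the SIMD path); it is not the patch's code, and whether the specialization is measurable is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

#define CHUNK 8                 /* stand-in for sizeof(Vector8) */

/* Bytes that would need escaping in this toy model. */
static inline bool
is_special(char c)
{
    return c == '\\' || c == '\n' || c == '\t';
}

/*
 * Hypothetical stand-in for CopyAttributeOutText(): counts special bytes.
 * When use_fast_path is a compile-time constant at the call site, the
 * inlined copy contains no runtime branch on it.
 */
ALWAYS_INLINE size_t
count_special(const char *s, size_t len, bool use_fast_path)
{
    size_t      i = 0;
    size_t      n = 0;

    if (use_fast_path)
    {
        /* Skip whole chunks that contain no special byte. */
        while (len - i >= CHUNK)
        {
            bool        any = false;

            for (size_t j = 0; j < CHUNK; j++)
                any |= is_special(s[i + j]);
            if (any)
                break;          /* fall back to the byte-wise loop */
            i += CHUNK;
        }
    }

    /* Byte-wise loop handles the remainder (or everything, if no fast path). */
    for (; i < len; i++)
    {
        if (is_special(s[i]))
            n++;
    }
    return n;
}

/* Mirrors the caller in the hunk: branch once, then call with a literal. */
size_t
count_special_dispatch(const char *s, size_t len, bool use_fast_path)
{
    if (use_fast_path)
        return count_special(s, len, true);
    else
        return count_special(s, len, false);
}
```

Both calls return the same count; the only difference is where the branch on the flag is evaluated, which is why the benefit depends on how often the callee tests it.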

-- 
nathan


