Re: Speed up COPY TO text/CSV parsing using SIMD - Mailing list pgsql-hackers

From	Nathan Bossart
Subject	Re: Speed up COPY TO text/CSV parsing using SIMD
Date	March 27 00:23:48
Msg-id	acWj5FntidHJ9nVP@nathan
In response to	Re: Speed up COPY TO text/CSV parsing using SIMD (KAZAR Ayoub <ma_kazar@esi.dz>)
Responses	Re: Speed up COPY TO text/CSV parsing using SIMD
List	pgsql-hackers

On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads.  Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes.  The extra branching for each attribute
might not be something we can just ignore.
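
To make the concern concrete, here is a minimal sketch of the kind of
per-attribute dispatch under discussion.  It is not the patch's code:
skip_clean_prefix(), chunk_has_byte(), and CHUNK_SIZE are hypothetical
names, the 16-byte chunk only stands in for sizeof(Vector8), and it assumes
the special bytes are the delimiter, the quote character, and line endings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CHUNK_SIZE 16           /* assumption: stand-in for sizeof(Vector8) */

/* Hypothetical helper: does any of the CHUNK_SIZE bytes at p equal c? */
static bool
chunk_has_byte(const unsigned char *p, unsigned char c)
{
#if defined(__SSE2__)
    __m128i     v = _mm_loadu_si128((const __m128i *) p);
    __m128i     eq = _mm_cmpeq_epi8(v, _mm_set1_epi8((char) c));

    return _mm_movemask_epi8(eq) != 0;
#else
    /* Portable fallback so the sketch builds without SSE2. */
    return memchr(p, c, CHUNK_SIZE) != NULL;
#endif
}

/*
 * Hypothetical sketch: return how many leading bytes of one attribute were
 * verified, a whole chunk at a time, to contain no special character.  The
 * caller would handle everything after that prefix with the scalar loop.
 */
static size_t
skip_clean_prefix(const char *attr, size_t len, char delim, char quote,
                  bool *entered_fast_path)
{
    const unsigned char *p = (const unsigned char *) attr;
    size_t      i = 0;

    /* This check runs once per attribute, even when it never pays off. */
    *entered_fast_path = (len >= CHUNK_SIZE);
    if (!*entered_fast_path)
        return 0;

    while (i + CHUNK_SIZE <= len)
    {
        if (chunk_has_byte(p + i, (unsigned char) delim) ||
            chunk_has_byte(p + i, (unsigned char) quote) ||
            chunk_has_byte(p + i, '\n') ||
            chunk_has_byte(p + i, '\r'))
            break;              /* exit to the scalar path */
        i += CHUNK_SIZE;
    }
    return i;
}
```

In this sketch, a 3-byte attribute pays only for the length check, while the
JSON value quoted above is long enough to enter the chunked loop but leaves
it on the very first chunk because of the quote character.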

> For cases where we do a false commitment on SIMD because we read a binary
> size >= sizeof(Vector8), which i found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).

That's good to hear.

-- 
nathan
