Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From	KAZAR Ayoub
Subject	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date	January 31 19:20:58
Msg-id	CA+K2Ru=C_woAnd-3-pGHoNSTR8FOf=7eeSWE1xaLt9ojVWndVg@mail.gmail.com
In response to	Re: Speed up COPY FROM text/CSV parsing using SIMD (Neil Conway <neil.conway@gmail.com>)
List	pgsql-hackers

Hello,

On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <neil.conway@gmail.com> wrote:

> A few suggestions:
>
> * I'm curious if we'll see better performance on large inputs if we flush to `line_buf` periodically (e.g., at least every few thousand bytes or so). Otherwise we might see poor data cache behavior if large inputs with no control characters get evicted before we've copied them over. See the approach taken in escape_json_with_len() in utils/adt/json.c

So I gave this a try. Attached is a small patch that has v3 plus the suggestion added. Here are the results with different thresholds for the line_buf refill:

Execution time compared to master:

Workload	v3	v3.1 (2k)	v3.1 (4k)	v3.1 (8k)	v3.1 (16k)	v3.1 (20k)	v3.1 (28k)
text/none	-16.5%	-17.4%	-14.3%	-12.6%	-13.6%	-10.5%	-16.3%
text/esc	+5.6%	+11.1%	+3.1%	+7.6%	+3.0%	+4.9%	+4.2%
csv/none	-31.0%	-29.9%	-26.7%	-30.1%	-27.9%	-30.2%	-29.6%
csv/quote	+0.2%	-0.6%	-0.4%	-1.0%	+0.1%	+2.5%	-1.0%

L1d cache miss rates:

Workload	Master	v3	v3.1 (2k)	v3.1 (4k)	v3.1 (8k)	v3.1 (16k)	v3.1 (20k)	v3.1 (28k)
text/none	0.20%	0.23%	0.21%	0.22%	0.21%	0.21%	0.21%	0.22%
text/esc	0.21%	0.22%	0.22%	0.22%	0.22%	0.21%	0.22%	0.22%
csv/none	0.17%	0.22%	0.21%	0.22%	0.21%	0.21%	0.22%	0.22%
csv/quote	0.18%	0.22%	0.19%	0.20%	0.20%	0.19%	0.20%	0.20%

On my laptop I have 32KB L1 cache per core.

The results are very close. The difference is hard to see in the cache miss numbers, but the execution times tell a different story, so periodically refilling line_buf seems worth doing.
It would be nice if Manni could rerun the benchmarks on these too, to confirm this.
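
For readers following along, here is a rough, self-contained sketch of the periodic refill idea. It is not the code from the attached patch: `Buf`, `buf_append`, `copy_line_periodic` and `FLUSH_THRESHOLD` are hypothetical names made up for illustration, `Buf` is only a stand-in for the real line buffer, and a plain byte-by-byte loop stands in for the SIMD scan. The point is just that pending bytes are copied over once a threshold is reached, instead of only when a newline is found.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; a stand-in for line_buf, not PostgreSQL's API. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} Buf;

/* Hypothetical threshold; the runs above tried values from 2k to 28k. */
#define FLUSH_THRESHOLD 4096

/* Append n bytes from src to b, growing the buffer as needed. */
static void
buf_append(Buf *b, const char *src, size_t n)
{
    if (n == 0)
        return;
    if (b->len + n > b->cap)
    {
        size_t  newcap = b->cap ? b->cap : 64;
        char   *p;

        while (newcap < b->len + n)
            newcap *= 2;
        p = realloc(b->data, newcap);
        if (p == NULL)
            abort();
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/*
 * Scan "in" up to the first '\n' (or the end of the input), copying the
 * scanned bytes into "line" without the newline.  Pending bytes are flushed
 * whenever FLUSH_THRESHOLD of them have accumulated, so they are copied
 * while still likely to be in cache.  Returns the number of input bytes
 * consumed, including the newline if one was found.
 */
static size_t
copy_line_periodic(const char *in, size_t inlen, Buf *line)
{
    size_t  start = 0;      /* first input byte not yet copied */
    size_t  i;

    for (i = 0; i < inlen; i++)
    {
        if (in[i] == '\n')
        {
            buf_append(line, in + start, i - start);
            return i + 1;
        }
        if (i + 1 - start >= FLUSH_THRESHOLD)
        {
            /* Periodic flush: copy what we have scanned so far. */
            buf_append(line, in + start, i + 1 - start);
            start = i + 1;
        }
    }
    /* No newline in this chunk: copy whatever is still pending. */
    buf_append(line, in + start, inlen - start);
    return inlen;
}
```

With a threshold of 0 disabled entirely (i.e. without the flush branch), the same loop would copy each line in one go at the end, which is the behavior the threshold variants above are being compared against.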

Regards,

Ayoub

Attachment

0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch
