Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Re: Speed up COPY FROM text/CSV parsing using SIMD
Date
Msg-id CAN55FZ2v9GG2bUKYdjWKpMpoTT7-1LG0y0b9d3sh0j7Lms6tFA@mail.gmail.com
In response to Re: Speed up COPY FROM text/CSV parsing using SIMD  (Manni Wood <manni.wood@enterprisedb.com>)
List pgsql-hackers
Hi,

On Thu, 8 Jan 2026 at 23:50, Manni Wood <manni.wood@enterprisedb.com> wrote:
>
> I tested master (bfb335d) and v3 and v4.2 patches on an Amazon EC2
> instance (t2.small) and, with Mark's help, proved that on such a small
> system with default storage configured, IO will be the bottleneck and
> the v3 and v4.2 patches show no significant differences over master
> because the CPU is always waiting on IO. This is presumably an
> experience Postgres users will have when running on systems with IO so
> slow that the CPU is always waiting for data.

I think this is expected and acceptable since there is no regression.

> I went in the other direction and tested an all-RAM setup on my tower
> PC. I put the entire data dir in RAM for each postgres instance
> (master, v3 patch, v4.2 patch), and wrote and copied the test copyfiles
> from RAM. On Linux, /ram/user/<myuserid> is tmpfs (ramdisk), so I just
> put everything there. I had to shrink the data sizes compared to
> previous runs (to not run out of ramdisk space) but Nazir's cpupower
> tips are making all of my test runs much more uniform, so I no longer
> feel that I need huge data sizes to get good results.

This is an informative scenario to test, thank you for doing this.

> Here are the results when all of the files are on RAM disks:
>
> master: bfb335df
>
> text, no special: 30372
> text, 1/3 special: 32665
> csv, no special: 34925
> csv, 1/3 special: 44044
>
> v3
>
> text, no special: 22840 (24.7% speedup)
> text, 1/3 special: 32448 (0.6% speedup)
> csv, no special: 22642 (35.1% speedup)
> csv, 1/3 special: 46280 (5.1% regression)
>
> v4.2
>
> text, no special: 22677 (25.3% speedup)
> text, 1/3 special: 34512 (6.5% regression)
> csv, no special: 22686 (35.0% speedup)
> csv, 1/3 special: 51411 (16.7% regression)
>
> Assuming all-storage-is-RAM setups get us closer to the theoretical
> limit of each patch, it looks like v3 holds up quite well to v4.2 in
> the best-case scenarios, while v3 has better performance than v4.2 in
> the worst-case scenarios.

I agree with you; I think all-storage-is-RAM is the best scenario for
benchmarking regressions.

Your all-storage-is-RAM benchmark results are similar to your normal
benchmark results here [1]. I would have expected the regressions in the
all-storage-is-RAM results to be worse than in [1], since the effect of
SIMD should be more visible when there is no IO wait. What do you think
about that?
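
The expectation above can be sketched with a toy model. This is only an
illustration: the function name and all numbers are made up, not
measurements from this thread, and it assumes IO wait simply adds to CPU
time (no overlap between the two), which is a simplification.

```python
def observed_regression_pct(cpu_master_ms, cpu_patched_ms, io_wait_ms):
    """Relative slowdown of the patched run when both runs pay the same IO wait.

    Hypothetical helper: total time is modeled as CPU time + IO wait.
    """
    total_master = cpu_master_ms + io_wait_ms
    total_patched = cpu_patched_ms + io_wait_ms
    return 100.0 * (total_patched - total_master) / total_master


# Made-up CPU costs: the patched build needs 10% more CPU time on this input.
cpu_master_ms = 1000.0
cpu_patched_ms = 1100.0

# The same absolute CPU difference looks smaller as IO wait grows.
for io_wait_ms in (0.0, 1000.0, 9000.0):
    pct = observed_regression_pct(cpu_master_ms, cpu_patched_ms, io_wait_ms)
    print(f"io_wait={io_wait_ms:6.0f} ms -> observed regression {pct:.1f}%")
```

Under this model the same CPU-side regression shows up in full only when
io_wait_ms is zero, which is why an all-RAM run would be expected to show
larger regressions than a run that waits on storage.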

[1] https://postgr.es/m/CAKWEB6qQmPejLwndzMcVqhO%2Bu9vgUf44kYAKg4QtyG-%3DKH%3DYGg%40mail.gmail.com


--
Regards,
Nazir Bilal Yavuz
Microsoft


