Re: Optimizing COPY with SIMD - Mailing list pgsql-hackers

From	Neil Conway
Subject	Re: Optimizing COPY with SIMD
Date	June 7, 2024 18:07:36
Msg-id	CAOW5sYaNuci8gNgEPuk0mx2QXi1rJBikmS=dNmR2jpf0K+4svg@mail.gmail.com
In response to	Re: Optimizing COPY with SIMD (Nathan Bossart <nathandbossart@gmail.com>)
List	pgsql-hackers

On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73). That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran. I
> wonder if you could do something similar here.

I didn't entirely follow what you are suggesting here -- it seems like we would still need to do a strlen() for the non-SIMD case if we tried a similar approach.

> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process. If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression. Maybe we can find
> some other small gains elsewhere to offset it.

For the particular short-strings benchmark I have been using (3 columns with 8-character ASCII strings in each), I suspect the regression is caused by the need to do a strlen(), rather than by the vectorized loop itself: we skip the vectorized loop anyway, because sizeof(Vector8) == 16 on this machine and every attribute is shorter than that. This would also explain why we see a regression on short strings for text but not for CSV: CSV already needed to do a strlen() for the non-quoted-string case. Unfortunately, this makes it tricky to make the optimization conditional on the length of the string, since computing the length is itself the cost we want to avoid. I suppose we could play some games where we start with a byte-by-byte loop and then switch over to the vectorized path (and take a strlen()) once we have seen more than, say, sizeof(Vector8) bytes. Seems a bit kludgy though.
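To make the hybrid idea concrete, here is a minimal sketch of what it might look like. It is not taken from the patch: first_escape_hybrid(), needs_escape(), chunk_needs_escape() and CHUNK are hypothetical names, CHUNK merely stands in for sizeof(Vector8), the escape rule is a simplified version of what text-format COPY checks, and a plain scalar loop stands in for the real vector comparison.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for sizeof(Vector8), which is 16 on the machine above. */
#define CHUNK 16

/* Simplified escape rule (assumption): control bytes, backslash, delimiter. */
static inline int
needs_escape(unsigned char c, char delim)
{
	return c < 0x20 || c == '\\' || c == (unsigned char) delim;
}

/*
 * Stand-in for a vector comparison: reports whether any of the CHUNK bytes
 * starting at p needs escaping.  A real version would use SIMD here.
 */
static int
chunk_needs_escape(const char *p, char delim)
{
	int			found = 0;

	for (int i = 0; i < CHUNK; i++)
		found |= needs_escape((unsigned char) p[i], delim);
	return found;
}

/*
 * Return the offset of the first byte that needs escaping, or the length of
 * the string if there is none.
 */
static size_t
first_escape_hybrid(const char *s, char delim)
{
	size_t		i = 0;
	size_t		len;

	/* Phase 1: byte by byte, so short attributes never pay for strlen(). */
	for (; i < CHUNK; i++)
	{
		if (s[i] == '\0' || needs_escape((unsigned char) s[i], delim))
			return i;
	}

	/* Phase 2: at least CHUNK clean bytes seen, so take strlen() once. */
	len = i + strlen(s + i);

	/* Skip whole chunks that contain nothing to escape. */
	while (i + CHUNK <= len)
	{
		if (chunk_needs_escape(s + i, delim))
			break;
		i += CHUNK;
	}

	/* Finish the tail, or locate the hit inside the flagged chunk. */
	while (i < len && !needs_escape((unsigned char) s[i], delim))
		i++;
	return i;
}
```

With an 8-character attribute such as "abcdefgh" the function returns from phase 1 without ever calling strlen(); only attributes of at least CHUNK bytes reach the chunked loop. The cost is an extra branch structure in the hot path, which is the kludgy part.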

I will do some more benchmarking and report back. For the time being, I'm not inclined to push to get CopyAttributeOutTextVector() into the tree in its current state, as I agree that the short-attribute case is quite important.

In the meantime, attached is a revised patch series. This uses SIMD to optimize CopyReadLineText in COPY FROM. Performance results:

====

master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
Time (mean ± σ): 1.944 s ± 0.013 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.927 s … 1.975 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
Time (mean ± σ): 1.021 s ± 0.017 s [User: 0.002 s, System: 0.001 s]
Range (min … max): 1.005 s … 1.053 s 10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
Time (mean ± σ): 1.513 s ± 0.022 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.493 s … 1.552 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
Time (mean ± σ): 1.032 s ± 0.032 s [User: 0.002 s, System: 0.001 s]
Range (min … max): 1.009 s … 1.113 s 10 runs

====

Neil