Re: Optimizing COPY with SIMD - Mailing list pgsql-hackers

From Neil Conway
Subject Re: Optimizing COPY with SIMD
Msg-id CAOW5sYaNuci8gNgEPuk0mx2QXi1rJBikmS=dNmR2jpf0K+4svg@mail.gmail.com
In response to Re: Optimizing COPY with SIMD  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73).  That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran.  I
> wonder if you could do something similar here.

I didn't entirely follow what you are suggesting here -- it seems like we would need to call strlen() for the non-SIMD case if we tried to use a similar approach.
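
To make the concern concrete, here is a minimal sketch of how I understand the overlapping idea. It is not the actual pg_lfind32() code: lfind32_overlapping(), block_has_key(), and BLOCK are hypothetical names, and the inner scalar loop stands in for a real vector comparison. The point is that the final, overlapping block is addressed relative to the end of the input, so the length has to be known before the loop starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 4					/* assumed number of uint32 lanes per vector op */

/* Scalar stand-in for one vector comparison over BLOCK elements. */
static bool
block_has_key(const uint32_t *p, uint32_t key)
{
	for (size_t i = 0; i < BLOCK; i++)
		if (p[i] == key)
			return true;
	return false;
}

bool
lfind32_overlapping(uint32_t key, const uint32_t *base, size_t nelem)
{
	size_t		i = 0;

	assert(base != NULL || nelem == 0);

	/* Inputs shorter than one block cannot use the block path at all. */
	if (nelem < BLOCK)
	{
		for (; i < nelem; i++)
			if (base[i] == key)
				return true;
		return false;
	}

	/* Whole blocks. */
	for (; i + BLOCK <= nelem; i += BLOCK)
		if (block_has_key(base + i, key))
			return true;

	/*
	 * Leftover elements: instead of a scalar tail loop, re-check the last
	 * BLOCK elements. This overlaps elements we already looked at, and it
	 * requires knowing nelem up front.
	 */
	if (i < nelem)
		return block_has_key(base + nelem - BLOCK, key);

	return false;
}
```

For a NUL-terminated attribute string there is no nelem to start from, which is where the extra strlen() would come in.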

> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process.  If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression.  Maybe we can find
> some other small gains elsewhere to offset it.

For the particular short-strings benchmark I have been using (3 columns with 8-character ASCII strings in each), I suspect the regression is caused by the need to call strlen(), rather than by the vectorized loop itself (we skip the vectorized loop anyway because sizeof(Vector8) == 16 on this machine). This explains why we see a regression on short strings for text but not for CSV: CSV already needed to call strlen() for the non-quoted-string case. Unfortunately, this makes it tricky to make the optimization conditional on the length of the string. I suppose we could play some games where we start with a byte-by-byte loop and then switch over to the vectorized path (and take a strlen()) once we have seen more than, say, sizeof(Vector8) bytes. Seems a bit kludgy though.
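
A rough sketch of that hybrid scheme, to show the shape of the control flow. This is illustrative only and not taken from the patch: first_special_hybrid(), is_special(), chunk_has_special(), CHUNK, and the particular set of special characters are all assumptions, and the per-chunk check is a scalar stand-in for a real Vector8 comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* stand-in for sizeof(Vector8) */

/* Assumed set of bytes that need escaping; the real set may differ. */
static bool
is_special(char c)
{
	return c == '\\' || c == '\t' || c == '\n' || c == '\r';
}

/* Scalar stand-in for one vector comparison over CHUNK bytes. */
static bool
chunk_has_special(const char *p)
{
	for (size_t i = 0; i < CHUNK; i++)
		if (is_special(p[i]))
			return true;
	return false;
}

/* Returns the offset of the first special byte, or the string length. */
size_t
first_special_hybrid(const char *s)
{
	size_t		i = 0;
	size_t		len;

	assert(s != NULL);

	/* Phase 1: short attributes finish here without ever calling strlen(). */
	for (; i < CHUNK; i++)
	{
		if (s[i] == '\0' || is_special(s[i]))
			return i;
	}

	/* Phase 2: long attribute, so pay for strlen() once and skip clean chunks. */
	len = i + strlen(s + i);
	while (i + CHUNK <= len && !chunk_has_special(s + i))
		i += CHUNK;

	/* Finish the tail, or the chunk that contains the hit, byte by byte. */
	while (i < len && !is_special(s[i]))
		i++;
	return i;
}
```

With 8-character attributes, as in the short-strings benchmark, every call returns from the first loop, so the strlen() cost is only paid by attributes of at least CHUNK bytes.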

I will do some more benchmarking and report back. For the time being, I'm not inclined to push to get CopyAttributeOutTextVector() into the tree in its current state, as I agree that the short-attribute case is quite important.

In the meantime, attached is a revised patch series. It uses SIMD to optimize CopyReadLineText() in COPY FROM. Performance results:

====
master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):      1.944 s ±  0.013 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.927 s …  1.975 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):      1.021 s ±  0.017 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    1.005 s …  1.053 s    10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):      1.513 s ±  0.022 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.493 s …  1.552 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):      1.032 s ±  0.032 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    1.009 s …  1.113 s    10 runs
====

Neil
 