Re: Optimizing COPY with SIMD - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Optimizing COPY with SIMD
Date
Msg-id Zl3ZjDRGxAF9Eg7a@nathan
Whole thread Raw
In response to Optimizing COPY with SIMD  (Neil Conway <neil.conway@gmail.com>)
Responses Re: Optimizing COPY with SIMD
List pgsql-hackers
On Sun, Jun 02, 2024 at 03:17:21PM -0400, Neil Conway wrote:
> master @ 8fea1bd541:
> 
> $ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
> Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
>   Time (mean ± σ):      2.027 s ±  0.075 s    [User: 0.001 s, System: 0.000
> s]
>   Range (min … max):    1.928 s …  2.207 s    10 runs
> 
> Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
>   Time (mean ± σ):      1.420 s ±  0.027 s    [User: 0.001 s, System: 0.000
> s]
>   Range (min … max):    1.379 s …  1.473 s    10 runs
> 
> Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
>   Time (mean ± σ):     546.0 ms ±   9.6 ms    [User: 1.4 ms, System: 0.3 ms]
>   Range (min … max):   539.0 ms … 572.1 ms    10 runs
> 
> master + SIMD patch:
> 
> $ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
> Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
>   Time (mean ± σ):     797.8 ms ±  19.4 ms    [User: 0.9 ms, System: 0.0 ms]
>   Range (min … max):   770.0 ms … 828.5 ms    10 runs
> 
> Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
>   Time (mean ± σ):     732.3 ms ±  20.8 ms    [User: 1.2 ms, System: 0.0 ms]
>   Range (min … max):   701.1 ms … 763.5 ms    10 runs
> 
> Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
>   Time (mean ± σ):     545.7 ms ±  13.5 ms    [User: 1.3 ms, System: 0.1 ms]
>   Range (min … max):   533.6 ms … 580.2 ms    10 runs

These are nice results.

> -/*
> - * Send text representation of one attribute, with conversion and escaping
> - */
>  #define DUMPSOFAR() \

IIUC this comment was meant to describe the CopyAttributeOutText() function
just below this macro.  When the macro was added in commit 0a5fdb0 from
2006, the comment became detached from the function.  Maybe we should just
move it back down below the macro.

> +/*
> + * Send text representation of one attribute, with conversion and CSV-style
> + * escaping. This variant uses SIMD instructions to optimize processing, but
> + * we can only use this approach when encoding_embeds_ascii if false.
> + */

nitpick: Can we add a few words about why using SIMD instructions when
encoding_embeds_ascii is true is difficult?  I don't dispute that it is
complex and/or not worth the effort, but it's not clear to me why that's
the case just from reading the patch.

> +static void
> +CopyAttributeOutCSVFast(CopyToState cstate, const char *ptr,
> +                        bool use_quote)

nitpick: Can we add "vector" or "simd" to the name instead of "fast"?  IMHO
it's better to be more descriptive.

At a glance, the code look pretty reasonable to me.  I might have some
other nitpicks, such as styling tricks to avoid too many levels of
indentation, but that's not terribly important.

-- 
nathan



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
Next
From: Wolfgang Walther
Date:
Subject: Re: Build with LTO / -flto on macOS