Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Msg-id 20240805.072012.2006870620510018355.kou@clear-code.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
List pgsql-hackers
Hi,

I re-ran the benchmark(*) with the v19 patch set and the
following CPUs:

1. AMD Ryzen 9 3900X 12-Core Processor
2. Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

(*)
* Use tables that have {5,10,15,20,25,30} integer columns
* Use tables that have {1,2,3,4,5,6,7,8,9,10}M rows
* Use '/dev/null' for COPY TO
* Use blackhole_am for COPY FROM
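For anyone who wants to reproduce the TO side of this matrix without the pg-bench scripts, here is a minimal C sketch. It assumes one table per (columns, rows) cell, named t_<cols>_<rows>m. That naming scheme is hypothetical and not necessarily what pg-bench uses. The sketch enumerates the 6 x 10 = 60 cells and formats a COPY TO '/dev/null' statement for each. Note that COPY TO a server-side file requires superuser or the pg_write_server_files role.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Column counts and row counts (in millions) from the benchmark setup. */
static const int column_counts[] = {5, 10, 15, 20, 25, 30};
#define N_COLUMN_COUNTS (sizeof(column_counts) / sizeof(column_counts[0]))
#define MAX_ROWS_M 10

/* Number of (columns, rows) cells per direction: 6 x 10 = 60. */
int count_configurations(void)
{
    return (int) N_COLUMN_COUNTS * MAX_ROWS_M;
}

/*
 * Build the COPY TO statement for one cell.  The table name pattern
 * "t_<cols>_<rows>m" is hypothetical; the real pg-bench scripts may differ.
 */
void format_copy_to(char *buf, size_t len, int cols, int rows_m)
{
    int n = snprintf(buf, len, "COPY t_%d_%dm TO '/dev/null';", cols, rows_m);
    assert(n >= 0 && (size_t) n < len); /* no truncation */
}

/* Print every COPY TO statement in the matrix, one per line. */
void print_copy_to_matrix(void)
{
    char sql[64];
    for (size_t c = 0; c < N_COLUMN_COUNTS; c++)
        for (int r = 1; r <= MAX_ROWS_M; r++) {
            format_copy_to(sql, sizeof sql, column_counts[c], r);
            puts(sql);
        }
}
```

Calling print_copy_to_matrix() prints the 60 statements. The FROM side would additionally need a table using blackhole_am and an input file, which this sketch does not cover.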

See the attached graphs for details.

Notes:
* X-axis is the number of columns
* Y-axis is the number of rows (in millions)
* Z-axis is the elapsed time as a percentage of HEAD's (smaller
  is faster; e.g. 99% is slightly faster than HEAD and 101% is
  slightly slower than HEAD)
* Z-axis ranges differ between CPUs (about 79%-121% for Ryzen
  but about 91%-111% for Intel)
* Red means the patch is slower than HEAD
* Blue means the patch is faster than HEAD
* The upper row shows FROM results
* The lower row shows TO results
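The Z-axis value can be read as a ratio. This is a minimal sketch that assumes it is computed as 100 * (elapsed time with the patch) / (elapsed time on HEAD). That formula matches the "smaller is faster" description but is my assumption about the scripts. The color of each cell then follows from comparing the ratio with 100%:

```c
#include <assert.h>

/* Assumed definition: elapsed time with the patch relative to HEAD, in percent. */
double elapsed_percent(double head_seconds, double patched_seconds)
{
    assert(head_seconds > 0.0);
    return 100.0 * patched_seconds / head_seconds;
}

typedef enum
{
    SAME_AS_HEAD,
    FASTER_THAN_HEAD, /* drawn in blue */
    SLOWER_THAN_HEAD  /* drawn in red */
} HeadComparison;

/* Map a percentage to the color convention used in the graphs. */
HeadComparison classify(double percent)
{
    if (percent < 100.0)
        return FASTER_THAN_HEAD;
    if (percent > 100.0)
        return SLOWER_THAN_HEAD;
    return SAME_AS_HEAD;
}
```

For example, 12.1 s with the patch against 10.0 s on HEAD gives 121%, which is red and at the top of the Ryzen range.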

Here is a summary of the results:

For FROM:
* With Ryzen: negative performance impact
* With Intel: negative performance impact with 1-5M rows and
  positive performance impact with 6M-10M rows
For TO:
* With Ryzen: positive performance impact
* With Intel: positive performance impact

Here are insights based on the results:

* 0001 (which introduces Copy{From,To}Routine and adds some
  "if () {...}" branches for them, while the existing formats
  don't use them yet) has a slight negative performance impact
* 0002 (which migrates the existing code to
  Copy{From,To}Routine based implementations) has a positive
  performance impact
  * For FROM: the negative impact of 0001 and the positive
    impact of 0002 almost balance out
    * We should apply both 0001 and 0002 rather than only 0001
    * With Ryzen: It's a bit slower than HEAD. So we may not
      want to reject this proposal for FROM
    * With Intel:
      * With 1-5M rows: It's a bit slower than HEAD
      * With 6-10M rows: It's a bit faster than HEAD
  * For TO: the positive impact of 0002 is larger than the
    negative impact of 0001
    * We should apply both 0001 and 0002 rather than only 0001
* 0003 (which makes Copy{From,To}Routine a Node) has a slight
  negative performance impact
  * I don't know why: it doesn't change any per-row code.
    The larger Copy{From,To}Routine (a NodeTag is added) may
    be related.
* 0004 (which moves Copy{From,To}StateData to copyapi.h)
  has no impact
  * This is expected because it doesn't change any
    implementation.
* 0005 (which adds "void *opaque" to Copy{From,To}StateData)
  has a slight negative impact for FROM and a slight positive
  impact for TO
  * I don't know why: it doesn't change any per-row code.
    The larger Copy{From,To}StateData ("void *opaque" is
    added) may be related.
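To make the 0001/0002/0003 discussion concrete, here is a heavily simplified C sketch. The type and function names (MiniCopyToRoutine, copy_to_rows_0001() and so on) are hypothetical stand-ins, not the v19 patch's actual definitions. It contrasts the 0001 shape, where each row passes through an extra "if (routine)" branch while built-in formats keep their old code path, with the 0002 shape, where every format, including a built-in one, is reached through the routine's callbacks. TaggedCopyToRoutine mimics 0003 by adding a leading tag field. It only shows that the struct gets bigger, not that this explains the measured slowdown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not the v19 patch's definitions. */
typedef struct MiniCopyToState MiniCopyToState;

typedef struct MiniCopyToRoutine
{
    void (*start)(MiniCopyToState *state);
    void (*one_row)(MiniCopyToState *state, int row);
    void (*end)(MiniCopyToState *state);
} MiniCopyToRoutine;

/* Same callbacks plus a leading tag field, mimicking the NodeTag 0003 adds. */
typedef struct TaggedCopyToRoutine
{
    int tag; /* stands in for NodeTag */
    void (*start)(MiniCopyToState *state);
    void (*one_row)(MiniCopyToState *state, int row);
    void (*end)(MiniCopyToState *state);
} TaggedCopyToRoutine;

struct MiniCopyToState
{
    const MiniCopyToRoutine *routine; /* NULL: use the built-in code path */
    long rows_by_routine;
    long rows_by_builtin;
};

/* Built-in per-row code that existed before any routine was introduced. */
static void builtin_one_row(MiniCopyToState *state, int row)
{
    (void) row;
    state->rows_by_builtin++;
}

/* 0001 shape: an extra branch per row chooses routine vs. built-in code. */
void copy_to_rows_0001(MiniCopyToState *state, int nrows)
{
    for (int i = 0; i < nrows; i++) {
        if (state->routine)
            state->routine->one_row(state, i);
        else
            builtin_one_row(state, i);
    }
}

/* 0002 shape: every format, built-in or not, goes through the routine. */
void copy_to_rows_0002(MiniCopyToState *state, int nrows)
{
    state->routine->start(state);
    for (int i = 0; i < nrows; i++)
        state->routine->one_row(state, i);
    state->routine->end(state);
}

/* A "text"-like built-in format expressed as a routine. */
static void text_start(MiniCopyToState *state) { (void) state; }
static void text_one_row(MiniCopyToState *state, int row)
{
    (void) row;
    state->rows_by_routine++;
}
static void text_end(MiniCopyToState *state) { (void) state; }

const MiniCopyToRoutine text_routine = {text_start, text_one_row, text_end};
```

In this sketch, the 0001 loop pays for the routine check on every row even when no routine is set, while the 0002 loop replaces that branch with an indirect call per row.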


How should we proceed with this proposal?

* Do we need more numbers to judge this proposal?
  * If so, could someone help us?
* My results show no negative performance impact for TO on
  both Ryzen and Intel. Can we merge only the TO part?
  * Can we defer the FROM part? Or should we proceed with
    both the FROM and TO parts of this proposal?
* Could someone provide a hint as to why the FROM part is
  slower on Ryzen?

(If nobody responds to this, this proposal will get stuck
again. If you're interested in this proposal, could you help
us?)


How to run this benchmark on your machine:

$ cd your-postgres
$ git switch -c copy-format-extendable
$ git am v19-*.patch
$ git clone https://gitlab.com/ktou/pg-bench.git ../pg-bench
$ ../pg-bench/bench.sh copy-format-extendable ../pg-bench/copy-format-extendable/run.sh
(This will take about 5 hours...)

If you want to visualize your results on your machine:

$ sudo gem install ruby-gr
$ ../pg-bench/visualize.rb 5

If you share your results with me, I can visualize them and
share the graphs.


Thanks,
-- 
kou


