Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Suraj Kharage
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CAF1DzPWwG_BcxcR5wL17qs=pbZ4iR_ROK_vBZbAKBpn8rM0oWA@mail.gmail.com
In response to Re: WIP/PoC for parallel backup  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

We at EnterpriseDB did some performance testing of this parallel backup patch to see how much it helps; the results are below. In this testing, we ran the backup:
1) Without Asif’s patch
2) With Asif’s patch, using 1, 2, 4, and 8 workers

We ran those tests on two setups:

1) Client and server both on the same machine (local backups)

2) Client and server on different machines (remote backups)


Machine details: 

1: Server (local backups were performed on it, and it acted as the server for remote backups)

2: Client (used as the client for remote backups)


Server:

RAM: 500 GB
CPU details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Filesystem: ext4


Client:

RAM: 490 GB
CPU details:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
Filesystem: ext4
  
Below are the results for the local test: 

  Run             real         user        sys          % performance change vs. normal backup (without patch)

10 GB (10 tables - each table around 1.05 GB)
  without patch   0m27.016s    0m3.378s    0m23.059s    -
  1 worker        0m30.314s    0m3.575s    0m22.946s    12% decreased
  2 workers       0m20.400s    0m3.622s    0m29.670s    27% increased
  4 workers       0m15.331s    0m3.706s    0m39.189s    43% increased
  8 workers       0m15.094s    0m3.915s    1m23.350s    44% increased

50 GB (50 tables - each table around 1.05 GB)
  without patch   2m11.049s    0m16.464s   2m1.757s     -
  1 worker        2m26.621s    0m18.497s   2m4.792s     21% decreased
  2 workers       1m9.581s     0m18.298s   2m12.030s    46% increased
  4 workers       0m53.894s    0m18.588s   2m47.390s    58% increased
  8 workers       0m55.373s    0m18.423s   5m57.470s    57% increased

100 GB (100 tables - each table around 1.05 GB)
  without patch   4m4.776s     0m33.699s   3m27.777s    -
  1 worker        4m20.862s    0m35.753s   3m28.262s    6% decreased
  2 workers       2m37.411s    0m36.440s   4m16.424s    35% increased
  4 workers       1m49.503s    0m37.200s   5m58.077s    55% increased
  8 workers       1m36.762s    0m36.987s   9m36.906s    60% increased

200 GB (200 tables - each table around 1.05 GB)
  without patch   10m34.998s   1m8.471s    7m21.520s    -
  1 worker        11m30.899s   1m12.933s   8m14.496s    8% decreased
  2 workers       6m8.481s     1m13.771s   9m31.216s    41% increased
  4 workers       4m2.403s     1m18.331s   12m29.661s   61% increased
  8 workers       4m3.768s     1m24.547s   15m21.421s   61% increased

Results for the remote test: 

  Run             real         user        sys          % performance change vs. normal backup (without patch)

10 GB (10 tables - each table around 1.05 GB)
  without patch   1m36.829s    0m2.124s    0m14.004s    -
  1 worker        1m37.598s    0m3.272s    0m11.110s    0.8% decreased
  2 workers       1m36.753s    0m2.627s    0m15.312s    0.08% increased
  4 workers       1m37.212s    0m3.835s    0m13.221s    0.3% decreased
  8 workers       1m36.977s    0m4.475s    0m17.937s    0.1% decreased

50 GB (50 tables - each table around 1.05 GB)
  without patch   7m54.211s    0m10.826s   1m10.435s    -
  1 worker        7m55.603s    0m16.535s   1m8.147s     0.2% decreased
  2 workers       7m53.499s    0m18.131s   1m8.822s     0.1% increased
  4 workers       7m54.687s    0m15.818s   1m30.991s    0.1% decreased
  8 workers       7m54.658s    0m20.783s   1m34.460s    0.1% decreased

100 GB (100 tables - each table around 1.05 GB)
  without patch   15m45.776s   0m21.802s   2m59.006s    -
  1 worker        15m46.315s   0m32.499s   2m47.245s    0.05% decreased
  2 workers       15m46.065s   0m28.877s   2m21.181s    0.03% decreased
  4 workers       15m47.793s   0m30.932s   2m36.708s    0.2% decreased
  8 workers       15m47.129s   0m35.151s   3m23.572s    0.14% decreased

200 GB (200 tables - each table around 1.05 GB)
  without patch   32m55.720s   0m50.602s   5m38.875s    -
  1 worker        31m30.602s   0m45.377s   4m57.405s    4% increased
  2 workers       31m30.214s   0m55.023s   5m8.689s     4% increased
  4 workers       31m31.187s   1m13.390s   5m40.861s    4% increased
  8 workers       31m31.729s   1m4.955s    6m35.774s    4% increased


With the client and server on the same machine, the results show around a 50% improvement (43% to 61%, depending on data size) in the parallel runs with 4 and 8 workers.  Going from 4 to 8 workers does not bring any large further improvement.
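
For anyone who wants to recheck the percentage column, here is a minimal Python sketch of the arithmetic, assuming the percentage is computed from the "real" times only as (baseline - patched) / baseline. The helper names to_seconds and percent_change are made up for this illustration and are not part of the patch or of any tool used in the test.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a value such as '2m11.049s' (as shown in the tables) to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t.strip())
    if m is None:
        raise ValueError(f"unrecognized time value: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def percent_change(baseline: str, patched: str) -> float:
    """Positive means faster than the baseline, negative means slower."""
    base = to_seconds(baseline)
    return (base - to_seconds(patched)) / base * 100.0


# 'real' times from the local 100GB rows: without patch, then 1/2/4/8 workers.
baseline = "4m4.776s"
runs = {1: "4m20.862s", 2: "2m37.411s", 4: "1m49.503s", 8: "1m36.762s"}
for workers, real in runs.items():
    print(f"{workers} worker(s): {percent_change(baseline, real):+.1f}%")
```

For the 100GB rows this gives about -6.6%, +35.7%, +55.3% and +60.5%, in line with the table. Applying the same formula to the other rows gives about 24% for 10 GB with 2 workers and about 12% for 50GB with 1 worker, so the 27% and 21% shown for those two cells may be typos.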


Whereas, when the client and server are on different machines, we don’t see any major performance benefit.  This matches the test results posted by David Zhang upthread.
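
One way to look at the remote numbers is as average throughput. The sketch below is only a rough calculation: it assumes the nominal data sizes (10, 50, 100, 200 GB, taking 1 GB = 1024 MB) rather than the exact number of bytes transferred, and throughput_mb_per_s is a name made up for this example.

```python
def throughput_mb_per_s(size_gb: float, minutes: int, seconds: float) -> float:
    """Average transfer rate in MB/s, taking 1 GB = 1024 MB (nominal size)."""
    return size_gb * 1024 / (minutes * 60 + seconds)


# (nominal size in GB, 'real' without patch, 'real' with 8 workers), remote test.
remote = [
    (10, (1, 36.829), (1, 36.977)),
    (50, (7, 54.211), (7, 54.658)),
    (100, (15, 45.776), (15, 47.129)),
    (200, (32, 55.720), (31, 31.729)),
]

for size, base, par in remote:
    print(
        f"{size:>3} GB: {throughput_mb_per_s(size, *base):6.1f} MB/s without patch, "
        f"{throughput_mb_per_s(size, *par):6.1f} MB/s with 8 workers"
    )
```

With these assumptions every remote run lands at roughly 104 to 108 MB/s, regardless of the number of workers. A flat rate like that would be consistent with a limit outside the backup workers themselves (for example the link between the two machines), but nothing in this test measured that directly.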



We ran the 100GB backup test with 4 parallel workers to look at CPU usage and other information. What we noticed is that the server consumes almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.


Attaching captured output for

1) Top command output on the server, captured every 5 seconds

2) pg_stat_activity output, captured every 5 seconds

3) Top command output on the client, captured every 5 seconds
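
For anyone who wants to summarize a pg_stat_activity capture like the one above, here is a small Python sketch. It is not the script used for this test. The query assumes the backup connections show up with backend_type = 'walsender', the function name wait_event_share is made up, and the sample rows are hypothetical and only show the shape of the data.

```python
from collections import Counter

# Query one might run every 5 seconds (for example through psql) to take samples.
# Assumption: the backup connections appear with backend_type = 'walsender'.
SAMPLE_QUERY = """
SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'walsender';
"""


def wait_event_share(samples):
    """Return {wait_event: fraction of sampled rows}; None means 'not waiting'."""
    counts = Counter(row["wait_event"] for row in samples)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()} if total else {}


# Hypothetical rows, NOT real output from the test described in this mail.
samples = [
    {"pid": 101, "wait_event_type": "Client", "wait_event": "ClientWrite"},
    {"pid": 102, "wait_event_type": "Client", "wait_event": "ClientWrite"},
    {"pid": 103, "wait_event_type": None, "wait_event": None},
    {"pid": 104, "wait_event_type": "Client", "wait_event": "ClientWrite"},
]

print(wait_event_share(samples))
```

A high share of ClientWrite in such a summary means the backends spend most of their sampled time waiting to write data to the client.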


Do let me know if anyone has further questions/inputs for the benchmarking. 


Thanks to Rushabh Lathia for helping me with this testing.

On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
>      Instance Type        :t2.xlarge
>      Volume type          :io1
>      Memory (MiB)         :16GB
>      vCPU #               :4
>      Architecture         :x86_64
>      IOP                  :6000
>      Database Size (GB)   :108
>
> Performance test results:
> without patch:
>      real 18m49.346s
>      user 1m24.178s
>      sys 7m2.966s
>
> 1 worker with patch:
>      real 18m43.201s
>      user 1m55.787s
>      sys 7m24.724s
>
> 2 worker with patch:
>      real 18m47.373s
>      user 2m22.970s
>      sys 11m23.891s
>
> 4 worker with patch:
>      real 18m46.878s
>      user 2m26.791s
>      sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>

So, there doesn't seem to be any significant improvement in this
scenario.  Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously.  I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out and it slowed down the overall backup process.  And when
we start using multiple backends for backup one or other backup
process is always running making the overall backup faster.  One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code).  Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.

> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>

It is not clear which functions are taking more time or for which
functions time is reduced as function symbols are not present in the
reports.  I think you can refer to
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles and additionally use -fno-omit-frame-pointer during
configure (you can use CFLAGS="-fno-omit-frame-pointer" during
configure).


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




--

Thanks & Regards, 
Suraj Kharage,
EnterpriseDB Corporation, 
The Postgres Database Company.
Attachment
