Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers
From | Suraj Kharage |
---|---|
Subject | Re: WIP/PoC for parallel backup |
Date | |
Msg-id | CAF1DzPWwG_BcxcR5wL17qs=pbZ4iR_ROK_vBZbAKBpn8rM0oWA@mail.gmail.com |
In response to | Re: WIP/PoC for parallel backup (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
We ran these tests on two setups:
1) Client and server both on the same machine (local backups)
2) Client and server on different machines (remote backups)
Machine details:
1: Server (the machine on which local backups were performed, also used as the server for remote backups)
2: Client (used as the client for remote backups)
Server:
RAM: 500 GB
CPU details:
  Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Byte Order: Little Endian
  CPU(s): 128
  On-line CPU(s) list: 0-127
  Thread(s) per core: 2
  Core(s) per socket: 8
  Socket(s): 8
  NUMA node(s): 8
Filesystem: ext4
Client:
RAM: 490 GB
CPU details:
  Architecture: ppc64le
  Byte Order: Little Endian
  CPU(s): 192
  On-line CPU(s) list: 0-191
  Thread(s) per core: 8
  Core(s) per socket: 1
  Socket(s): 24
Filesystem: ext4
Local backups (client and server on the same machine):
Data size | Without parallel backup patch | Parallel backup, 1 worker | % performance change vs. unpatched backup | Parallel backup, 2 workers | % performance change vs. unpatched backup | Parallel backup, 4 workers | % performance change vs. unpatched backup | Parallel backup, 8 workers | % performance change vs. unpatched backup |
---|---|---|---|---|---|---|---|---|
10 GB (10 tables, each around 1.05 GB) | real 0m27.016s user 0m3.378s sys 0m23.059s | real 0m30.314s user 0m3.575s sys 0m22.946s | 12% decrease | real 0m20.400s user 0m3.622s sys 0m29.670s | 27% increase | real 0m15.331s user 0m3.706s sys 0m39.189s | 43% increase | real 0m15.094s user 0m3.915s sys 1m23.350s | 44% increase |
50 GB (50 tables, each around 1.05 GB) | real 2m11.049s user 0m16.464s sys 2m1.757s | real 2m26.621s user 0m18.497s sys 2m4.792s | 21% decrease | real 1m9.581s user 0m18.298s sys 2m12.030s | 46% increase | real 0m53.894s user 0m18.588s sys 2m47.390s | 58% increase | real 0m55.373s user 0m18.423s sys 5m57.470s | 57% increase |
100 GB (100 tables, each around 1.05 GB) | real 4m4.776s user 0m33.699s sys 3m27.777s | real 4m20.862s user 0m35.753s sys 3m28.262s | 6% decrease | real 2m37.411s user 0m36.440s sys 4m16.424s | 35% increase | real 1m49.503s user 0m37.200s sys 5m58.077s | 55% increase | real 1m36.762s user 0m36.987s sys 9m36.906s | 60% increase |
200 GB (200 tables, each around 1.05 GB) | real 10m34.998s user 1m8.471s sys 7m21.520s | real 11m30.899s user 1m12.933s sys 8m14.496s | 8% decrease | real 6m8.481s user 1m13.771s sys 9m31.216s | 41% increase | real 4m2.403s user 1m18.331s sys 12m29.661s | 61% increase | real 4m3.768s user 1m24.547s sys 15m21.421s | 61% increase |
Remote backups (client and server on different machines):
Data size | Without parallel backup patch | Parallel backup, 1 worker | % performance change vs. unpatched backup | Parallel backup, 2 workers | % performance change vs. unpatched backup | Parallel backup, 4 workers | % performance change vs. unpatched backup | Parallel backup, 8 workers | % performance change vs. unpatched backup |
---|---|---|---|---|---|---|---|---|
10 GB (10 tables, each around 1.05 GB) | real 1m36.829s user 0m2.124s sys 0m14.004s | real 1m37.598s user 0m3.272s sys 0m11.110s | 0.8% decrease | real 1m36.753s user 0m2.627s sys 0m15.312s | 0.08% increase | real 1m37.212s user 0m3.835s sys 0m13.221s | 0.3% decrease | real 1m36.977s user 0m4.475s sys 0m17.937s | 0.1% decrease |
50 GB (50 tables, each around 1.05 GB) | real 7m54.211s user 0m10.826s sys 1m10.435s | real 7m55.603s user 0m16.535s sys 1m8.147s | 0.2% decrease | real 7m53.499s user 0m18.131s sys 1m8.822s | 0.1% increase | real 7m54.687s user 0m15.818s sys 1m30.991s | 0.1% decrease | real 7m54.658s user 0m20.783s sys 1m34.460s | 0.1% decrease |
100 GB (100 tables, each around 1.05 GB) | real 15m45.776s user 0m21.802s sys 2m59.006s | real 15m46.315s user 0m32.499s sys 2m47.245s | 0.05% decrease | real 15m46.065s user 0m28.877s sys 2m21.181s | 0.03% decrease | real 15m47.793s user 0m30.932s sys 2m36.708s | 0.2% decrease | real 15m47.129s user 0m35.151s sys 3m23.572s | 0.14% decrease |
200 GB (200 tables, each around 1.05 GB) | real 32m55.720s user 0m50.602s sys 5m38.875s | real 31m30.602s user 0m45.377s sys 4m57.405s | 4% increase | real 31m30.214s user 0m55.023s sys 5m8.689s | 4% increase | real 31m31.187s user 1m13.390s sys 5m40.861s | 4% increase | real 31m31.729s user 1m4.955s sys 6m35.774s | 4% increase |
With the client and server on the same machine, the results show around 50% improvement for parallel runs with 4 and 8 workers. We don’t see a large further improvement as more workers are added.
In contrast, when the client and server are on different machines, we don’t see any major performance benefit. These results match the test results posted by David Zhang upthread.
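For anyone who wants to recompute the percentages, here is a minimal sketch. It assumes the figures are derived from the "real" (wall-clock) times as (baseline - run) / baseline; that formula is an assumption, and the helper names are only illustrative.

```python
import re

# Matches the "real" line of shell `time` output, e.g. "real 2m11.049s".
_REAL_RE = re.compile(r"real\s+(\d+)m([\d.]+)s")


def real_seconds(time_output: str) -> float:
    """Return the wall-clock ("real") time in seconds from `time` output."""
    match = _REAL_RE.search(time_output)
    if match is None:
        raise ValueError(f"no 'real' time found in {time_output!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_change(baseline: str, candidate: str) -> float:
    """Percent reduction in wall-clock time; positive means candidate is faster."""
    base = real_seconds(baseline)
    cand = real_seconds(candidate)
    return (base - cand) / base * 100.0


# 100GB local backup: unpatched vs. 8 workers (timings from the results above).
unpatched = "real 4m4.776s user 0m33.699s sys 3m27.777s"
eight_workers = "real 1m36.762s user 0m36.987s sys 9m36.906s"
print(f"{percent_change(unpatched, eight_workers):.1f}% change")
```

A negative result means the run was slower than the unpatched baseline.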
We ran the 100GB backup test with 4 parallel workers to capture CPU usage and other information. We noticed that the server consumes almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.
Attaching captured output for:
1) top command output on the server, sampled every 5 seconds
2) pg_stat_activity output, sampled every 5 seconds
3) top command output on the client, sampled every 5 seconds
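One way to summarize periodic pg_stat_activity samples is to count how often each wait event appears. The sketch below assumes each sample has been reduced to a (wait_event_type, wait_event) pair, which are real columns of pg_stat_activity. The function name and the sample data are made up for illustration and are not taken from the attached output.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# One (wait_event_type, wait_event) pair per sampled pg_stat_activity row;
# (None, None) means the backend was not waiting at sampling time.
Sample = Tuple[Optional[str], Optional[str]]


def wait_event_shares(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return the fraction of samples spent in each wait event."""
    counts = Counter(
        event if event is not None else "(not waiting)"
        for _event_type, event in samples
    )
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


# Hypothetical samples, for illustration only.
samples = [("Client", "ClientWrite")] * 3 + [(None, None)]
shares = wait_event_shares(samples)
print(shares)
```

A high share for ClientWrite would suggest the backend spends most of its time waiting to send data to the client.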
Do let me know if anyone has further questions/inputs for the benchmarking.
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here are the parallel backup performance test results with and without
> the patch "parallel_backup_v15" in an AWS cloud environment. Two
> "t2.xlarge" machines were used: one for the Postgres server and the other
> for pg_basebackup, both with the machine configuration shown below.
>
> Machine configuration:
> Instance Type :t2.xlarge
> Volume type :io1
> Memory :16GB
> vCPU # :4
> Architecture :x86_64
> IOPS :6000
> Database Size (GB) :108
>
> Performance test results:
> without patch:
> real 18m49.346s
> user 1m24.178s
> sys 7m2.966s
>
> 1 worker with patch:
> real 18m43.201s
> user 1m55.787s
> sys 7m24.724s
>
> 2 workers with patch:
> real 18m47.373s
> user 2m22.970s
> sys 11m23.891s
>
> 4 workers with patch:
> real 18m46.878s
> user 2m26.791s
> sys 13m14.716s
>
> As requested, I did not run pgbench in parallel as we did
> in the previous benchmark.
>
So, there doesn't seem to be any significant improvement in this
scenario. Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously. I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out, which slowed down the overall backup process. And when
we start using multiple backends for backup, one or another backup
process is always running, making the overall backup faster. One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code). Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.
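The comparison suggested above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and the pg_basebackup target directories in the comments are placeholders. It times one command to completion and reports how much longer the loaded run took than the idle run.

```python
import subprocess
import time
from typing import Sequence


def time_command(argv: Sequence[str]) -> float:
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def slowdown_percent(idle_seconds: float, loaded_seconds: float) -> float:
    """How much longer the run under concurrent load took, in percent."""
    return (loaded_seconds - idle_seconds) / idle_seconds * 100.0


# Intended use on HEAD (paths are placeholders; start pgbench separately
# before the second run):
#   idle = time_command(["pg_basebackup", "-D", "/tmp/backup_idle"])
#   loaded = time_command(["pg_basebackup", "-D", "/tmp/backup_loaded"])
#   print(f"{slowdown_percent(idle, loaded):.1f}% slower under load")
```

Repeating each run several times and comparing medians would reduce noise from caching and scheduling.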
> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>
It is not clear which functions are taking more time or for which
functions time is reduced as function symbols are not present in the
reports. I think you can refer to
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles, and additionally use -fno-omit-frame-pointer during
configure (you can use CFLAGS="-fno-omit-frame-pointer" during
configure).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com