Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CAGPqQf0Ehh-jxGRgYAk7j0oPRrW2Xk_d+h7f9yykznN2ewG=dQ@mail.gmail.com
Whole thread Raw
In response to Re: WIP/PoC for parallel backup  (Ahsan Hadi <ahsan.hadi@gmail.com>)
Responses Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:


On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8.  We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance.  This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>

Was this for a setup where the client and server were on the same
machine or where the client was on a different machine?  If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?

In the particular setup, the client and server were on different machines. 


During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows?  Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?

[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com


Sure, we can try out this and do a similar run to collect the pg_stat_activity output.

Have you had the chance to try this out?

Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.

Captured wait events after every 3 seconds during the backup for -
1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
2: Normal backup (without parallel backup patch) for 100GB data  (pg_stat_activity_j4_100GB.txt)

Here is the observation:

The total number of events (pg_stat_activity) captured during above runs:
- 314 events for normal backups
- 316 events for parallel backups (-j 4)

BaseBackupRead wait event numbers: (newly added)
37 - in normal backups
25 - in the parallel backup (-j 4)

ClientWrite wait event numbers:
175 - in normal backup
1098 - in parallel backups

ClientRead wait event numbers:
0 - ClientRead in normal backup
326 - ClientRead in parallel backups for diff processes. (all in idle state)




Thanks,

Rushabh Lathia
Attachment

pgsql-hackers by date:

Previous
From: Ahsan Hadi
Date:
Subject: Re: WIP/PoC for parallel backup
Next
From: Michael Paquier
Date:
Subject: Schedule of commit fests for PG14