Thread: [HACKERS] pg_basebackup throttling doesn't throttle as promised
The "-r" option to pg_basebackup is supposed to throttle the rate of the backup. But it only works properly if the server is mostly idle.
Every non-trivial call to XLogFlush or XLogBackgroundFlush will wake up the wal sender (the one which is not really sending wal, but base files), and the throttling routine doesn't go back to sleep after being awoke early. Rather, it releases another 32kb of data.
Should the basebackup.c throttle sleep in a loop until its time has expired?
Or should walsender.c WalSndWakeup not wake a wal sender whose status is WALSNDSTATE_BACKUP?
Or both?
Cheers,
Jeff
On Fri, Sep 1, 2017 at 1:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
The "-r" option to pg_basebackup is supposed to throttle the rate of the backup. But it only works properly if the server is mostly idle.Every non-trivial call to XLogFlush or XLogBackgroundFlush will wake up the wal sender (the one which is not really sending wal, but base files), and the throttling routine doesn't go back to sleep after being awoke early. Rather, it releases another 32kb of data.Should the basebackup.c throttle sleep in a loop until its time has expired?Or should walsender.c WalSndWakeup not wake a wal sender whose status is WALSNDSTATE_BACKUP?Or both?
I'm attaching a patch for each option. Each one independently solves the problem. But I think we should do both. There is no point in issuing unnecessary kill system calls, and there may also be more spurious wake-ups than just these ones.
Cheers,
Jeff
Attachment
Jeff Janes <jeff.janes@gmail.com> wrote: > On Fri, Sep 1, 2017 at 1:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > > The "-r" option to pg_basebackup is supposed to throttle the rate of the > backup. But it only works properly if the server is mostly idle. > > Every non-trivial call to XLogFlush or XLogBackgroundFlush will wake up the > wal sender (the one which is not really sending wal, but base files), and > the throttling routine doesn't go back to sleep after being awoke > early. Rather, it releases another 32kb of data. Sorry, I missed this fact when coding the feature. > Should the basebackup.c throttle sleep in a loop until its time has > expired? I think this is the correct approach because latch can be set for unrelated reasons. The patch makes sense to me. I just recommend moving this part in front of the loop because elapsed_min does not have to be re-computed each time: /* How much should have elapsed at minimum? */ elapsed_min = elapsed_min_unit * (throttling_counter / throttling_sample); And also a comment explaining the purpose of the loop would be appreciated. Thanks. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
On Sat, Sep 2, 2017 at 6:42 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > I'm attaching a patch for each option. Each one independently solves the > problem. But I think we should do both. There is no point in issuing > unnecessary kill system calls, and there may also be more spurious wake-ups > than just these ones. I agree that 1) and 2) are sensible things to do now. At some point I think that we will want do_pg_stop_backup() to use a wait event instead of a pg_usleep call when waiting for a segment to be archived, in which case we could use WalSndWakeup() to set the latch and accelerate things. So it would be nice to keep track of this possibility in the code. We could as well do that with a new API only signalling WAL senders in backup state though. SpinLockAcquire(&walsnd->mutex); latch = walsnd->latch; + state = walsnd->state; SpinLockRelease(&walsnd->mutex); - if (latch != NULL) + if (latch != NULL && state != WALSNDSTATE_BACKUP) SetLatch(latch); This surely meritates a comment. -- Michael
Jeff Janes wrote: > I'm attaching a patch for each option. Each one independently solves the > problem. But I think we should do both. There is no point in issuing > unnecessary kill system calls, and there may also be more spurious wake-ups > than just these ones. I modified patch 1 a bit -- result attached. I would like to back-patch this all the way back to 9.4, but there are a few conflicts due to c29aff959dc so I think I'm going to do pg10 only unless I get some votes that it should go further back. I think it should be fairly trivial to resolve. The other patch should be applied to master only IMO. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Alvaro Herrera wrote: > Jeff Janes wrote: > > > I'm attaching a patch for each option. Each one independently solves the > > problem. But I think we should do both. There is no point in issuing > > unnecessary kill system calls, and there may also be more spurious wake-ups > > than just these ones. > > I modified patch 1 a bit -- result attached. I would like to back-patch > this all the way back to 9.4, but there are a few conflicts due to > c29aff959dc so I think I'm going to do pg10 only unless I get some votes > that it should go further back. I think it should be fairly trivial to > resolve. Pushed patch 1 to master and pg10. Please add the other one to commitfest. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services