Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side - Mailing list pgsql-hackers

From Amit Langote
Subject Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
Date
Msg-id CA+HiwqGa9H4Ryqmet9SZJeE3GHqYi3sc3Bz-tDScVaQQ4wzd3Q@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
List pgsql-hackers
On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/02/04 10:34, Amit Langote wrote:
> > I like:
>
> Thanks for reviewing the patch!
>
> > 1. initializing
> > 2-5 waiting for backup start checkpoint to finish
>
> Can we shorten this to "waiting for checkpoint"? IMO the simpler
> phase name is better and "to finish" sounds a bit redundant. Also
> in the description of pg_stat_progress_create_index, basically
> "waiting for xxx" is used.

"waiting for checkpoint" works for me.

> > 3-3 streaming database files
> > 4-5 waiting for wal archiving to finish
>
> Can we shorten this to "waiting for wal archiving" because of
> the same reason as above?

Yes.

> > 5-1 transferring wal (or streaming wal)
>
> IMO "transferring wal" is better because this phase happens only when
> "--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
> seems to implie "--wal-method=stream", instead.

Ah, okay,

> > Reading this:
> >
> > +     <entry><structfield>backup_total</structfield></entry>
> > +     <entry><type>bigint</type></entry>
> > +     <entry>
> > +      Total amount of data that will be streamed. If progress reporting
> > +      is not enabled in <application>pg_basebackup</application>
> > +      (i.e., <literal>--progress</literal> option is not specified),
> > +      this is <literal>0</literal>.
> >
> > I wonder how hard would it be to change basebackup.c to always set
> > backup_total, irrespective of whether --progress is specified with
> > pg_basebackup or not?  It seems quite misleading to leave it set to 0,
> > because one may panic that they have lost their data, that is, if they
> > haven't first read the documentation.
>
> Yeah, I understand your concern. The pg_basebackup document explains
> the risk when --progress is specified, as follows. Since I imagined that
> someone may explicitly disable --progress to avoid this risk, I made
> the server estimate the total size only when --progress is specified.
> But you think that this overhead by --progress is negligibly small?
>
> --------------------
> When this is enabled, the backup will start by enumerating the size of
> the entire database, and then go back and send the actual contents.
> This may make the backup take slightly longer, and in particular it will
> take longer before the first data is sent.
> --------------------

Sorry, I hadn't read this before.  So, my proposal would make this a lie.

Still, if "streaming database files" is the longest phase, then not
having even an approximation of how much data is to be streamed over
doesn't much help estimating progress,  at least as long as one only
has this view to look at.

That said, the overhead of checking the size before sending any data
may be worse for some people than others, so having the option to
avoid that might be good after all.

Thanks,
Amit



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: Julien Rouhaud
Date:
Subject: Re: A bug in LWLOCK_STATS