Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
Date
Msg-id 20200205.173038.1841242875909024142.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
List pgsql-hackers
At Wed, 5 Feb 2020 16:29:54 +0900, Amit Langote <amitlangote09@gmail.com> wrote in 
> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > On 2020/02/04 10:34, Amit Langote wrote:
> > > I like:
> >
> > Thanks for reviewing the patch!
> >
> > > 1. initializing
> > > 2-5 waiting for backup start checkpoint to finish
> >
> > Can we shorten this to "waiting for checkpoint"? IMO the simpler
> > phase name is better and "to finish" sounds a bit redundant. Also
> > in the description of pg_stat_progress_create_index, basically
> > "waiting for xxx" is used.
> 
> "waiting for checkpoint" works for me.

I'm not sure, but doesn't that mean "waiting for a checkpoint to
start"?  Sorry in advance if that is not the case.

> > > 3-3 streaming database files
> > > 4-5 waiting for wal archiving to finish
> >
> > Can we shorten this to "waiting for wal archiving" because of
> > the same reason as above?
> 
> Yes.
> 
> > > 5-1 transferring wal (or streaming wal)
> >
> > IMO "transferring wal" is better because this phase happens only when
> > "--wal-method=fetch" is specified in pg_basebackup. "streaming wal"
> > seems to implie "--wal-method=stream", instead.
> 
> Ah, okay,
> 
> > > Reading this:
> > >
> > > +     <entry><structfield>backup_total</structfield></entry>
> > > +     <entry><type>bigint</type></entry>
> > > +     <entry>
> > > +      Total amount of data that will be streamed. If progress reporting
> > > +      is not enabled in <application>pg_basebackup</application>
> > > +      (i.e., <literal>--progress</literal> option is not specified),
> > > +      this is <literal>0</literal>.
> > >
> > > I wonder how hard would it be to change basebackup.c to always set
> > > backup_total, irrespective of whether --progress is specified with
> > > pg_basebackup or not?  It seems quite misleading to leave it set to 0,
> > > because one may panic that they have lost their data, that is, if they
> > > haven't first read the documentation.
> >
> > Yeah, I understand your concern. The pg_basebackup document explains
> > the risk when --progress is specified, as follows. Since I imagined that
> > someone may explicitly disable --progress to avoid this risk, I made
> > the server estimate the total size only when --progress is specified.
> > But you think that this overhead by --progress is negligibly small?
> >
> > --------------------
> > When this is enabled, the backup will start by enumerating the size of
> > the entire database, and then go back and send the actual contents.
> > This may make the backup take slightly longer, and in particular it will
> > take longer before the first data is sent.
> > --------------------
> 
> Sorry, I hadn't read this before.  So, my proposal would make this a lie.
> 
> Still, if "streaming database files" is the longest phase, then not
> having even an approximation of how much data is to be streamed over
> doesn't much help estimating progress,  at least as long as one only
> has this view to look at.
> 
> That said, the overhead of checking the size before sending any data
> may be worse for some people than others, so having the option to
> avoid that might be good after all.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: A bug in LWLOCK_STATS
Next
From: Justin Pryzby
Date:
Subject: Re: ALTER tbl rewrite loses CLUSTER ON index