Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side - Mailing list pgsql-hackers

From Amit Langote
Subject Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Msg-id CA+HiwqHdq8KB8MVt4x+WRqBfFfmJxAim+2fdwqJF_PjFOTZ-eA@mail.gmail.com
In response to Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Yeah, I understand your concern. The pg_basebackup document explains
> > the risk when --progress is specified, as follows. Since I imagined that
> > someone may explicitly disable --progress to avoid this risk, I made
> > the server estimate the total size only when --progress is specified.
> > But you think that this overhead by --progress is negligibly small?
> >
> > --------------------
> > When this is enabled, the backup will start by enumerating the size of
> > the entire database, and then go back and send the actual contents.
> > This may make the backup take slightly longer, and in particular it will
> > take longer before the first data is sent.
> > --------------------
>
> Sorry, I hadn't read this before.  So, my proposal would make this a lie.
>
> Still, if "streaming database files" is the longest phase, then not
> having even an approximation of how much data is to be streamed over
> doesn't much help estimating progress,  at least as long as one only
> has this view to look at.
>
> That said, the overhead of checking the size before sending any data
> may be worse for some people than others, so having the option to
> avoid that might be good after all.

By the way, if calculating the total backup size can take a
significantly long time in some cases (that is, when it is requested
by specifying --progress), then it might be a good idea to define a
separate phase for it, such as "estimating backup size".  Currently,
it is part of "starting backup", which covers both running the
checkpoint and estimating the size, one after the other.
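
To make the idea a bit more concrete, here is a rough sketch of how the
split could look.  All identifiers below are made up for illustration
and are not taken from the patch; only the phase strings come from this
thread, and the assumption is that size estimation is skipped entirely
when it was not requested.

```c
/*
 * Hypothetical phase list for illustration only; none of these
 * identifiers come from the patch under discussion.  "starting backup"
 * is split into a checkpoint phase and a size-estimation phase.
 */
typedef enum SketchBackupPhase
{
    SKETCH_PHASE_RUNNING_CHECKPOINT,
    SKETCH_PHASE_ESTIMATING_BACKUP_SIZE,
    SKETCH_PHASE_STREAMING_DATABASE_FILES
} SketchBackupPhase;

/* Map a phase to the text a progress view could display. */
static const char *
sketch_phase_name(SketchBackupPhase phase)
{
    switch (phase)
    {
        case SKETCH_PHASE_RUNNING_CHECKPOINT:
            return "running checkpoint";
        case SKETCH_PHASE_ESTIMATING_BACKUP_SIZE:
            return "estimating backup size";
        case SKETCH_PHASE_STREAMING_DATABASE_FILES:
            return "streaming database files";
    }
    return "unknown";
}

/*
 * Pick the phase that follows the checkpoint.  Assumption: the size is
 * estimated only when the client asked for it (e.g. via --progress);
 * otherwise the backup goes straight to streaming.
 */
static SketchBackupPhase
sketch_phase_after_checkpoint(int estimate_size)
{
    return estimate_size
        ? SKETCH_PHASE_ESTIMATING_BACKUP_SIZE
        : SKETCH_PHASE_STREAMING_DATABASE_FILES;
}
```

With something like this, a monitoring user could tell a slow
checkpoint apart from a slow size scan just by looking at the phase
column.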

I suspect people might never get stuck on "estimating backup size" as
they might on "running checkpoint", which perhaps only strengthens the
case for *always* calculating the size before sending the backup
header.

Thanks,
Amit
