On 2020/02/06 11:35, Amit Langote wrote:
> On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Yeah, I understand your concern. The pg_basebackup documentation explains
>>> the risk when --progress is specified, as follows. Since I imagined that
>>> someone might explicitly disable --progress to avoid this risk, I made
>>> the server estimate the total size only when --progress is specified.
>>> But do you think that the overhead from --progress is negligibly small?
>>>
>>> --------------------
>>> When this is enabled, the backup will start by enumerating the size of
>>> the entire database, and then go back and send the actual contents.
>>> This may make the backup take slightly longer, and in particular it will
>>> take longer before the first data is sent.
>>> --------------------
>>
>> Sorry, I hadn't read this before. So, my proposal would make this a lie.
>>
>> Still, if "streaming database files" is the longest phase, then not
>> having even an approximation of how much data is to be streamed
>> doesn't help much with estimating progress, at least as long as this
>> view is all one has to look at.
>>
>> That said, the overhead of checking the size before sending any data
>> may be worse for some people than others, so having the option to
>> avoid that might be good after all.
>
> By the way, if calculating the total backup size can take significantly
> long in some cases, that is, when requested by specifying --progress,
> then it might be a good idea to define a separate phase for that, like
> "estimating backup size" or some such. Currently, it's part of
> "starting backup", which covers both running the checkpoint and the
> size estimation, which run one after the other.

OK, I added this phase in the latest patch that I posted upthread.
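
For example, while a backup started with --progress is running, the new
phase should be visible from another session with a query like the
following (assuming the view and column names in that patch):

    SELECT pid, phase, backup_streamed, backup_total
      FROM pg_stat_progress_basebackup;

The phase column should report "estimating backup size" while the server
walks the data directory to calculate the total size, and then switch to
"streaming database files" once the actual transfer begins.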
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters