Re: [Proposal] Progress bar for pg_dump/pg_restore - Mailing list pgsql-hackers

From Taiki Kondo
Subject Re: [Proposal] Progress bar for pg_dump/pg_restore
Msg-id 12A9442FBAE80D4E8953883E0B84E0885728A1@BPXM01GP.gisp.nec.co.jp
In response to Re: [Proposal] Progress bar for pg_dump/pg_restore  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
Hi, Merlin.

Thank you for your comments, and sorry for the late response.

> *) how do you estimate %done and ETA when dumping?

As I mentioned in my reply to Andres, I think %done and ETA can be estimated from the number of tuples recorded in
"pg_class.reltuples".
As you may know, pg_dump writes each tuple to the output file as soon as it reads it while executing "COPY TO".
Therefore, pg_dump can calculate %done and ETA by reading "pg_class.reltuples" and measuring the number of tuples
dumped per second.
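
A minimal sketch of this calculation, in Python for illustration only (it is not pg_dump code). The function names,
the bar width of 20, and the sample numbers are assumptions; the output layout follows the example in the quoted
proposal below. Note that "pg_class.reltuples" is only a planner estimate, so the result can be off for tables
that have not been analyzed recently.

```python
def estimate_progress(total_tuples, dumped_tuples, elapsed_seconds):
    """Return (fraction_done, eta_seconds); eta_seconds is None if unknown.

    total_tuples    -- sum of pg_class.reltuples over the tables to dump
                       (an estimate; may be stale or non-positive)
    dumped_tuples   -- tuples written to the output so far
    elapsed_seconds -- wall-clock time since the dump started
    """
    if total_tuples <= 0:
        return 0.0, None  # no usable estimate of the total
    fraction = min(dumped_tuples / total_tuples, 1.0)
    if dumped_tuples <= 0 or elapsed_seconds <= 0:
        return fraction, None  # no measured rate yet
    remaining = max(total_tuples - dumped_tuples, 0)
    # remaining / (dumped per second), written to limit rounding error
    eta_seconds = remaining * elapsed_seconds / dumped_tuples
    return fraction, eta_seconds


def render_progress(fraction, eta_seconds, width=20):
    """Format one status line: bar, percentage, and ETA as MM:SS."""
    filled = int(fraction * width)
    bar = "=" * (filled - 1) + ">" if filled > 0 else ""
    percent = int(fraction * 100)
    if eta_seconds is None:
        eta_text = "--:--"
    else:
        minutes, seconds = divmod(int(round(eta_seconds)), 60)
        eta_text = f"{minutes}:{seconds:02d}"
    return f"{bar:<{width}} ({percent}%)  {eta_text}"


# Hypothetical numbers: 2,000,000 estimated tuples, half dumped after 950 s.
fraction, eta = estimate_progress(2_000_000, 1_000_000, 950)
print(render_progress(fraction, eta))
# =========>           (50%)  15:50
```

Calling these once per second with the current tuple count would give the update interval described in the proposal.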

> *) what's the benefit of doing this instead of using a utility like 'pv'?

Thank you for giving me a new point of view. I had never heard of the 'pv' utility. :)
I tried pg_dump with pv and found that this approach counts the number of characters passed through the pipe.
From my point of view, using 'pv' has some problems, listed below.
At the least, I think points 1 to 4 are benefits of my proposal.

1) %done and ETA are calculated from the number of characters passed through the pipe (as mentioned above), and the
total number of characters is specified by hand.  Therefore, if the specified total is badly wrong, %done and ETA will
be far from their true values.
 
2) Since 'pv' works on a pipe, pg_dump/pg_restore can't be used together with the '-j' option.  This forces
pg_dump/pg_restore to run with only one process even when running with two or more processes is possible.
 
3) For the same reason, the command line for pg_dump/pg_restore is longer and harder to use.  This may spoil the user
experience.

4) To pass data through a pipe, pg_dump can't be used together with the '-f' option, and pg_restore can't be used
together with the '-d' option.  This may also spoil the user experience because the command line is longer and harder
to use.
 
5) Neither this approach nor my proposal resolves the concern about "CREATE INDEX".  We need to discuss it further.
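
To illustrate point 1, here is a small worked example in Python of how a byte-count progress figure drifts when the
total is supplied by hand (with pv, through its '-s' option). The function name and the sizes are made up for
illustration only.

```python
def percent_done(bytes_seen, expected_total_bytes):
    """Progress as a byte-counting tool would report it."""
    return 100.0 * bytes_seen / expected_total_bytes


GIB = 1024 ** 3
true_size = 10 * GIB     # what the dump really produces (hypothetical)
guessed_size = 4 * GIB   # total typed in by hand (hypothetical)
halfway = true_size // 2

print(percent_done(halfway, true_size))     # 50.0
print(percent_done(halfway, guessed_size))  # 125.0
```

With the wrong guess the tool reports more than 100% while the dump is only half finished, and the ETA derived from
the same total is wrong in the same way.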



regards,
--
Taiki Kondo



-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com] 
Sent: Friday, June 12, 2015 10:42 PM
To: Taiki Kondo
Cc: pgsql-hackers@postgresql.org; Akio Iwaasa
Subject: Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

On Fri, Jun 12, 2015 at 7:45 AM, Taiki Kondo <tai-kondo@yk.jp.nec.com> wrote:
> Hi, all.
>
> I am newbie in hackers.
> I have an idea from my point of view as one user, I would like to propose the following.
>
>
> Progress bar for pg_dump / pg_restore
> =====================================
>
> Motivation
> ----------
> "pg_dump" and "pg_restore" show nothing if users don't specify verbose (-v) option.
> In too large table to finish in a few minutes, this behavior worries some users about if this situation (nothing shows up) is all right.
>
> I propose this feature to free these users from worrying.
>
>
> Design & API
> ------------
> When pg_dump / pg_restore is running, progress bar and estimated time to finish is shown on screen like following.
>
>
> =========>           (50%)  15:50
>
> The bar ("=>" in above) and percentage value ("50%" in above) show percentage of progress, and the time ("15:50" in above) shows estimated time to finish.
> (This percentage is the ratio for the whole processing.)
>
> Percentage and time are calculated and shown for every 1 second.
>
> In pg_dump, the information, which is required for calculating percentage and time, is from pg_class.
>
> In pg_restore, to calculate the same things, I want to record total amount of command lines into pg_dump file, thus I would like to add a new element to "Archive" structure.
> (This means that version number of archive format is changed.)
>
>
> Usage
> ------
> To use this feature, user must specify "-P" option in command line.
> (This definition is also temporary, so this is changeable if this leads problem.)
>
> $ pg_dump -Fc -P -f foo.pgdump foo
>
> I also think it's better that this feature is enabled as the default and does not force users to specify any options, but it means changing the default behavior, and can make problem in some programs expecting no output on stdout.
>
>
> I will implement this feature if this proposal is accepted by hackers.
> (Maybe, I will not use ncurses for implementing this feature, because ncurses can not be used with standard printf family functions.)
>
>
> Any comments are welcome.

*) how do you estimate %done and ETA when dumping?

*) what's the benefit of doing this instead of using a utility like 'pv'?

merlin
