Thread: [Proposal] Progress bar for pg_dump/pg_restore

[Proposal] Progress bar for pg_dump/pg_restore

From

Taiki Kondo

Date:

12 June 2015, 12:47:03

Hi, all.

I am new to pgsql-hackers.
As a user, I have an idea that I would like to propose:


Progress bar for pg_dump / pg_restore
=====================================

Motivation
----------
"pg_dump" and "pg_restore" print nothing unless the verbose (-v) option is specified.
When a table is too large to be processed within a few minutes, this silence makes some users worry about whether everything is all right.

I propose this feature to free these users from worrying.


Design & API
------------
While pg_dump / pg_restore is running, a progress bar and the estimated time to finish are shown on screen as follows:


=========>           (50%)  15:50

The bar ("=>" above) and the percentage value ("50%" above) show the progress, and the time ("15:50" above) shows the estimated time to finish.
(The percentage refers to the processing as a whole.)
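A minimal C sketch of rendering that line, assuming a 20-character bar and reading "15:50" as minutes:seconds of remaining time (the proposal fixes neither). `format_progress` and `BAR_WIDTH` are hypothetical names, not pg_dump code:

```c
#include <stdio.h>
#include <string.h>

#define BAR_WIDTH 20 /* hypothetical; the proposal does not specify a width */

/*
 * Hypothetical helper: formats a line like "=========>   ...   (50%)  15:50".
 * fraction is clamped to [0, 1]; eta_seconds is the remaining time, printed
 * as minutes:seconds (minutes may exceed 59); a negative eta prints "--:--".
 */
void format_progress(char *buf, size_t len, double fraction, long eta_seconds)
{
    char bar[BAR_WIDTH + 1];
    int filled;
    int pct;

    if (fraction < 0.0)
        fraction = 0.0;
    if (fraction > 1.0)
        fraction = 1.0;
    filled = (int) (fraction * BAR_WIDTH);
    pct = (int) (fraction * 100.0);

    memset(bar, ' ', BAR_WIDTH);
    if (filled > 0)
    {
        memset(bar, '=', (size_t) (filled - 1)); /* shaft of the arrow */
        bar[filled - 1] = '>';                   /* arrow head */
    }
    bar[BAR_WIDTH] = '\0';

    if (eta_seconds < 0)
        snprintf(buf, len, "%s (%d%%)  --:--", bar, pct);
    else
        snprintf(buf, len, "%s (%d%%)  %02ld:%02ld",
                 bar, pct, eta_seconds / 60, eta_seconds % 60);
}
```

Called with a fraction of 0.5 and 950 seconds remaining, it produces a line in the style of the example above.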

The percentage and time are recalculated and redrawn every second.
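One way to honour the once-per-second rule, sketched in C under the assumption that redraws are driven from the main dump/restore loop. The ETA here is a plain linear extrapolation from elapsed time, and all names are hypothetical:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical state for throttling redraws and estimating the ETA. */
typedef struct
{
    time_t started;
    time_t last_drawn;
} ProgressClock;

void progress_clock_init(ProgressClock *pc, time_t now)
{
    pc->started = now;
    pc->last_drawn = now - 1; /* so the first check draws immediately */
}

/* True (and records the time) if at least one second passed since the last redraw. */
bool progress_due(ProgressClock *pc, time_t now)
{
    if (difftime(now, pc->last_drawn) < 1.0)
        return false;
    pc->last_drawn = now;
    return true;
}

/*
 * Linear extrapolation: if fraction f took t seconds, the rest takes
 * t * (1 - f) / f. Returns -1 when nothing is done yet.
 */
long progress_eta(const ProgressClock *pc, time_t now, double fraction)
{
    double elapsed;

    if (fraction <= 0.0)
        return -1;
    if (fraction >= 1.0)
        return 0;
    elapsed = difftime(now, pc->started);
    return (long) (elapsed * (1.0 - fraction) / fraction + 0.5);
}
```

Linear extrapolation assumes work proceeds at a roughly constant rate, which is the assumption that fails when a few large objects dominate the run.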

In pg_dump, the information required to calculate the percentage and time is taken from pg_class.
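A minimal sketch of the pg_dump side, assuming only ordinary tables (relkind 'r') are counted. The query text and `total_estimated_tuples` are hypothetical; the function only sums the values such a query would return:

```c
#include <stddef.h>

/* Hypothetical catalog query issued once, before dumping any data. */
const char *const TOTAL_TUPLES_QUERY =
    "SELECT c.oid, c.reltuples FROM pg_catalog.pg_class c "
    "WHERE c.relkind = 'r'";

/*
 * Sum the row estimates. reltuples is only refreshed by VACUUM, ANALYZE and
 * a few commands such as CREATE INDEX, so it may be stale; negative values
 * (used for "unknown" in newer releases) are counted as zero.
 */
double total_estimated_tuples(const double *reltuples, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        if (reltuples[i] > 0.0)
            total += reltuples[i];
    return total;
}
```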

In pg_restore, to calculate the same values, I want to record the total number of command lines in the pg_dump file, so I would like to add a new element to the "Archive" structure.
(This means that the archive format version number changes.)
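A sketch of how a reader could stay compatible with older archives. It assumes a header that starts with major/minor/revision bytes followed, in new-enough archives, by an 8-byte big-endian total. The layout, the version numbers and every name below are placeholders, not pg_dump's actual archive format:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder version encoding and cut-over version (hypothetical). */
#define ARCH_VERSION(major, minor, rev) ((((major) * 256 + (minor)) * 256) + (rev))
#define ARCH_VERSION_WITH_TOTAL ARCH_VERSION(1, 13, 0)

typedef struct
{
    int     version;
    int64_t progress_total; /* -1 = unknown (archive predates the field) */
} HypotheticalHeader;

/* Returns 0 on success, -1 if the buffer is too short for its version. */
int read_header(const unsigned char *buf, size_t len, HypotheticalHeader *hdr)
{
    if (len < 3)
        return -1;
    hdr->version = ARCH_VERSION(buf[0], buf[1], buf[2]);
    hdr->progress_total = -1;

    if (hdr->version >= ARCH_VERSION_WITH_TOTAL)
    {
        uint64_t v = 0;

        if (len < 11)
            return -1;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | buf[3 + i]; /* big-endian total */
        hdr->progress_total = (int64_t) v;
    }
    return 0;
}
```

An older archive yields -1, in which case pg_restore would simply not draw a bar.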


Usage
------
To use this feature, the user must specify the "-P" option on the command line.
(This option name is provisional and can be changed if it causes problems.)

$ pg_dump -Fc -P -f foo.pgdump foo

I also think it would be better to enable this feature by default, without requiring any option, but that would change the default behavior and could cause problems for programs that expect no output on stdout.
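A sketch of the opt-in switch, assuming pg_dump-style option parsing with getopt_long. `-P`/`--progress` is the proposal's provisional name and both helpers are hypothetical. Drawing only on stderr, and only when it is a terminal, would keep stdout clean for programs that read the dump from it:

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical: returns true if -P / --progress was given. */
bool parse_progress_flag(int argc, char **argv)
{
    static const struct option long_opts[] = {
        {"file", required_argument, NULL, 'f'},
        {"format", required_argument, NULL, 'F'},
        {"progress", no_argument, NULL, 'P'},
        {NULL, 0, NULL, 0}
    };
    bool progress = false;
    int c;

    /* Other options are merely skipped in this sketch. */
    while ((c = getopt_long(argc, argv, "f:F:P", long_opts, NULL)) != -1)
    {
        if (c == 'P')
            progress = true;
    }
    return progress;
}

/* Draw only if requested and stderr is an interactive terminal. */
bool progress_should_draw(bool requested)
{
    return requested && isatty(STDERR_FILENO);
}
```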


I will implement this feature if the hackers accept this proposal.
(I will probably not use ncurses to implement it, because ncurses cannot be mixed with the standard printf family of functions.)
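Without ncurses, a common approach is to return the cursor with a carriage return and overwrite the line in place. A minimal C sketch, assuming output goes to a stdio stream such as stderr; the function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

static size_t last_len = 0; /* length of the previously drawn line */

/* Redraw the progress line in place using only stdio. */
void progress_show(FILE *out, const char *line)
{
    size_t len = strlen(line);

    fputc('\r', out);          /* back to column 0 */
    fputs(line, out);
    for (size_t i = len; i < last_len; i++)
        fputc(' ', out);       /* erase leftovers of a longer previous line */
    last_len = len;
    fflush(out);
}

/* Finish with a newline so the final state stays visible. */
void progress_done(FILE *out)
{
    if (last_len > 0)
        fputc('\n', out);
    last_len = 0;
    fflush(out);
}
```

Each redraw would call `progress_show(stderr, line)`, and `progress_done(stderr)` once at the end.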


Any comments are welcome.



Best Regards,

--
Taiki Kondo

Re: [Proposal] Progress bar for pg_dump/pg_restore

From

Merlin Moncure

Date:

12 June 2015, 13:42:09

On Fri, Jun 12, 2015 at 7:45 AM, Taiki Kondo <tai-kondo@yk.jp.nec.com> wrote:

*) how do you estimate %done and ETA when dumping?

*) what's the benefit of doing this instead of using a utility like 'pv'?

merlin

Re: [Proposal] Progress bar for pg_dump/pg_restore

From

Andres Freund

Date:

12 June 2015, 13:48:14

Hi,

On 2015-06-12 12:45:50 +0000, Taiki Kondo wrote:
> Design & API
> ------------
> While pg_dump / pg_restore is running, a progress bar and the estimated time to finish are shown on screen as follows:
>
>
> =========>           (50%)  15:50
>
> The bar ("=>" above) and the percentage value ("50%" above) show the progress, and the time ("15:50" above) shows the estimated time to finish.
> (The percentage refers to the processing as a whole.)
>
> The percentage and time are recalculated and redrawn every second.
>
> In pg_dump, the information required to calculate the percentage and time is taken from pg_class.
>
> In pg_restore, to calculate the same values, I want to record the total number of command lines in the pg_dump file, so I would like to add a new element to the "Archive" structure.
> (This means that the archive format version number changes.)

The question is how to actually get useful estimates. As there's no
progress report for individual COPY and CREATE INDEX commands you'll, in
many cases, have very irregular progress updates. In many, many cases
most of the time is spent on a very small subset of the total objects.

Greetings,

Andres Freund

Re: [Proposal] Progress bar for pg_dump/pg_restore

From

Taiki Kondo

Date:

19 June 2015, 08:47:05

Hi, Andres

Thank you for your comment, and sorry for the late response.

> The question is how to actually get useful estimates. As there's no
> progress report for individual COPY and CREATE INDEX commands you'll, in
> many cases, have very irregular progress updates. In many, many cases
> most of the time is spent on a very small subset of the total objects.

When dumping, I think the number of tuples can be obtained from pg_class.reltuples, so I want pg_dump to run "SELECT reltuples" to get it; pg_dump will then estimate the time needed to execute the "COPY ... TO" command as it fetches each tuple.
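A minimal C sketch of that per-table estimate, assuming the caller counts rows as they arrive from COPY and knows the elapsed seconds. The names are hypothetical, and the 0.99 cap is an arbitrary choice so that a stale reltuples never shows 100% early:

```c
/* Fraction of one table dumped so far, based on the reltuples estimate. */
double table_fraction(double reltuples_est, long long rows_done)
{
    double f;

    if (reltuples_est <= 0.0)
        return 0.0;            /* no usable estimate */
    f = (double) rows_done / reltuples_est;
    return f > 0.99 ? 0.99 : f; /* report 100% only when COPY really ends */
}

/* Remaining seconds from measured rows per second; -1 if no rate yet. */
long table_eta(double reltuples_est, long long rows_done, double elapsed_sec)
{
    double rate;
    double remaining;

    if (rows_done <= 0 || elapsed_sec <= 0.0)
        return -1;
    rate = (double) rows_done / elapsed_sec;
    remaining = reltuples_est - (double) rows_done;
    if (remaining < 0.0)
        remaining = 0.0;       /* estimate was low: table is larger than expected */
    return (long) (remaining / rate + 0.5);
}
```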

For restoring, I think it's better to record the same information (number of tuples) in the pg_dump file, to estimate the time needed to restore tables.

I also understand your concern about "CREATE INDEX", but we have no way to get progress information for "CREATE INDEX".
At present, I think it may be good enough to reuse the estimated time for "COPY" as the estimate for "CREATE INDEX".
But it would be better to get this information from pg_stat_activity, as Anzai-san proposed in another thread:

http://www.postgresql.org/message-id/116262CF971C844FB6E793F8809B51C6EA6E21@BPXM02GP.gisp.nec.co.jp

What do you think?

regards,
--
Taiki Kondo

Re: [Proposal] Progress bar for pg_dump/pg_restore

From

Craig Ringer

Date:

22 June 2015, 02:45:59

On 19 June 2015 at 16:45, Taiki Kondo <tai-kondo@yk.jp.nec.com> wrote:
> Hi, Andres
>
> Thank you for your comment, and sorry for the late response.
>
>> The question is how to actually get useful estimates. As there's no
>> progress report for individual COPY and CREATE INDEX commands you'll, in
>> many cases, have very irregular progress updates. In many, many cases
>> most of the time is spent on a very small subset of the total objects.
>
> When dumping, I think the number of tuples can be obtained from pg_class.reltuples, so I want pg_dump to run "SELECT reltuples" to get it; pg_dump will then estimate the time needed to execute the "COPY ... TO" command as it fetches each tuple.

It'd need to be a bit smarter than that, since it'd have to take some
account of average tuple size, etc, but it's an interesting idea to
use the stats to guesstimate copy times.
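A sketch of weighting tables by estimated bytes rather than by count, assuming reltuples from pg_class and an average row width taken, for example, as the sum of pg_stats.avg_width over the table's columns. The struct and function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical per-table input. */
typedef struct
{
    double reltuples;     /* pg_class.reltuples estimate */
    double avg_row_width; /* bytes, e.g. sum of pg_stats.avg_width */
} TableEstimate;

static double estimated_bytes(const TableEstimate *t)
{
    double rows = t->reltuples > 0.0 ? t->reltuples : 0.0;

    return rows * t->avg_row_width;
}

/* Fraction of the total estimated bytes covered by the first `done` tables. */
double weighted_fraction(const TableEstimate *tables, size_t n, size_t done)
{
    double total = 0.0;
    double finished = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        double b = estimated_bytes(&tables[i]);

        total += b;
        if (i < done)
            finished += b;
    }
    return total > 0.0 ? finished / total : 0.0;
}
```

For a 10-row table of 10-byte rows followed by a 1000-row table of 100-byte rows, finishing the first table is 50% by count but about 0.1% by bytes.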

> For restoring, I think it's better to record the same information (number of tuples) in the pg_dump file, to estimate the time needed to restore tables.

Since we generally suggest that people use a pg_dump and pg_restore
from the server version they're going to be restoring to, that should
be OK. It'd create some new entries in the pg_restore file manifest
that older pg_restore versions wouldn't understand.

> I also understand your concern about "CREATE INDEX", but we have no way to get progress information for "CREATE INDEX".
> At present, I think it may be good enough to reuse the estimated time for "COPY" as the estimate for "CREATE INDEX".

You could probably get a handwave-quality estimate by looking at the
average column widths for the cols included in the index plus the
number of tuples in the table. It'd be useless for expression indexes,
partial indexes, etc, but you can't have everything...
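That handwave can be written down directly. A C sketch, assuming the widths come from pg_stats.avg_width for the key columns, with expression and partial indexes reported as unknown (-1); the function is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Rough index-build weight: rows times the summed average width of the
 * key columns. Expression and partial indexes cannot be modelled this way.
 */
double index_build_weight(double reltuples, const double *key_widths,
                          size_t nkeys, bool is_expression, bool is_partial)
{
    double width = 0.0;

    if (is_expression || is_partial)
        return -1.0; /* unknown */
    for (size_t i = 0; i < nkeys; i++)
        width += key_widths[i];
    return (reltuples > 0.0 ? reltuples : 0.0) * width;
}
```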

Interesting idea to explore.

Re: [Proposal] Progress bar for pg_dump/pg_restore

From

Jim Nasby

Date:

22 June 2015, 23:06:23

On 6/21/15 9:45 PM, Craig Ringer wrote:
>> I also understand your concern about "CREATE INDEX", but we have no way to get progress information for "CREATE INDEX".
>> At present, I think it may be good enough to reuse the estimated time for "COPY" as the estimate for "CREATE INDEX".
> You could probably get a handwave-quality estimate by looking at the
> average column widths for the cols included in the index plus the
> number of tuples in the table. It'd be useless for expression indexes,
> partial indexes, etc, but you can't have everything...

Jan Urbański demonstrated[1] getting progress stats for long running 
queries[2] at pgCon 2013. Perhaps some of that code would be useful here 
(or better yet perhaps we could include at least the measuring portion 
of his stuff in core... ;)

[1] https://www.pgcon.org/2013/schedule/events/576.en.html
[2] https://github.com/wulczer/pg-progress
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: [Proposal] Progress bar for pg_dump/pg_restore

From

Taiki Kondo

Date:

24 June 2015, 10:51:35

Hi, Merlin.

Thank you for your comment, and sorry for the late response.

> *) how do you estimate %done and ETA when dumping?

As I mentioned in my reply to Andres, I think %done and ETA can be estimated from the number of tuples in "pg_class.reltuples".
As you may know, pg_dump writes each tuple to the file as soon as it reads it while executing "COPY ... TO".
Therefore pg_dump can calculate %done and ETA by getting "pg_class.reltuples" and measuring the number of tuples dumped per second.

> *) what's the benefit of doing this instead of using a utility like 'pv'?

Thank you for the new point of view. I had never heard of the 'pv' utility. :)
I tried pg_dump with pv, and found that this approach counts the bytes passing through the pipe.
From my point of view, using 'pv' has the following problems.
I think at least points 1) to 4) are benefits of my proposal.

1) %done and ETA are calculated from the number of bytes passed through the pipe (mentioned above), and the total size must be specified by hand. Therefore, if the specified total is badly wrong, %done and ETA differ greatly from their true values.

2) Since 'pv' works on a pipe, pg_dump/pg_restore can't be used together with the '-j' option. This forces pg_dump/pg_restore to run with only one process even when running with two or more processes is possible.

3) For the same reason, the pg_dump/pg_restore command line becomes longer and less convenient. This may spoil the user experience.

4) To pass data through a pipe, pg_dump can't be used together with the '-f' option, and pg_restore can't be used together with the '-d' option. This may also spoil the user experience, because the command line becomes longer and less convenient.

5) Neither this approach nor my proposal resolves the concern about "CREATE INDEX". We need to discuss this further.

regards,
--
Taiki Kondo
