Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id CANP8+jLLLng=cc+jOcGzFPVy9H6Pg5Zv5s0eCM53obLb6GiPwg@mail.gmail.com
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.
List pgsql-hackers
On 22 July 2015 at 13:00, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 22, 2015 at 3:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> For me, the user workflow looks like these....
>
> Worried: "Task X is taking ages? When is it expected to finish?"
> Ops: 13:50
> <sometime later, about 14:00>
> Worried: "Task X is still running? But I thought its ETA was 13:50?"
> Ops: Now says 14:30
> Worried: "Is it stuck, or is it making progress?"
> Ops: Looks like its making progress
> Worried: "Can we have a look at it and find out what its doing?"

How does Ops know that it is making progress?  Just because the
completion percentage is changing?

You could, but that is not the way I suggested.

We need 
* Some measure of actual progress (the definition of which may vary from action to action, e.g. blocks scanned)
* Some estimate of the total work required
* An estimate of the estimated time of completion - I liked your view that this prediction may be costly to request
 
> In terms of VACUUM specifically: VACUUM should be able to assess beforehand
> whether it will scan the indexes, or it can just assume that it will need to
> scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
> task it has before it starts.

Well, we can assume that it will scan the indexes exactly once, but
the actual number may be more or less; and the cost of rescanning the
heap in phase 2 is also hard to estimate.

Maybe I'm worrying over nothing, but I have a feeling that if we try
to do what you're proposing here, we're gonna end up with this:

https://xkcd.com/612/

Most of the progress estimators I have seen over the ~30 years that
I've been playing with computers have been unreliable, and many of
those have been unreliable to the point of being annoying.  I think
that's likely to happen with what you are proposing too, though of
course like all predictions of the future it could turn out to be
wrong.

Almost like an Optimizer then. Important, often annoyingly wrong, needs more work.

I'm not proposing this feature, I'm merely asking for it to be defined in a way that makes it work for more than just VACUUM. Once we have a way of reporting useful information, other processes can be made to follow that mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are important, even if we never get such information for user queries. But I hope we do.

I won't get in the way of your search for detailed information in more complex forms. Both things are needed.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Kouhei Kaigai
Date:
Subject: Re: fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?
Next
From: Teodor Sigaev
Date:
Subject: stringify MAKE_SQLSTATE()