Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id CANP8+j+NbHL70=ppm=9gkw56QDSAF=cc+Jk_iO0CBUQ_bUMjGQ@mail.gmail.com
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 21 July 2015 at 21:24, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yes, I suggest just a single column on pg_stat_activity called pct_complete
>
> trace_completion_interval = 5s (default)
>
> Every interval, we report the current % complete for any operation that
> supports it. We just show NULL if the current operation has not reported
> anything or never will.

I am deeply skeptical about the usefulness of a progress-reporting
system that can only report one number.  I think that, in many cases,
it won't be possible to compute an accurate completion percentage, and
even if it is, people may want more data than that for various reasons
anyway.

The goal here was to have a common metric for all tasks. Multiple numbers are OK for me, but not extended detail (yet!)

As I said later:

Simon Riggs <simon@2ndquadrant.com> wrote:
> I'm interested in seeing a report that relates to actual progress made.
> Predicted total work required is also interesting, but is much less trustworthy figure.

I'm aware of the difficulties for VACUUM in particular and agree with your scenario/details. 

That does not deter me from wishing to see high level information, even it varies or is inaccurate. The "arrival time" on my Sat Nav is still useful, even if it changes because of traffic jams that develop while my journey is in progress. If the value bothers me, I look at the detail. So both summary and detail information are useful, but summary is more important even though it is less accurate.


Instead of getting told "we're X% done" (according to some arcane
formula), it's quite reasonable to think that people will want to get
a bunch of values, e.g.:

1. For what percentage of heap pages have we completed phase one (HOT
prune + mark all visible if appropriate + freeze + remember dead
tuples)?
2. For what percentage of heap pages have we completed phase two (mark
line pointers unused)?
3. What percentage of maintenance_work_mem are we currently using to
remember tuples?

For a query, the information we want back is likely to be even more
complicated; e.g. EXPLAIN output with row counts and perhaps timings
to date for each plan node.  We can insist that all status reporting
get boiled down to one number, but I suspect we would be better off
asking ourselves how we could let commands return a richer set of
data.

I agree that it is desirable to have a more detailed breakdown of what is happening. As soon as we do that we hit the need for very action-specific information reporting, which renders the topic much harder and much more specific.

For me, the user workflow looks like these....

Worried: "Task X is taking ages? When is it expected to finish?"
Ops: 13:50
<sometime later, about 14:00>
Worried: "Task X is still running? But I thought its ETA was 13:50?"
Ops: Now says 14:30
Worried: "Is it stuck, or is it making progress?"
Ops: Looks like its making progress
Worried: "Can we have a look at it and find out what its doing?"

Worried: "When will Task Y finish?"
Ops: Monday at 11am
Worried: "Bad news! We should cancel it on Sunday evening."

The point is that nobody looks at the detailed info until we have looked at the summary. So the summary of progress/completion time is important, even if it is possibly wrong. The detail is also useful. I think we should have both, but I'd like to see the summary info first, because it is the most useful, best leading indicator of problems.

In terms of VACUUM specifically: VACUUM should be able to assess beforehand whether it will scan the indexes, or it can just assume that it will need to scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a task it has before it starts.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: Alpha2/Beta1
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Asynchronous execution on FDW