Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id CA+Tgmoa4a4VcRZMjgL5wv=2S8ExYU-HMkfg=abf3ekO-q6OU7Q@mail.gmail.com
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.
List pgsql-hackers
On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yes, I suggest just a single column on pg_stat_activity called pct_complete
>
> trace_completion_interval = 5s (default)
>
> Every interval, we report the current % complete for any operation that
> supports it. We just show NULL if the current operation has not reported
> anything or never will.

I am deeply skeptical about the usefulness of a progress-reporting
system that can only report one number.  I think that, in many cases,
it won't be possible to compute an accurate completion percentage, and
even if it is, people may want more data than that for various reasons
anyway.

For example, in the case of VACUUM, suppose there is a table with
1,000,000 heap pages and 200,000 index pages (so it's probably
over-indexed, but whatever).  After reading 500,000 heap pages, we
have found 0 dead tuples.  What percentage of the work have we
finished?

It's hard to say.  If we don't find any dead tuples, we've read half
the pages we will eventually read and are therefore half done.  But if
we find even 1 dead tuple, then we've got to scan all 200,000 index
pages, so we've read only 41.7% of the pages we'll eventually touch.
If we find so many dead tuples that we have to scan the indexes
multiple times for lack of maintenance_work_mem, we'll eventually read
1,000,000 + 200,000k pages, where k is the number of index scans; if
say k = 5 then we are only 25% done.  All of these scenarios are
plausible because, in all likelihood, the dirty pages in the table are
concentrated near the end.

Now we could come up with ways of making good guesses about what is
likely to happen.  We could look at the data from pg_stat_all_tables,
historical results of vacuuming this table, the state of the
visibility map, and so on.  And that all might help.  But it's going
to be fairly hard to produce a percentage of completion that is
monotonically increasing and always accurately reflects the time
remaining.  Even if we can do it, it doesn't seem like a stretch to
suppose that sometimes people will want to look at the detail data.
Instead of getting told "we're X% done" (according to some arcane
formula), it's quite reasonable to think that people will want to get
a bunch of values, e.g.:

1. For what percentage of heap pages have we completed phase one (HOT
prune + mark all visible if appropriate + freeze + remember dead
tuples)?
2. For what percentage of heap pages have we completed phase two (mark
line pointers unused)?
3. What percentage of maintenance_work_mem are we currently using to
remember tuples?

For a query, the information we want back is likely to be even more
complicated; e.g. EXPLAIN output with row counts and perhaps timings
to date for each plan node.  We can insist that all status reporting
get boiled down to one number, but I suspect we would be better off
asking ourselves how we could let commands return a richer set of
data.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Eliminating CREATE INDEX comparator TID tie-breaker overhead
Next
From: Robert Haas
Date:
Subject: Re: BRIN index and aborted transaction