Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id CANP8+jKYNeX0P1BR3zt+826RGfWDrOMLVMS8h-5-ZRSdqHG5aw@mail.gmail.com
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  ("Syed, Rahila" <Rahila.Syed@nttdata.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On 10 August 2015 at 15:59, Syed, Rahila <Rahila.Syed@nttdata.com> wrote:
Hello,

>When we're in Phase2 or 3, don't we need to report the number of total page scanned or percentage of how many table pages scanned, as well?
The total heap pages scanned need to be reported with phase 2 or 3. Complete progress report need to have numbers from each phase when applicable.

> Phase 1. Report 2 integer counters: heap pages scanned and total heap
> pages,
> 1 float counter: percentage_complete and progress message.
> Phase 2. Report 2 integer counters: index pages scanned and total
> index pages(across all indexes) and progress message.
> Phase 3. 1 integer counter: heap pages vacuumed.

Sorry for being unclear here. What I meant to say is, each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and
will comprise of n integers, n floats, string. So , in the view reporting progress, VACUUM command can have 3 entries one for each phase.

VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM...

SELECT statements also have a variable number of phases, hash, materialize, sorts all act as blocking nodes where you cannot progress to next phase until it is complete and you don't know for certain how much data will come in later phases.

I think the best you'll do is an array of pairs of values [(current blocks, total blocks), ... ]
 
Knowing how many phases there are is a tough problem. I think the only way forwards is to admit that we will publish our best initial estimate of total workload size and then later we may realise it was wrong and publish a better number (do until complete). It's not wonderful, but la vida es loca.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: checkpointer continuous flushing
Next
From: Tom Lane
Date:
Subject: Re: Precedence of standard comparison operators