Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id 564E7790.4000403@lab.ntt.co.jp
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On 2015/11/20 0:57, Jim Nasby wrote:
> On 11/19/15 1:18 AM, Amit Langote wrote:
>> 1) General purpose interface for (maintenance?) commands to report a set
> 
> I'm surprised no one has picked up on using this for DML. Certainly anyone
> works with ETL processes would love to be able to get some clue on the
> status of a long running query...

Instrumenting query execution for progress info would be a complex beast
though. Also, what kind of reporting interface it would require is also
not clear, at least to me. Jan Urbanski's PGCon presentation[1] is a good
source on the matter I discovered in this thread, thanks! But IMHO, for
now, it would be worthwhile to focus our resources on the modest goal of
implementing a reporting interface for utility commands. Sure it would be
nice to investigate how much the requirements of the two overlap.

> 
>> About pass 2, ie, lazy_index_vacuum() and
>> lazy_vacuum_heap(), I don't see how we can do better than reporting its
>> progress only after finishing all of it without any finer-grained
>> instrumentation. They are essentially block-box as far as the proposed
>> instrumentation approach is concerned. Being able to report progress per
>> index seems good but as a whole, a user would have to wait arbitrarily
>> long before numbers move forward. We might as well just report a bool
>> saying we're about to enter a potentially time-consuming index vacuum
>> round with possibly multiple indexes followed by lazy_vacuum_heap()
>> processing. Additionally, we can report the incremented count of the
>> vacuuming round (pass 2) once we are through it.
> 
> Another option is to provide the means for the index scan routines to
> report their progress. Maybe every index AM won't use it, but it'd
> certainly be a lot better than staring at a long_running boolean.

The boolean would be a workaround for sure. I'm also slightly tempted by
the idea of instrumenting vacuum scans of individual index AM's bulkdelete
methods. One precedent is how vacuum_delay_point() are sprinkled around in
the code. Another problem to solve would be to figure out how to pass
progress parameters around - via some struct or could they be globals just
like VacuumCost* variables are...

> 
>> Note that we can leave them out of
>> percent_done of overall vacuum progress. Until we have a good solution for
>> number (3) above, it seems to difficult to incorporate index pages into
>> overall progress.
> 
> IMHO we need to either put a big caution sign on any % estimate that it
> could be wildly off, or just forgo it completely for now. I'll bet that if
> we don't provide it some enterprising users will figure out the best way
> to do this (similar to how the bloat estimate query has evolved over time).
> 
> Even if we never get a % done indicator, just being able to see what
> 'position' a command is at will be very valuable.

Agreed. If we provide enough information in whatever view we choose to
expose, that would be a good start.

> 
>> As someone pointed out upthread, the final heap truncate phase can take
>> arbitrarily long and is outside the scope of lazy_scan_heap() to
>> instrument. Perhaps a bool, say, waiting_heap_trunc could be reported for
>> the same. Note that, it would have to be reported from lazy_vacuum_rel().
> 
> ISTM this is similar to the problem of reporting index status, namely that
> a progress reporting method needs to accept reports from multiple places
> in the code.

Yes.

Thanks,
Amit

[1] http://www.pgcon.org/2013/schedule/events/576.en.html




pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Typo in file header comment of replorigindesc.c
Next
From: Greg Stark
Date:
Subject: Re: Using quicksort for every external sort run