Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id CAD21AoDT=rtXgyKdA-JLtpsDX8vxB6J51f2OEyqT8LKV57a7Xg@mail.gmail.com
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.  (Rahila Syed <rahilasyed90@gmail.com>)
List pgsql-hackers
On Tue, Aug 11, 2015 at 1:50 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Masahiko Sawada wrote:
>
>> This topic may have been already discussed but, why don't we use just
>> total scanned pages and total pages?
>
> Because those numbers don't extrapolate nicely.  If the density of dead
> tuples is irregular across the table, such absolute numbers might be
> completely meaningless: you could scan 90% of the table without seeing
> any index scan, and then at the final 10% be hit by many index scans
> cleaning dead tuples.  Thus you would see progress go up to 90% very
> quickly and then take hours to have it go to 91%.  (Additionally, and a
> comparatively minor point: since you don't know how many index scans are
> going to happen, there's no way to know the total number of blocks
> scanned, unless you don't count index blocks at all, and then the
> numbers become a lie.)
> If you instead track number of heap pages separately from index pages,
> and indicate how many index scans have taken place, you have a chance of
> actually figuring out how many heap pages are left to scan and how many
> more index scans will occur.

Thank you for your explanation!
I understood about this.

> VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that
assumesthat we don't count truncation as a 4th phase of VACUUM...
 

In case of vacuum, I think we need to track the number of scanned heap
pages at least, and the information about index scan is the additional
information.
The another idea for displaying progress is to have two kind of
information: essential information and additional information.

Essential information has one numeric data, which is stored
essentially information regarding of its processing.
Additional information has two data: text and numeric. These data is
free-style data which is stored by each backend as it like.
And these three data are output at same time.

For example, In case of vacuum, essential information is the number of
total scanned heap page.

* When lazy_scan_heap starts, the two additional data are NULL.

* When lazy_vacuum_index starts, the backend set additional data like
followings. - "Index vacuuming" into text data which describes what we're doing
now actually. - "50" into numeric data which describes how many index pages we scanned.

* And when lazy_vacuum_index is done, backend sets additional data NULL again.

Regards,

--
Masahiko Sawada



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: max_connections and standby server
Next
From: Michael Paquier
Date:
Subject: Re: max_connections and standby server