Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id 20151210.163005.29167515.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Hello,

At Thu, 10 Dec 2015 15:28:14 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqRNw=w4mt-W+gtq0ED0KTR=B8Qgu6D+4BN3XmzFRuAgXQ@mail.gmail.com>
> On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> >> Hm, I guess progress messages would not change that much over the course
> >> of a long-running command. How about we pass only the array(s) that we
> >> change and NULL for others:
> >>
> >> With the following prototype for pgstat_report_progress:
> >>
> >> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
> >>                            bool *message_param[], int num_message_param);
> >>
> >> If we just wanted to change, say scanned_heap_pages, then we report that
> >> using:
> >>
> >> uint32_param[1] = scanned_heap_pages;
> >> pgstat_report_progress(uint32_param, 3, NULL, 0);
> >>
> >> Also, pgstat_report_progress() should check each of its parameters for
> >> NULL before iterating over to copy. So in most reports over the course of
> >> a command, the message_param would be NULL and hence not copied.
> >
> > It's going to be *really* important that this facility provides a
> > lightweight way of updating progress, so I think this whole API is
> > badly designed.  VACUUM, for example, is going to want a way to update
> > the individual counter for the number of pages it's scanned after
> > every page.  It should not have to pass all of the other information
> > that is part of a complete report.  It should just be able to say
> > pgstat_report_progress_update_counter(1, pages_scanned) or something
> > of this sort.  Don't marshal all of the data and then make
> > pgstat_report_progress figure out what's changed.  Provide a series of
> > narrow APIs where the caller tells you exactly what they want to
> > update, and you update only that.  It's fine to have a
> > pgstat_report_progress() function to update everything at once, for
> > the use at the beginning of a command, but the incremental updates
> > within the command should do something lighter-weight.
> 
> [first time looking really at the patch and catching up with this thread]
> 
> Agreed. As far as I can guess from the code, those should be as
> followed to bloat pgstat message queue a minimum:
> 
> +               pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
>                 /*
>                  * Loop to process each selected relation.
>                  */
> Defining a new routine for this purpose is a bit surprising. Cannot we
> just use pgstat_report_activity with a new BackendState STATE_INVACUUM
> or similar if we pursue the progress tracking approach?

The author might want to know vacuum status *after* it has been
finished. But it is reset just after the end of a vacuum. One
concern is that BackendState adds new value for
pg_stat_activiry.state and it might confuse someone using it but
I don't see other issue on it.

> A couple of comments:
> - The relation OID should be reported and not its name. In case of a
> relation rename that would not be cool for tracking, and most users
> are surely going to join with other system tables using it.

+1

> - The progress tracking facility adds a whole level of complexity for
> very little gain, and IMO this should *not* be part of PgBackendStatus
> because in most cases its data finishes wasted. We don't expect
> backends to run frequently such progress reports, do we?

I strongly thought the same thing but I haven't came up with
better place for it to be stored. but,

>  My opinion on
> the matter if that we should define a different collector data for
> vacuum, with something like PgStat_StatVacuumEntry, then have on top
> of it a couple of routines dedicated at feeding up data with it when
> some work is done on a vacuum job.

+many. But I can't guess the appropriate number of the entry of
it, or suitable replacing policy on excesive number of
vacuums. Although sane users won't run vacuum on so many
backends.

> In short, it seems to me that this patch needs a rework, and should be
> returned with feedback. Other opinions?

This is important feature for DBAs so I agree with you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Speedup twophase transactions
Next
From: Andres Freund
Date:
Subject: Re: Error with index on unlogged table