Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Msg-id 56DD2ACE.5050208@lab.ntt.co.jp
In response to Re: [PROPOSAL] VACUUM Progress Checker.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.
List pgsql-hackers
Horiguchi-san,

Thanks a lot for taking a look!

On 2016/03/07 13:02, Kyotaro HORIGUCHI wrote:
> At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote wrote:
>> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote <amitlangote09@gmail.com> wrote:
>>> So, I took Vinayak's latest patch and rewrote it a little
>> ...
>>> I broke it into two:
>>>
>>> 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
>>> 0002-Implement-progress-reporting-for-VACUUM-command.patch
>>
>> Oops, unamended commit messages in those patches are misleading.  So,
>> please find attached corrected versions.
>
> The 0001-P.. adds the following interface functions.
>
> +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
> +extern void pgstat_progress_set_command_target(Oid objid);
> +extern void pgstat_progress_update_param(int index, uint32 val);
> +extern void pgstat_reset_local_progress(void);
> +extern int    pgstat_progress_get_num_param(BackendCommandType cmdtype);
>
> I don't like treating the target object id differently from other
> parameters. It might not be needed at all, or conversely two or
> more might be needed. Although oids are not guaranteed to fit
> uint32, we have already stored BlockNumber there.

I thought giving cmdtype and objid each its own slot would make things a
little bit clearer than stuffing them into st_progress_param[0] and
st_progress_param[1], respectively.  Is that what you are suggesting?
Although, as I've done it, a separate field st_command_objid may be a bit
too much.

If they are not special fields, I think we don't need special interface
functions *set_command() and *set_command_target().  But I am still
inclined toward keeping the former.

> # I think integer arrays might need to be passed as a
> # parameter, but that would be another issue.

Didn't really think about it.  Maybe we should consider a scenario that
would require it.

> pg_stat_get_progress_info returns a tuple with 10 integer columns
> (plus an object id). The reason why I suggested use of an integer
> array is that it allows the API to serve an arbitrary number of
> parameters without a modification of the API, and array indexes are
> more colorless than any concrete names. However, I won't insist on
> that if we agree that it is ok to have a fixed number of parameters.

I think we have a fixed number of parameters, in the form of a fixed-size
array, because st_progress_param[] is part of a shared memory structure, as
discussed before.  Although this interface is roughly modeled on how the
pg_statistic catalog, the pg_stats view, and the get_attstatsslot() function
work, shared memory structures take the place of the catalog here, so there
are some restrictions (a fixed-size array being one).

As for the index into st_progress_param[], pgstat.c/pgstatfuncs.c should
not care what it means.  As exemplified in patch 0002, individual index
numbers can be defined as macros by individual command modules (as Robert
suggested recently), following some convention for readability, such as the
following in vacuumlazy.c:

#define PROG_PAR_VAC_RELID                     0
#define PROG_PAR_VAC_PHASE_ID                  1
#define PROG_PAR_VAC_HEAP_BLKS                 2
#define PROG_PAR_VAC_CUR_HEAP_BLK              3
... and so on.

Then, to report a changed parameter:

pgstat_progress_update_param(PROG_PAR_VAC_PHASE_ID, LV_PHASE_SCAN_HEAP);
...
pgstat_progress_update_param(PROG_PAR_VAC_CUR_HEAP_BLK, blkno);

By the way, the following is proargnames[] for pg_stat_get_progress_info():

cmdtype integer,
OUT pid integer,
OUT param1 integer,
OUT param2 integer,
...
OUT param10 integer

So, it is the responsibility of a command-specific progress view definition
to interpret the values of param1..param10 appropriately.  In fact, it is
the implementer of progress reporting for a command who determines what
goes into which slot of st_progress_param[] to begin with.

> pgstat_progress_get_num_param looks not good in the aspect of
> genericity. I'd like to define it as an integer array indexed
> by the command type if it is needed. However it seems to me to be
> enough that pg_stat_get_progress_info always returns 10 integers
> regardless of what the numbers are for. The user sql function,
> pg_stat_vacuum_progress as the first user, knows how many numbers
> should be read for its work. It reads zeroes safely even if it
> reads more than what the producer side offered (unless it tries
> to divide something with it).

Thinking about it a bit, perhaps we don't need the num_param(cmdtype)
function or array at all, as you seem to suggest.  It serves no useful
purpose now that I look at it.  pg_stat_get_progress_info() should simply
copy st_progress_param[0...PG_STAT_GET_PROGRESS_COLS-1] to the result, and
the view definer knows what's what.
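
As a rough standalone illustration of that reader side: the generic function
copies every slot without interpreting it, and only the command-specific
consumer gives the numbers meaning.  The helper names progress_copy_params()
and vacuum_heap_percent() are hypothetical, and the value 10 for
PG_STAT_GET_PROGRESS_COLS is assumed from the "10 integer columns" mentioned
above.  The consumer guards against the division by zero that Horiguchi-san
pointed out, since slots the producer never set read as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, based on the 10 integer columns discussed in the thread. */
#define PG_STAT_GET_PROGRESS_COLS 10

/* Slot meanings known only to the VACUUM-specific consumer. */
#define PROG_PAR_VAC_HEAP_BLKS     2
#define PROG_PAR_VAC_CUR_HEAP_BLK  3

/* Generic side: copy all slots verbatim, with no per-command knowledge. */
static void
progress_copy_params(const uint32_t *src, uint32_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < PG_STAT_GET_PROGRESS_COLS; i++)
        dst[i] = src[i];
}

/*
 * Command-specific side: interpret two slots as total and current heap
 * block, and return the percentage scanned.  Returns 0.0 when the total
 * is zero, so unset slots never cause a division by zero.
 */
static double
vacuum_heap_percent(const uint32_t *params)
{
    uint32_t total = params[PROG_PAR_VAC_HEAP_BLKS];
    uint32_t cur = params[PROG_PAR_VAC_CUR_HEAP_BLK];

    if (total == 0)
        return 0.0;
    return 100.0 * (double) cur / (double) total;
}
```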

Attached are updated patches which incorporate the above-mentioned changes.
If Vinayak has something else in mind about anything, he can weigh in.

Thanks,
Amit
