Re: adding status for COPY progress report - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: adding status for COPY progress report
Date
Msg-id CAEze2Wg7-9NW_WPjPECVGcxp8SuxWxjzv8fT163=OdEHvHvT2A@mail.gmail.com
Whole thread Raw
In response to Re: adding status for COPY progress report  (Zhihong Yu <zyu@yugabyte.com>)
Responses Re: adding status for COPY progress report
List pgsql-hackers
On Tue, 24 May 2022 at 22:12, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > Please see attached for enhancement to COPY command progress.
>> >
>> > The added status column would allow users to get the status of the most recent COPY command.
>>
>> I fail to see the merit of retaining completed progress reporting
>> commands in their views after completion, other than making the
>> behaviour of the pg_stat_progress-views much more complicated and
>> adding overhead in places where we want the system to have as little
>> overhead as possible.
>>
>> Trying to get the status of a COPY command after it finished on a
>> different connection seems weird, as that other connection is likely
>> to have already disconnected / started another task. To be certain
>> that a backend can see the return status of the COPY command, you'd
>> have to be certain that the connection doesn't run any other
>> _progress-able commands in the following seconds / minutes, which
>> implies control over the connection, which means you already have
>> access to the resulting status of your COPY command.
>>
>> Regarding the patch: I really do not like that this leaks entries into
>> all _progress views: I get garbage data from e.g. the _create_index
>> and _copy views when VACUUM is running, etc, because you removed the
>> filter on cmdtype.
>> Also, the added fields in CopyToStateData / CopyFromStateData seem
>> useless when a pgstat_progress_update_param in the right place should
>> suffice.
>>
>> Kind regards,
>>
>> Matthias van de Meent
>
> Hi,
> For #2 above, can you let me know where the pgstat_progress_update_param() call(s) should be added ?
> In my patch, pgstat_progress_update_param() is called from error callback and EndCopy().

In the places that the patch currently sets cstate->status it could
instead directly call pgstat_progress_update_param(..., STATUS_VALUE).
I'm fairly certain that EndCopy is not called when the error callback
is called, so status reporting should not be overwritten when
unconditionally setting the status to OK in EndCopy.

> For #1, if I use param18 (which is not used by other views), would that be better ?

No:

         /*
-         * Report values for only those backends which are running the given
-         * command.
+         * Report values for only those backends which are running or have run.
          */
-        if (!beentry || beentry->st_progress_command != cmdtype)
+        if (!beentry || beentry->st_progress_command_target == InvalidOid)
             continue;

This change removes the filter that ensures that we only return the
backends which have a st_progress_command of the correct cmdtype (i.e.
for _progress_copy only those that have st_progress_command ==
PROGRESS_COMMAND_COPY. Without that filter, you'll return all backends
that have (or have had) their progress fields set at any point. Which
means that the expectation of "the backends returned by
pg_stat_get_progress_info are those running the requested command"
will be incorrect - you'll violate the contract / documented behaviour
of the function: "Returns command progress information for the named
command.".

The numerical index of the column thus doesn't matter, what matters is
that you want special behaviour for only the COPY progress reporting
that doesn't fit with the rest of the progress-reporting
infrastructure, and that the patch as-is breaks all progress reporting
views.

Sidenote: The change is also invalid because the rows that we expect
to return for pg_stat_progress_basebackup always have
st_progress_command_target == InvalidOid, so the backends running
BASE_BACKUP would never be visible with the change as-is. COPY (SELECT
stuff) also would not show up, because that too reports a
command_target of InvalidOid.

Either way, I don't think that this idea is worth pursuing: the
progress views are explicitly there to show the progress of currently
active backends, and not to show the last progress state of backends
that at some point ran a progress-reporting-enabled command.

Kind regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: allow building trusted languages without the untrusted versions
Next
From: Zhihong Yu
Date:
Subject: Re: enable/disable broken for statement triggers on partitioned tables