Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: [HACKERS] CLUSTER command progress monitor
Date
Msg-id 76d84ded-c81c-98b1-6fa4-4551ff295008@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] CLUSTER command progress monitor  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] CLUSTER command progress monitor
Re: [HACKERS] CLUSTER command progress monitor
List pgsql-hackers
On 2019/02/23 6:02, Robert Haas wrote:
> On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> This patch is rebased on HEAD.
>> I'll tackle revising the patch based on feedbacks next month.
> 
> +   Running <command>VACUUM FULL</command> is listed in
> <structname>pg_stat_progress_cluster</structname>
> +   view because it uses <command>CLUSTER</command> command
> internally.  See <xref linkend='cluster-progress-reporting'>.
> 
> It's not really true to say that VACUUM FULL uses the CLUSTER command
> internally.  It's not really true.  It uses a good chunk of the same
> infrastructure, but it certainly doesn't use the actual command, and
> it's not really the exact same thing either, because internally it's
> doing a sequential scan but no sort, which never happens with CLUSTER.
> I'm not sure exactly how to rephrase this, but I think we need to make
> it more precise.
> 
> One idea is that maybe we should try to think of a design that could
> also handle the rewriting variants of ALTER TABLE, and call it
> pg_stat_progress_rewrite.  Maybe that's moving the goalposts too far,
> but I'm not saying we'd necessarily have to do all the work now, just
> have a view that we think could also handle that.  Then again, maybe
> the needs are too different.
> 

Hmm..., I see.
If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of
the implementation.
For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose that
it's tough work.


> +   Whenever <command>CLUSTER</command> is running, the
> +   <structname>pg_stat_progress_cluster</structname> view will contain
> +   one row for each backend that is currently clustering or vacuuming
> (VACUUM FULL).
> 
> That sentence contradicts itself.  Just say that it contains a row for
> each backend that is currently running CLUSTER or VACUUM FULL.
> 

Fixed.


> @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
>   void
>   cluster(ClusterStmt *stmt, bool isTopLevel)
>   {
> +
>    if (stmt->relation != NULL)
>    {
>    /* This is the single-relation case. */
> 
> Useless hunk.
> 

Fixed.


> @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
>    heap_close(rel, NoLock);
> 
>    /* Do the job. */
> + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
>    cluster_rel(tableOid, indexOid, stmt->options);
> + pgstat_progress_end_command();
>    }
>    else
>    {
> 
> It seems like that stuff should be inside cluster_rel().
>

Fixed.


> + /* Set reltuples to total_tuples */
> + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
> OldHeap->rd_rel->reltuples);
> 
> I object.  If the user wants that, they can get it from pg_class
> themselves via an SQL query.  It's also an estimate, not something we
> know to be accurate; I want us to only report facts here, not theories

I understand that progress monitor should only report facts, so I
removed that code.



> + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
> + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
> PROGRESS_CLUSTER_METHOD_INDEX_SCAN);
> 
> I think you should use pgstat_progress_update_multi_param() if
> updating multiple parameters at the same time.
> 
> Also, some lines in this patch, such as this one, are very long.
> Consider techniques to reduce the line length to 80 characters or
> less, such as inserting a line break between the two arguments.

Fixed.


> + /* Set scan_method to NULL */
> + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);
> 
> NULL and -1 are not the same thing.

Oops, fixed.


> I think that we shouldn't have both
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's just
> use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better yet,
> let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
> PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
> PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
> simpler.

Fixed.


> I agree that it's acceptable to report
> PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
> PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
> understand why it's valuable to do so in the context of a progress
> indicator.

Actually, I'm not sure why I added it since so much time has passed. :(
So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least.


Attached patch is wip patch.


Thanks!
Tatsuro Yamada


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_partition_tree crashes for a non-defined relation
Next
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Block level parallel vacuum