Re: progress report for ANALYZE - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: progress report for ANALYZE
Date
Msg-id 20190711.195610.175868438.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: progress report for ANALYZE  (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>)
Responses Re: progress report for ANALYZE  (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>)
List pgsql-hackers
Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>
> Hi Alvaro, Anthony, Julien and Robert,
> 
> On 2019/07/09 3:47, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
> > wrote:
> >>
> >> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
> >> <alvherre@2ndquadrant.com> wrote:
> >>> Yeah, I got the impression that that was determined to be the
> >>> desirable
> >>> behavior, so I made it do that, but I'm not really happy about it
> >>> either.  We're not too late to change the CREATE INDEX behavior, but
> >>> let's discuss what is it that we want.
> >>
> >> I don't think I intended to make any such determination -- which
> >> commit do you think established this as the canonical behavior?
> >>
> >> I propose that once a field is set, we should leave it set until the
> >> end.
> > +1
> > Note that this patch is already behaving like that if the table only
> > contains dead rows.

+1 from me.

> I fixed the patch including:
> 
>   - Replace "if" to "else if". (Suggested by Julien)
>   - Fix typo s/ech/each/. (Suggested by Anthony)
>   - Add Phase "analyzing complete" in the pgstat view. (Suggested by
>   - Julien, Robert and me)
>     It was overlooked to add it in system_views.sql.
> 
> I share my re-test result, see below:
> 
> ---------------------------------------------------------
> [Session #1]
> create table hoge as select * from generate_series(1, 1000000) a;
> analyze verbose hoge;
> 
> [Session #2]
> \a \t
> select * from pg_stat_progress_analyze; \watch 0.001
> 
> 3785|13599|postgres|16384|f|16384|scanning table|4425|6
> 3785|13599|postgres|16384|f|16384|scanning table|4425|31
> 3785|13599|postgres|16384|f|16384|scanning table|4425|70
> 3785|13599|postgres|16384|f|16384|scanning table|4425|109
> ...
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|analyzing sample|0|0
> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> fixed. :)
> ---------------------------------------------------------

I have some comments.

+        const int   index[] = {
+            PROGRESS_ANALYZE_PHASE,
+            PROGRESS_ANALYZE_TOTAL_BLOCKS,
+            PROGRESS_ANALYZE_BLOCKS_DONE
+        };
+        const int64 val[] = {
+            PROGRESS_ANALYZE_PHASE_ANALYSIS,
+            0, 0

Do you oppose to leaving the total/done blocks alone here:p?



+  BlockNumber  nblocks;
+  double    blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);

It is converted into int64.



+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'



+                      WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'


+     <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+      <entry>OID of the database to which this backend is connected.</entry>
+      <entry>Name of the database to which this backend is connected.</entry>

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)


+      <entry>Whether the current scan includes legacy inheritance children.</entry>

This apparently excludes partition tables but actually it is
included.

  "Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..


+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).

Is <literal> needed? And I think the parentheses are not needed.

  OID of the table currently being scanned. Can differ from relid
  when analyzing tables that have child tables.


+       Total number of heap blocks to scan in the current table.

   Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".


+       The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.


+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.

Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.


regards.

- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: [HACKERS] WIP: Aggregation push-down
Next
From: Sergei Kornilov
Date:
Subject: Re: base backup client as auxiliary backend process