Re: progress report for ANALYZE - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: progress report for ANALYZE
Date
Msg-id 20190722.173039.59246943.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: progress report for ANALYZE  (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>)
Responses Re: progress report for ANALYZE  (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>)
List pgsql-hackers
Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
> >> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> >> fixed. :)
> >> ---------------------------------------------------------
> >
> > I have some comments.
> > +        const int   index[] = {
> > +            PROGRESS_ANALYZE_PHASE,
> > +            PROGRESS_ANALYZE_TOTAL_BLOCKS,
> > +            PROGRESS_ANALYZE_BLOCKS_DONE
> > +        };
> > +        const int64 val[] = {
> > +            PROGRESS_ANALYZE_PHASE_ANALYSIS,
> > +            0, 0
> > Do you oppose to leaving the total/done blocks alone here:p?
> 
> 
> Thanks for your comments!
> I created a new patch based on your comments because Alvaro allowed me
> to work on revising the patch. :)
> 
> 
> Ah, I revised it to remove "0, 0".

Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

> > +  BlockNumber  nblocks;
> > +  double    blksdone = 0;
> > Why is it a double even though blksdone is of the same domain
> > with nblocks? And finally:
> > +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> > +                   ++blksdone);
> > It is converted into int64.
> 
> 
> Fixed.
> But is it suitable to use BlockNumber instead int64?

Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

> > +                      WHEN 2 THEN 'analyzing sample'
> > +                      WHEN 3 THEN 'analyzing sample (extended stats)'
> > I think we should avoid parenthesized phrases as far as
> > not-necessary. That makes the column unnecessarily long. The
> > phase is internally called as "compute stats" so *I* would prefer
> > something like the followings:
> > +                      WHEN 2 THEN 'computing stats'
> > +                      WHEN 3 THEN 'computing extended stats'
> > +                      WHEN 4 THEN 'analyzing complete'
> > And if you are intending by this that (as mentioned in the
> > documentation) "working to complete this analyze", this would be:
> > +                      WHEN 4 THEN 'completing analyze'
> > +                      WHEN 4 THEN 'finalizing analyze'
> 
> 
> I have no strong opinion, so I changed the phase-names based on
> your suggestions like following:
> 
>   WHEN 2 THEN 'computing stats'
>   WHEN 3 THEN 'computing extended stats'
>   WHEN 4 THEN 'finalizing analyze'
> 
> However, I'd like to get any comments from hackers to get a consensus
> about the names.

Agreed. Especially such word choosing is not suitable for me..

> > +     <entry>Process ID of backend.</entry>
> > of "the" backend. ? And period is not attached on all
> > descriptions consisting of a single sentence.
> >
> > + <entry>OID of the database to which this backend is
> > connected.</entry>
> > + <entry>Name of the database to which this backend is
> > connected.</entry>
> > "database to which .. is connected" is phrased as "database this
> > backend is connected to" in pg_stat_acttivity. (Just for consistency)
> 
> 
> I checked the sentences on other views of progress monitor (VACUUM,
> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
> I'd like to keep it as is. :)

Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

> > + <entry>Whether the current scan includes legacy inheritance
> > children.</entry>
> > This apparently excludes partition tables but actually it is
> > included.
> >
> >    "Whether scanning through child tables" ?
> > I'm not sure "child tables" is established as the word to mean
> > both "partition tables and inheritance children"..
> 
> 
> Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..


> > +       The table being scanned (differs from <literal>relid</literal>
> > +       only when processing partitions or inheritance children).
> > Is <literal> needed? And I think the parentheses are not needed.
> >    OID of the table currently being scanned. Can differ from relid
> >    when analyzing tables that have child tables.
> 
> 
> How about:
>   OID of the table currently being scanned.
>   It might be different from relid when analyzing tables that have child
>   tables.
> 
> 
> 
> > +       Total number of heap blocks to scan in the current table.
> >     Number of heap blocks on scanning_table to scan?
> > It might be better that this description describes that this
> > and the next column is meaningful only while the phase
> > "scanning table".
> 
> 
> How about:
>   Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a

> > +       The command is currently scanning the table.
> > "sample(s)" comes somewhat abruptly in the next item. Something
> > like "The command is currently scanning the table
> > <structfield>scanning_table</structfield> to obtain samples"
> > might be better.
> 
> 
> Fixed.
> 
>  
> > + <command>ANALYZE</command> is currently extracting statistical data
> > +       from the sample obtained.
> > Something like "The command is computing stats from the samples
> > obtained in the previous phase" might be better.
> 
> 
> Fixed.
> 
> 
> Please find attached patch. :)
> 
> Here is a test result of the patch.
> ==========================================================
> # select * from pg_stat_progress_analyze ; \watch 0.0001;
> 
> 9067|13599|postgres|16387|f|16387|scanning table|443|14
> 9067|13599|postgres|16387|f|16387|scanning table|443|44
> 9067|13599|postgres|16387|f|16387|scanning table|443|76
> 9067|13599|postgres|16387|f|16387|scanning table|443|100
> ...
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
> ==========================================================

Looks fine!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Memory Accounting
Next
From: David Rowley
Date:
Subject: Re: POC: converting Lists into arrays