Re: progress report for ANALYZE - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: progress report for ANALYZE
Date
Msg-id 0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp
Whole thread Raw
In response to Re: progress report for ANALYZE  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: progress report for ANALYZE  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hi Horiguchi-san!


On 2019/07/11 19:56, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>
>> Hi Alvaro, Anthony, Julien and Robert,
>>
>> On 2019/07/09 3:47, Julien Rouhaud wrote:
>>> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>>
>>>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
>>>> <alvherre@2ndquadrant.com> wrote:
>>>>> Yeah, I got the impression that that was determined to be the
>>>>> desirable
>>>>> behavior, so I made it do that, but I'm not really happy about it
>>>>> either.  We're not too late to change the CREATE INDEX behavior, but
>>>>> let's discuss what is it that we want.
>>>>
>>>> I don't think I intended to make any such determination -- which
>>>> commit do you think established this as the canonical behavior?
>>>>
>>>> I propose that once a field is set, we should leave it set until the
>>>> end.
>>> +1
>>> Note that this patch is already behaving like that if the table only
>>> contains dead rows.
> 
> +1 from me.
> 
>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>> fixed. :)
>> ---------------------------------------------------------
>
> I have some comments.
> 
> +        const int   index[] = {
> +            PROGRESS_ANALYZE_PHASE,
> +            PROGRESS_ANALYZE_TOTAL_BLOCKS,
> +            PROGRESS_ANALYZE_BLOCKS_DONE
> +        };
> +        const int64 val[] = {
> +            PROGRESS_ANALYZE_PHASE_ANALYSIS,
> +            0, 0
> 
> Do you oppose to leaving the total/done blocks alone here:p?


Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)


Ah, I revised it to remove "0, 0".



> +  BlockNumber  nblocks;
> +  double    blksdone = 0;
> 
> Why is it a double even though blksdone is of the same domain
> with nblocks? And finally:
> 
> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> +                   ++blksdone);
> 
> It is converted into int64.


Fixed.
But is it suitable to use BlockNumber instead int64?


  
> +                      WHEN 2 THEN 'analyzing sample'
> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
> 
> I think we should avoid parenthesized phrases as far as
> not-necessary. That makes the column unnecessarily long. The
> phase is internally called as "compute stats" so *I* would prefer
> something like the followings:
> 
> +                      WHEN 2 THEN 'computing stats'
> +                      WHEN 3 THEN 'computing extended stats'
> 
> 
> 
> +                      WHEN 4 THEN 'analyzing complete'
> 
> And if you are intending by this that (as mentioned in the
> documentation) "working to complete this analyze", this would be:
> 
> +                      WHEN 4 THEN 'completing analyze'
> +                      WHEN 4 THEN 'finalizing analyze'


I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

   WHEN 2 THEN 'computing stats'
   WHEN 3 THEN 'computing extended stats'
   WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.



> +     <entry>Process ID of backend.</entry>
> 
> of "the" backend. ? And period is not attached on all
> descriptions consisting of a single sentence.
>
> +      <entry>OID of the database to which this backend is connected.</entry>
> +      <entry>Name of the database to which this backend is connected.</entry>
> 
> "database to which .. is connected" is phrased as "database this
> backend is connected to" in pg_stat_acttivity. (Just for consistency)


I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)



> +      <entry>Whether the current scan includes legacy inheritance children.</entry>
> 
> This apparently excludes partition tables but actually it is
> included.
>
>    "Whether scanning through child tables" ?
> 
> I'm not sure "child tables" is established as the word to mean
> both "partition tables and inheritance children"..


Hmm... I'm also not sure but I fixed as you suggested.



> +       The table being scanned (differs from <literal>relid</literal>
> +       only when processing partitions or inheritance children).
> 
> Is <literal> needed? And I think the parentheses are not needed.
> 
>    OID of the table currently being scanned. Can differ from relid
>    when analyzing tables that have child tables.


How about:
   OID of the table currently being scanned.
   It might be different from relid when analyzing tables that have child tables.



> +       Total number of heap blocks to scan in the current table.
> 
>     Number of heap blocks on scanning_table to scan?
> 
> It might be better that this description describes that this
> and the next column is meaningful only while the phase
> "scanning table".


How about:
   Total number of heap blocks in the scanning_table.




> +       The command is currently scanning the table.
> 
> "sample(s)" comes somewhat abruptly in the next item. Something
> like "The command is currently scanning the table
> <structfield>scanning_table</structfield> to obtain samples"
> might be better.


Fixed.

  
> +       <command>ANALYZE</command> is currently extracting statistical data
> +       from the sample obtained.
> 
> Something like "The command is computing stats from the samples
> obtained in the previous phase" might be better.


Fixed.


Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================


Thanks,
Tatsuro Yamada


Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Index Skip Scan
Next
From: David Rowley
Date:
Subject: Re: Tid scan improvements