Re: progress report for ANALYZE - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: progress report for ANALYZE
Date
Msg-id 374df3ef-be5c-6b7b-1ad9-e09d41640e67@nttcom.co.jp
Whole thread Raw
In response to Re: progress report for ANALYZE  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: progress report for ANALYZE  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,


On 2019/07/22 17:30, Kyotaro Horiguchi wrote:
> Hello.
> 
> # It's very good timing, as you came in while I have a time after
> # finishing a quite nerve-wrackig task..
> 
> At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
>>>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>>>> fixed. :)
>>>> ---------------------------------------------------------
>>>
>>> I have some comments.
>>> +        const int   index[] = {
>>> +            PROGRESS_ANALYZE_PHASE,
>>> +            PROGRESS_ANALYZE_TOTAL_BLOCKS,
>>> +            PROGRESS_ANALYZE_BLOCKS_DONE
>>> +        };
>>> +        const int64 val[] = {
>>> +            PROGRESS_ANALYZE_PHASE_ANALYSIS,
>>> +            0, 0
>>> Do you oppose to leaving the total/done blocks alone here:p?
>>
>>
>> Thanks for your comments!
>> I created a new patch based on your comments because Alvaro allowed me
>> to work on revising the patch. :)
>>
>>
>> Ah, I revised it to remove "0, 0".
> 
> Thanks. For a second I thought that
> PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
> renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.


"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:

./src/include/commands/progress.h

+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE              1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING                       2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE                        4

Is it Okay?

  
>>> +  BlockNumber  nblocks;
>>> +  double    blksdone = 0;
>>> Why is it a double even though blksdone is of the same domain
>>> with nblocks? And finally:
>>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
>>> +                   ++blksdone);
>>> It is converted into int64.
>>
>>
>> Fixed.
>> But is it suitable to use BlockNumber instead int64?
> 
> Yeah, I didn't meant that we should use int64 there. Sorry for
> the confusing comment. I meant that blksdone should be of
> BlockNumber.


Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.
  

>>> +                      WHEN 2 THEN 'analyzing sample'
>>> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
>>> I think we should avoid parenthesized phrases as far as
>>> not-necessary. That makes the column unnecessarily long. The
>>> phase is internally called as "compute stats" so *I* would prefer
>>> something like the followings:
>>> +                      WHEN 2 THEN 'computing stats'
>>> +                      WHEN 3 THEN 'computing extended stats'
>>> +                      WHEN 4 THEN 'analyzing complete'
>>> And if you are intending by this that (as mentioned in the
>>> documentation) "working to complete this analyze", this would be:
>>> +                      WHEN 4 THEN 'completing analyze'
>>> +                      WHEN 4 THEN 'finalizing analyze'
>>
>>
>> I have no strong opinion, so I changed the phase-names based on
>> your suggestions like following:
>>
>>    WHEN 2 THEN 'computing stats'
>>    WHEN 3 THEN 'computing extended stats'
>>    WHEN 4 THEN 'finalizing analyze'
>>
>> However, I'd like to get any comments from hackers to get a consensus
>> about the names.
> 
> Agreed. Especially such word choosing is not suitable for me..


To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)


  
>>> +     <entry>Process ID of backend.</entry>
>>> of "the" backend. ? And period is not attached on all
>>> descriptions consisting of a single sentence.
>>>
>>> + <entry>OID of the database to which this backend is
>>> connected.</entry>
>>> + <entry>Name of the database to which this backend is
>>> connected.</entry>
>>> "database to which .. is connected" is phrased as "database this
>>> backend is connected to" in pg_stat_acttivity. (Just for consistency)
>>
>>
>> I checked the sentences on other views of progress monitor (VACUUM,
>> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
>> I'd like to keep it as is. :)
> 
> Oh, I see from where the wordings came. But no periods seen after
> sentense when it is the only one in a description in other system
> views tables. I think the progress views tables should be
> corrected following convention.


Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.

  
>>> + <entry>Whether the current scan includes legacy inheritance
>>> children.</entry>
>>> This apparently excludes partition tables but actually it is
>>> included.
>>>
>>>     "Whether scanning through child tables" ?
>>> I'm not sure "child tables" is established as the word to mean
>>> both "partition tables and inheritance children"..
>>
>>
>> Hmm... I'm also not sure but I fixed as you suggested.
> 
> Yeah, I also am not sure the suggestion is good enough as is..
>
>>> +       Total number of heap blocks to scan in the current table.
>>>      Number of heap blocks on scanning_table to scan?
>>> It might be better that this description describes that this
>>> and the next column is meaningful only while the phase
>>> "scanning table".
>>
>>
>> How about:
>>    Total number of heap blocks in the scanning_table.
> 
> (For me, ) that seems like it shows blocks including all
> descendents for inheritance parent. But I'm not sure..a


In the case of scanning_table is parent table, it doesn't
show the number. However, child tables shows the number.
I tested it using Declarative partitioning table, See the bottom
of this email. :)


>> Please find attached patch. :)
>>
>> Here is a test result of the patch.
>> ==========================================================
>> # select * from pg_stat_progress_analyze ; \watch 0.0001;
>>
>> 9067|13599|postgres|16387|f|16387|scanning table|443|14
>> ...
>> 9067|13599|postgres|16387|f|16387|scanning table|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
>> ==========================================================
> 
> Looks fine!


I shared a test result using Declarative partitioning table.

==========================================================
## Create partitioning table
create table hoge as select a from generate_series(0, 40000) a;

create table hoge2(
   a integer
) partition by range(a);

create table hoge2_10000 partition of hoge2
for values from (0) to (10000);

create table hoge2_20000 partition of hoge2
for values from (10000) to (20000);

create table hoge2_30000 partition of hoge2
for values from (20000) to (30000);

create table hoge2_default partition of hoge2 default;


## Test
select oid,relname,relpages,reltuples from pg_class where relname like 'hoge%';

   oid  |    relname    | relpages | reltuples
-------+---------------+----------+-----------
  16538 | hoge          |      177 |     40001
  16541 | hoge2         |        0 |         0
  16544 | hoge2_10000   |       45 |     10000
  16547 | hoge2_20000   |       45 |     10000
  16550 | hoge2_30000   |       45 |     10000
  16553 | hoge2_default |       45 |     10001
(6 rows)

select * from pg_stat_progress_analyze ; \watch 0.00001;

27579|13599|postgres|16541|t|16544|scanning table|45|17
27579|13599|postgres|16541|t|16544|scanning table|45|38
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45

27579|13599|postgres|16541|t|16547|scanning table|45|17
27579|13599|postgres|16541|t|16547|scanning table|45|37
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45

27579|13599|postgres|16541|t|16550|scanning table|45|9
27579|13599|postgres|16541|t|16550|scanning table|45|30
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45

27579|13599|postgres|16541|t|16553|scanning table|45|5
27579|13599|postgres|16541|t|16553|scanning table|45|26
27579|13599|postgres|16541|t|16553|scanning table|45|42
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|finalizing analyze|45|45

27579|13599|postgres|16544|f|16544|scanning table|45|1
27579|13599|postgres|16544|f|16544|scanning table|45|30
27579|13599|postgres|16544|f|16544|computing stats|45|45

27579|13599|postgres|16547|f|16547|scanning table|45|25
27579|13599|postgres|16547|f|16547|computing stats|45|45

27579|13599|postgres|16550|f|16550|scanning table|45|11
27579|13599|postgres|16550|f|16550|scanning table|45|38
27579|13599|postgres|16550|f|16550|finalizing analyze|45|45

27579|13599|postgres|16553|f|16553|scanning table|45|25
27579|13599|postgres|16553|f|16553|computing stats|45|45
==========================================================

I'll share test result using partitioning table via
Inheritance tables on next email. :)

Thanks,
Tatsuro Yamada


Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Introduce timeout capability for ConditionVariableSleep
Next
From: Michael Paquier
Date:
Subject: Re: Add parallelism and glibc dependent only options to reindexdb