Home > mailing lists
Re: progress report for ANALYZE - Mailing list pgsql-hackers

From	Tatsuro Yamada
Subject	Re: progress report for ANALYZE
Date	August 13, 2019 05:33:34
Msg-id	73116c0a-bdc7-7c1d-b8da-a1e6040d015a@nttcom.co.jp_1 Whole thread Raw
In response to	Re: progress report for ANALYZE (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers
Tree view
Hi Robert and All!


On 2019/08/02 2:48, Robert Haas wrote:> On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
>> <tatsuro.yamada.tf@nttcom.co.jp> wrote:
>>> Attached v4 patch file only includes this fix.
>>
>> I've moved this to the September CF, where it is in "Needs review" state.
> 
> /me reviews.


Thanks for your comments! :)


> +      <entry><structfield>scanning_table</structfield></entry>
> 
> I think this should be retitled to something that ends in 'relid',
> like all of the corresponding cases in existing progress views.
> Perhaps 'active_relid' or 'current_relid'.


Fixed.
I changed "scanning_table" to "current_relid" for analyze in monitoring.sgml.
However, I didn't change "relid" on other places for other commands because
I'd like to create other patch for that later. :)


  > +       The command is computing extended stats from the samples
> obtained in the previous phase.
> 
> I think you should change this (and the previous one) to say "from the
> samples obtained during the table scan."


Fixed.


> +       Total number of heap blocks in the scanning_table.
> 
> Perhaps I'm confused, but it looks to me like what you are advertising
> is the number of blocks that will be sampled, not the total number of
> blocks in the table.  I think that's the right thing to advertise, but
> then the column should be named and documented that way.


Ah, you are right. Fixed.
I used the following sentence based on Vinayak's patch created two years a go.

-     <entry><structfield>heap_blks_total</structfield></entry>
-     <entry><type>bigint</type></entry>
-     <entry>
-       Total number of heap blocks in the current_relid.
-     </entry>

+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+</entry>



> + {
> + const int   index[] = {
> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> + PROGRESS_ANALYZE_SCANREL
> + };
> + const int64 val[] = {
> + nblocks,
> + RelationGetRelid(onerel)
> + };
> +
> + pgstat_progress_update_multi_param(2, index, val);
> + }
> 
> This block seems to be introduced just so you can declare variables; I
> don't think that's good style. It's arguably unnecessary because we
> now are selectively allowing variable declarations within functions,
> but I think you should just move the first array to the top of the
> function and the second declaration to the top of the function
> dropping const, and then just do val[0] = nblocks and val[1] =
> RelationGetRelid(onerel).  Maybe you can also come up with better
> names than 'index' and 'val'.  Same comment applies to another place
> where you have something similar.


I agreed and fixed.


> Patch seems to need minor rebasing.
> 
> Maybe "scanning table" should be renamed "acquiring sample rows," to
> match the names used in the code?


I fixed as following:

s/PROGRESS_ANALYZE_PHASE_SCAN_TABLE/
   PROGRESS_ANALYZE_PHASE_ACQUIRING_SAMPLE_ROWS/

s/WHEN 1 THEN 'scanning table'::text/
   WHEN 1 THEN 'acquiring sample rows'::text/


> I'm not a fan of the way you set the scan-table phase and inh flag in
> one place, and then slightly later set the relation OID and block
> count.  That creates a race during which users could see the first bit
> of data set and the second not set.  I don't see any reason not to set
> all four fields together.


Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas? :)


# do_analyze_rel()
     ...
     if (inh)
         numrows = acquire_inherited_sample_rows(onerel, elevel,
                                                 rows, targrows,
                                                 &totalrows, &totaldeadrows);
     else
         numrows = (*acquirefunc) (onerel, elevel,
                                   rows, targrows,
                                   &totalrows, &totaldeadrows);


# acquire_inherited_sample_rows()
     ...
     foreach(lc, tableOIDs)
     {
     ...
        /* Check table type (MATVIEW can't happen, but might as well allow) */
         if (childrel->rd_rel->relkind == RELKIND_RELATION ||
             childrel->rd_rel->relkind == RELKIND_MATVIEW)
         {
             /* Regular table, so use the regular row acquisition function */
             acquirefunc = acquire_sample_rows;
     ...
         /* OK, we'll process this child */
         has_child = true;
         rels[nrels] = childrel;
         acquirefuncs[nrels] = acquirefunc;
     ...
     }
     ...
     for (i = 0; i < nrels; i++)
     {
     ...
         AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
     ...
             if (childtargrows > 0)
             {
     ...
                 /* Fetch a random sample of the child's rows */
                 childrows = (*acquirefunc) (childrel, elevel,
                                             rows + numrows, childtargrows,
                                             &trows, &tdrows)


> Please be sure to make the names of the constants you use match up
> with the external names as far as it reasonably makes sense, e.g.
> 
> +#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE       1
> +#define PROGRESS_ANALYZE_PHASE_COMPUTING            2
> +#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
> +#define PROGRESS_ANALYZE_PHASE_FINALIZE         4
> 
> vs.
> 
> +            WHEN 0 THEN 'initializing'::text
> +            WHEN 1 THEN 'scanning table'::text
> +            WHEN 2 THEN 'computing stats'::text
> +            WHEN 3 THEN 'computing extended stats'::text
> +            WHEN 4 THEN 'finalizing analyze'::text
> 
> Not terrible, but it could be closer.


Agreed.
How about these?:

#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS 1  <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS       2  <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS   3  <- fixed
#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE    4  <- fixed

vs.

WHEN 1 THEN 'acquiring sample rows'::text
WHEN 2 THEN 'computing stats'::text
WHEN 3 THEN 'computing extended stats'::text
WHEN 4 THEN 'finalizing analyze'::text

I revised the name of the constants, so the constants and the phase
names are closer than before. Also, I used Verb instead Gerund
because other phase names used Verb such as VACUUM. :)


> Similarly with the column names (include_children vs. INH).


Fixed.
I selected "include_children" as the column name because it's
easy to understand than "INH".

s/PROGRESS_ANALYZE_INH/
   PROGRESS_ANALYZE_INCLUDE_CHILDREN/


Please find attached file. :)


Thanks,
Tatsuro Yamada
Attachment

v5-Report-progress-for-ANALYZE.patch
pgsql-hackers by date:
From: Dilip Kumar
Date: 13 August 2019, 05:22:05
Subject: Re: POC: Cleaning up orphaned files using undo logs
From: Craig Ringer
Date: 13 August 2019, 05:34:37
Subject: Re: Global temporary tables
Re: progress report for ANALYZE - Mailing list pgsql-hackers

Attachment

Previous

Next