Re: progress report for ANALYZE - Mailing list pgsql-hackers
From | Tatsuro Yamada |
---|---|
Subject | Re: progress report for ANALYZE |
Date | |
Msg-id | 73116c0a-bdc7-7c1d-b8da-a1e6040d015a@nttcom.co.jp_1 Whole thread Raw |
In response to | Re: progress report for ANALYZE (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Hi Robert and All! On 2019/08/02 2:48, Robert Haas wrote:> On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada >> <tatsuro.yamada.tf@nttcom.co.jp> wrote: >>> Attached v4 patch file only includes this fix. >> >> I've moved this to the September CF, where it is in "Needs review" state. > > /me reviews. Thanks for your comments! :) > + <entry><structfield>scanning_table</structfield></entry> > > I think this should be retitled to something that ends in 'relid', > like all of the corresponding cases in existing progress views. > Perhaps 'active_relid' or 'current_relid'. Fixed. I changed "scanning_table" to "current_relid" for analyze in monitoring.sgml. However, I didn't change "relid" on other places for other commands because I'd like to create other patch for that later. :) > + The command is computing extended stats from the samples > obtained in the previous phase. > > I think you should change this (and the previous one) to say "from the > samples obtained during the table scan." Fixed. > + Total number of heap blocks in the scanning_table. > > Perhaps I'm confused, but it looks to me like what you are advertising > is the number of blocks that will be sampled, not the total number of > blocks in the table. I think that's the right thing to advertise, but > then the column should be named and documented that way. Ah, you are right. Fixed. I used the following sentence based on Vinayak's patch created two years a go. - <entry><structfield>heap_blks_total</structfield></entry> - <entry><type>bigint</type></entry> - <entry> - Total number of heap blocks in the current_relid. - </entry> + <entry><structfield>sample_blks_total</structfield></entry> + <entry><type>bigint</></entry> + <entry> + Total number of heap blocks that will be sampled. +</entry> > + { > + const int index[] = { > + PROGRESS_ANALYZE_TOTAL_BLOCKS, > + PROGRESS_ANALYZE_SCANREL > + }; > + const int64 val[] = { > + nblocks, > + RelationGetRelid(onerel) > + }; > + > + pgstat_progress_update_multi_param(2, index, val); > + } > > This block seems to be introduced just so you can declare variables; I > don't think that's good style. It's arguably unnecessary because we > now are selectively allowing variable declarations within functions, > but I think you should just move the first array to the top of the > function and the second declaration to the top of the function > dropping const, and then just do val[0] = nblocks and val[1] = > RelationGetRelid(onerel). Maybe you can also come up with better > names than 'index' and 'val'. Same comment applies to another place > where you have something similar. I agreed and fixed. > Patch seems to need minor rebasing. > > Maybe "scanning table" should be renamed "acquiring sample rows," to > match the names used in the code? I fixed as following: s/PROGRESS_ANALYZE_PHASE_SCAN_TABLE/ PROGRESS_ANALYZE_PHASE_ACQUIRING_SAMPLE_ROWS/ s/WHEN 1 THEN 'scanning table'::text/ WHEN 1 THEN 'acquiring sample rows'::text/ > I'm not a fan of the way you set the scan-table phase and inh flag in > one place, and then slightly later set the relation OID and block > count. That creates a race during which users could see the first bit > of data set and the second not set. I don't see any reason not to set > all four fields together. Hmm... I understand but it's little difficult because if there are child rels, acquire_inherited_sample_rows() calls acquire_sample_rows() (See below). So, it is able to set all four fields together if inh flag is given as a parameter of those functions, I suppose. But I'm not sure whether it's okay to add the parameter to both functions or not. Do you have any ideas? :) # do_analyze_rel() ... if (inh) numrows = acquire_inherited_sample_rows(onerel, elevel, rows, targrows, &totalrows, &totaldeadrows); else numrows = (*acquirefunc) (onerel, elevel, rows, targrows, &totalrows, &totaldeadrows); # acquire_inherited_sample_rows() ... foreach(lc, tableOIDs) { ... /* Check table type (MATVIEW can't happen, but might as well allow) */ if (childrel->rd_rel->relkind == RELKIND_RELATION || childrel->rd_rel->relkind == RELKIND_MATVIEW) { /* Regular table, so use the regular row acquisition function */ acquirefunc = acquire_sample_rows; ... /* OK, we'll process this child */ has_child = true; rels[nrels] = childrel; acquirefuncs[nrels] = acquirefunc; ... } ... for (i = 0; i < nrels; i++) { ... AcquireSampleRowsFunc acquirefunc = acquirefuncs[i]; ... if (childtargrows > 0) { ... /* Fetch a random sample of the child's rows */ childrows = (*acquirefunc) (childrel, elevel, rows + numrows, childtargrows, &trows, &tdrows) > Please be sure to make the names of the constants you use match up > with the external names as far as it reasonably makes sense, e.g. > > +#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE 1 > +#define PROGRESS_ANALYZE_PHASE_COMPUTING 2 > +#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3 > +#define PROGRESS_ANALYZE_PHASE_FINALIZE 4 > > vs. > > + WHEN 0 THEN 'initializing'::text > + WHEN 1 THEN 'scanning table'::text > + WHEN 2 THEN 'computing stats'::text > + WHEN 3 THEN 'computing extended stats'::text > + WHEN 4 THEN 'finalizing analyze'::text > > Not terrible, but it could be closer. Agreed. How about these?: #define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS 1 <- fixed #define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS 2 <- fixed #define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS 3 <- fixed #define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE 4 <- fixed vs. WHEN 1 THEN 'acquiring sample rows'::text WHEN 2 THEN 'computing stats'::text WHEN 3 THEN 'computing extended stats'::text WHEN 4 THEN 'finalizing analyze'::text I revised the name of the constants, so the constants and the phase names are closer than before. Also, I used Verb instead Gerund because other phase names used Verb such as VACUUM. :) > Similarly with the column names (include_children vs. INH). Fixed. I selected "include_children" as the column name because it's easy to understand than "INH". s/PROGRESS_ANALYZE_INH/ PROGRESS_ANALYZE_INCLUDE_CHILDREN/ Please find attached file. :) Thanks, Tatsuro Yamada
Attachment
pgsql-hackers by date: