Re: [HACKERS] ANALYZE command progress checker - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] ANALYZE command progress checker |
Date | |
Msg-id | CA+TgmoaGCwYkk2xiPKnYNqnNQEDuM9L6KYWU8vTb-tVb6BZhaA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] ANALYZE command progress checker (vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] ANALYZE command progress checker
|
List | pgsql-hackers |
On Fri, Mar 10, 2017 at 2:46 AM, vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote: > Thank you for reviewing the patch. > > The attached patch incorporated Michael and Amit comments also. I reviewed this tonight. + /* Report compute index stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); Hmm, is there really any point to this? And is it even accurate? It doesn't look to me like we are computing any index statistics here; we're only allocating a few in-memory data structures here, I think, which is not a statistics computation and probably doesn't take any significant time. + /* Report compute heap stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); The phase you label as computing heap statistics also includes the computation of index statistics. I wouldn't bother separating the computation of heap and index statistics; I'd just have a "compute statistics" phase that begins right after the comment that starts "Compute the statistics." + /* Report that we are now doing index cleanup */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); OK, this doesn't make any sense either. We are not cleaning up the indexes here. We are just closing them and releasing resources. I don't see why you need this phase at all; it can't last more than some tiny fraction of a second. It seems like you're trying to copy the exact same phases that exist for vacuum instead of thinking about what makes sense for ANALYZE. + /* Report number of heap blocks scaned so far*/ + pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED, targblock); While I don't think this is necessarily a bad thing to report, I find it very surprising that you're not reporting what seems to me like the most useful thing here - namely, the number of blocks that will be sampled in total and the number of those that you've selected. Instead, you're just reporting the length of the relation and the last-sampled block, which is a less-accurate guide to total progress. There are enough counters to report both things, so maybe we should. + /* Report total number of sample rows*/ + pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows); As an alternative to the previous paragraph, yet another thing you could report is number of rows sampled out of total rows to be sampled. But this isn't the way to do it: you are reporting the number of rows you sampled only once you've finished sampling. The point of progress reporting is to report progress -- that is, to report this value as it's updated, not just to report it when ANALYZE is practically done and the value has reached its maximum. The documentation for this patch isn't very good, either. You forgot to update the part of the documentation that says that VACUUM is the only command that currently supports progress reporting, and the descriptions of the phases and progress counters are brief and not entirely accurate - e.g. heap_blks_scanned is not actually the number of heap blocks scanned, because we are sampling, not reading all the blocks. When adding calls to the progress reporting functions, please consider whether a blank line should be added before or after the new code, or both, or neither. I'd say some blank lines are needed in a few places where you didn't add them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: