Re: [HACKERS] ANALYZE command progress checker - Mailing list pgsql-hackers

From vinayak
Subject Re: [HACKERS] ANALYZE command progress checker
Date
Msg-id fc9354f1-87bb-67c3-9fc2-1fd8c5c6cdf0@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] ANALYZE command progress checker  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] ANALYZE command progress checker  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
On 2017/03/17 10:38, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
> <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>> Thank you for reviewing the patch.
>>
>> The attached patch incorporated Michael and Amit comments also.
> I reviewed this tonight.
Thank you for reviewing the patch.
> +        /* Report compute index stats phase */
> +        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> +
> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>
> Hmm, is there really any point to this?  And is it even accurate?  It
> doesn't look to me like we are computing any index statistics here;
> we're only allocating a few in-memory data structures here, I think,
> which is not a statistics computation and probably doesn't take any
> significant time.
>
> +        /* Report compute heap stats phase */
> +        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> +                                    PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>
> The phase you label as computing heap statistics also includes the
> computation of index statistics.  I wouldn't bother separating the
> computation of heap and index statistics; I'd just have a "compute
> statistics" phase that begins right after the comment that starts
> "Compute the statistics."
Understood. Fixed in the attached patch.
>
> +    /* Report that we are now doing index cleanup */
> +    pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> +                                PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>
> OK, this doesn't make any sense either.  We are not cleaning up the
> indexes here.  We are just closing them and releasing resources.  I
> don't see why you need this phase at all; it can't last more than some
> tiny fraction of a second.  It seems like you're trying to copy the
> exact same phases that exist for vacuum instead of thinking about what
> makes sense for ANALYZE.
Understood. I have removed this phase.
>
> +        /* Report number of heap blocks scaned so far*/
> +        pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
> targblock);
>
> While I don't think this is necessarily a bad thing to report, I find
> it very surprising that you're not reporting what seems to me like the
> most useful thing here - namely, the number of blocks that will be
> sampled in total and the number of those that you've selected.
> Instead, you're just reporting the length of the relation and the
> last-sampled block, which is a less-accurate guide to total progress.
> There are enough counters to report both things, so maybe we should.
>
> +    /* Report total number of sample rows*/
> +    pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);
>
> As an alternative to the previous paragraph, yet another thing you
> could report is number of rows sampled out of total rows to be
> sampled.  But this isn't the way to do it: you are reporting the
> number of rows you sampled only once you've finished sampling.  The
> point of progress reporting is to report progress -- that is, to
> report this value as it's updated, not just to report it when ANALYZE
> is practically done and the value has reached its maximum.
Understood.
I have reported number of rows sampled out of total rows to be sampled.
I have not reported the number of blocks that will be sampled in total.
So currect pg_stat_progress_analyze view looks like:

=# \d+ pg_stat_progress_analyze
                          View "pg_catalog.pg_stat_progress_analyze"
          Column         |  Type   | Collation | Nullable | Default | 
Storage  | Descripti
on
------------------------+---------+-----------+----------+---------+----------+----------
---
  pid                    | integer |           |          |         | 
plain    |
  datid                  | oid     |           |          |         | 
plain    |
  datname                | name    |           |          |         | 
plain    |
  relid                  | oid     |           |          |         | 
plain    |
  phase                  | text    |           |          |         | 
extended |
  num_target_sample_rows | bigint  |           |          |         | 
plain    |
  num_rows_sampled       | bigint  |           |          |         | 
plain    |
View definition:
  SELECT s.pid,
     s.datid,
     d.datname,
     s.relid,
         CASE s.param1
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'collecting sample rows'::text
             WHEN 2 THEN 'computing statistics'::text
             ELSE NULL::text
         END AS phase,
     s.param2 AS num_target_sample_rows,
     s.param3 AS num_rows_sampled
    FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, 
param1, param2, p
aram3, param4, param5, param6, param7, param8, param9, param10)
      LEFT JOIN pg_database d ON s.datid = d.oid;

Comment?
> The documentation for this patch isn't very good, either.  You forgot
> to update the part of the documentation that says that VACUUM is the
> only command that currently supports progress reporting, and the
> descriptions of the phases and progress counters are brief and not
> entirely accurate - e.g. heap_blks_scanned is not actually the number
> of heap blocks scanned, because we are sampling, not reading all the
> blocks.
Understood. I have updated the documentation.
I will also try to improve documentation.
> When adding calls to the progress reporting functions, please consider
> whether a blank line should be added before or after the new code, or
> both, or neither.  I'd say some blank lines are needed in a few places
> where you didn't add them.
+1. I have added blank lines in a few places.

Regards,
Vinayak Pokale

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: [HACKERS] Re: Authentication tests, and plain 'password' authentication with aSCRAM verifier
Next
From: Dave Page
Date:
Subject: Re: [HACKERS] Monitoring roles patch