Re: [HACKERS] ANALYZE command progress checker - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: [HACKERS] ANALYZE command progress checker
Date
Msg-id CAE9k0PmGhP4+Sa219AzwOghhQFerZirysDpp2p6Hem24Rcg+ZQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] ANALYZE command progress checker  (vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp>)
Responses Re: [HACKERS] ANALYZE command progress checker  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Re: [HACKERS] ANALYZE command progress checker  (vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp>)
List pgsql-hackers
Hi Vinayak,

Here are couple of review comments that may need your attention.

1. Firstly, I am seeing some trailing whitespace errors when trying to
apply your v3 patch using git apply command.

[ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
CREATE VIEW pg_stat_progress_analyze AS
pg_stat_progress_analyze_v3.patch:156: trailing whitespace.   SELECT
pg_stat_progress_analyze_v3.patch:157: trailing whitespace.       S.pid AS pid, S.datid AS datid, D.datname AS
datname,
pg_stat_progress_analyze_v3.patch:158: trailing whitespace.       S.relid AS relid,
pg_stat_progress_analyze_v3.patch:159: trailing whitespace.       CASE S.param1 WHEN 0 THEN 'initializing'
error: patch failed: doc/src/sgml/monitoring.sgml:521

2) The test-case 'rules' is failing.  I think it's failing because in
rules.sql 'ORDERBY viewname' is used which will put
'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
list of catalog views. You may need to correct rules.out file
accordingly. This is what i could see in rules.sql,

SELECT viewname, definition FROM pg_views WHERE schemaname <>
'information_schema' ORDER BY viewname;

I am still reviewing your patch and doing some testing. Will update if
i find any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Fri, Mar 17, 2017 at 3:16 PM, vinayak
<Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>
> On 2017/03/17 10:38, Robert Haas wrote:
>>
>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
>> <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>>>
>>> Thank you for reviewing the patch.
>>>
>>> The attached patch incorporated Michael and Amit comments also.
>>
>> I reviewed this tonight.
>
> Thank you for reviewing the patch.
>>
>> +        /* Report compute index stats phase */
>> +        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +
>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>>
>> Hmm, is there really any point to this?  And is it even accurate?  It
>> doesn't look to me like we are computing any index statistics here;
>> we're only allocating a few in-memory data structures here, I think,
>> which is not a statistics computation and probably doesn't take any
>> significant time.
>>
>> +        /* Report compute heap stats phase */
>> +        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +
>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>>
>> The phase you label as computing heap statistics also includes the
>> computation of index statistics.  I wouldn't bother separating the
>> computation of heap and index statistics; I'd just have a "compute
>> statistics" phase that begins right after the comment that starts
>> "Compute the statistics."
>
> Understood. Fixed in the attached patch.
>>
>>
>> +    /* Report that we are now doing index cleanup */
>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +                                PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>>
>> OK, this doesn't make any sense either.  We are not cleaning up the
>> indexes here.  We are just closing them and releasing resources.  I
>> don't see why you need this phase at all; it can't last more than some
>> tiny fraction of a second.  It seems like you're trying to copy the
>> exact same phases that exist for vacuum instead of thinking about what
>> makes sense for ANALYZE.
>
> Understood. I have removed this phase.
>>
>>
>> +        /* Report number of heap blocks scaned so far*/
>> +        pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
>> targblock);
>>
>> While I don't think this is necessarily a bad thing to report, I find
>> it very surprising that you're not reporting what seems to me like the
>> most useful thing here - namely, the number of blocks that will be
>> sampled in total and the number of those that you've selected.
>> Instead, you're just reporting the length of the relation and the
>> last-sampled block, which is a less-accurate guide to total progress.
>> There are enough counters to report both things, so maybe we should.
>>
>> +    /* Report total number of sample rows*/
>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS,
>> numrows);
>>
>> As an alternative to the previous paragraph, yet another thing you
>> could report is number of rows sampled out of total rows to be
>> sampled.  But this isn't the way to do it: you are reporting the
>> number of rows you sampled only once you've finished sampling.  The
>> point of progress reporting is to report progress -- that is, to
>> report this value as it's updated, not just to report it when ANALYZE
>> is practically done and the value has reached its maximum.
>
> Understood.
> I have reported number of rows sampled out of total rows to be sampled.
> I have not reported the number of blocks that will be sampled in total.
> So currect pg_stat_progress_analyze view looks like:
>
> =# \d+ pg_stat_progress_analyze
>                          View "pg_catalog.pg_stat_progress_analyze"
>          Column         |  Type   | Collation | Nullable | Default | Storage
> | Descripti
> on
> ------------------------+---------+-----------+----------+---------+----------+----------
> ---
>  pid                    | integer |           |          |         | plain
> |
>  datid                  | oid     |           |          |         | plain
> |
>  datname                | name    |           |          |         | plain
> |
>  relid                  | oid     |           |          |         | plain
> |
>  phase                  | text    |           |          |         |
> extended |
>  num_target_sample_rows | bigint  |           |          |         | plain
> |
>  num_rows_sampled       | bigint  |           |          |         | plain
> |
> View definition:
>  SELECT s.pid,
>     s.datid,
>     d.datname,
>     s.relid,
>         CASE s.param1
>             WHEN 0 THEN 'initializing'::text
>             WHEN 1 THEN 'collecting sample rows'::text
>             WHEN 2 THEN 'computing statistics'::text
>             ELSE NULL::text
>         END AS phase,
>     s.param2 AS num_target_sample_rows,
>     s.param3 AS num_rows_sampled
>    FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid,
> param1, param2, p
> aram3, param4, param5, param6, param7, param8, param9, param10)
>      LEFT JOIN pg_database d ON s.datid = d.oid;
>
> Comment?
>>
>> The documentation for this patch isn't very good, either.  You forgot
>> to update the part of the documentation that says that VACUUM is the
>> only command that currently supports progress reporting, and the
>> descriptions of the phases and progress counters are brief and not
>> entirely accurate - e.g. heap_blks_scanned is not actually the number
>> of heap blocks scanned, because we are sampling, not reading all the
>> blocks.
>
> Understood. I have updated the documentation.
> I will also try to improve documentation.
>>
>> When adding calls to the progress reporting functions, please consider
>> whether a blank line should be added before or after the new code, or
>> both, or neither.  I'd say some blank lines are needed in a few places
>> where you didn't add them.
>
> +1. I have added blank lines in a few places.
>
> Regards,
> Vinayak Pokale
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] wait events for disk I/O
Next
From: Petr Jelinek
Date:
Subject: Re: [HACKERS] Logical replication existing data copy