Re: [HACKERS] ANALYZE command progress checker - Mailing list pgsql-hackers

From vinayak
Subject Re: [HACKERS] ANALYZE command progress checker
Date
Msg-id a31277a9-2bfe-e966-c574-715302521162@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] ANALYZE command progress checker  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Responses Re: ANALYZE command progress checker  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On 2017/03/23 15:04, Haribabu Kommi wrote:


On Wed, Mar 22, 2017 at 8:11 PM, vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:


On 2017/03/21 21:25, Haribabu Kommi wrote:


On Tue, Mar 21, 2017 at 3:41 PM, vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
Thank you for testing the patch on Windows platform.


Thanks for the updated patch.

It works good for a normal relation.  But for a relation that contains child tables,
the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.

Thank you for reviewing the patch.
The attached patch implements a way to report sample rows count from acquire_sample_rows() even if called for child tables.
How about adding another phase called PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and set this phase only when it is an inheritance analyze operation. And adding
some explanation of ROWS_SAMPLED phase about inheritance tables
how these sampled rows are calculated will provide good analyze progress of
relation that contains child relations also.
I have added the phase called PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS.
I have also updated the documentation.

Thanks for the updated patch. I will check it.
 
The ANALYZE command takes long time in computing statistics phase.So I think we can add some column or phase so that user can easily understand the progress.
How about adding new column like "num_rows_processed" will compute the statistics of specified column?

I prefer a column with rows processed instead of a phase. 
Because we already set the phase of compute stats and showing
the progress there will number of rows processed will be good.
 
How about separate the computing "inheritance statistics" phase from the computing regular "single table" statistics.
Comment?

Yes, this will be good to show both that states of inheritance of sampled rows and 
compute inheritance stats, so that it will be clearly visible to the user the current
status.
I have updated the patch.
I have added column "num_rows_processed" and phase "computing inherited statistics" in the view.

=# \d+ pg_stat_progress_analyze
                         View "pg_catalog.pg_stat_progress_analyze"
         Column         |  Type   | Collation | Nullable | Default | Storage  | Description
------------------------+---------+-----------+----------+---------+----------+-------------
 pid                    | integer |           |          |         | plain    |
 datid                  | oid     |           |          |         | plain    |
 datname                | name    |           |          |         | plain    |
 relid                  | oid     |           |          |         | plain    |
 phase                  | text    |           |          |         | extended |
 num_target_sample_rows | bigint  |           |          |         | plain    |
 num_rows_sampled       | bigint  |           |          |         | plain    |
 num_rows_processed     | bigint  |           |          |         | plain    |
View definition:
 SELECT s.pid,
    s.datid,
    d.datname,
    s.relid,
        CASE s.param1
            WHEN 0 THEN 'initializing'::text
            WHEN 1 THEN 'collecting sample rows'::text
            WHEN 2 THEN 'collecting inherited sample rows'::text
            WHEN 3 THEN 'computing statistics'::text
            WHEN 4 THEN 'computing inherited statistics'::text
            ELSE NULL::text
        END AS phase,
    s.param2 AS num_target_sample_rows,
    s.param3 AS num_rows_sampled,
    s.param4 AS num_rows_processed
   FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10)
     LEFT JOIN pg_database d ON s.datid = d.oid;

Regards,
Vinayak Pokale
NTT Open Source Software Center
Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
Next
From: Rajkumar Raghuwanshi
Date:
Subject: Re: [HACKERS] Parallel Append implementation