Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)
Date
Msg-id CAA4eK1+ZT925fMnJ_XmfYg2nfwk=aMhJmk_TBAR2HyMJFcx-0Q@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Adding Peter G.

On Wed, Apr 1, 2020 at 12:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have done some testing for the parallel "create index".
>
> postgres[99536]=# show maintenance_work_mem ;
>  maintenance_work_mem
> ----------------------
>  1MB
> (1 row)
>
> CREATE TABLE test (a int, b int);
> INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
> CREATE INDEX idx1 on test(a);
> select query, total_time, shared_blks_hit, shared_blks_read,
> shared_blks_hit + shared_blks_read as total_read_blks,
> shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> query like 'CREATE INDEX%';
>
>
> SET max_parallel_maintenance_workers TO 0;
>             query             |     total_time     | shared_blks_hit |
> shared_blks_read | total_read_blks | shared_blks_dirtied |
> shared_blks_written
>
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
>  CREATE INDEX idx1 on test(a) | 1947.4959979999999 |            8947 |
>               11 |            8958 |                   5 |
>       0
>
> SET max_parallel_maintenance_workers TO 2;
>
>             query             |     total_time     | shared_blks_hit |
> shared_blks_read | total_read_blks | shared_blks_dirtied |
> shared_blks_written
>
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
>  CREATE INDEX idx1 on test(a) | 1942.1426040000001 |            8960 |
>               14 |            8974 |                   5 |
>       0
> (1 row)
>
> I have noticed that the total_read_blks, with the parallel, create
> index is more compared to non-parallel one.  I have created a fresh
> database before each run.  I am not much aware of the internal code of
> parallel create an index so I am not sure whether it is expected to
> read extra blocks with the parallel create an index.  I guess maybe
> because multiple workers are inserting int the btree they might need
> to visit some btree nodes multiple times while traversing the tree
> down.  But, it's better if someone who have more idea with this code
> can confirm this.
>

Peter, Is this behavior expected?

Let me summarize the situation so that it would be easier for Peter to
comment.  Julien has noticed that parallel vacuum and parallel create
index doesn't seem to report correct values for buffer usage stats.
Sawada-San wrote a patch to fix the problem for both the cases.  We
expect that 'total_read_blks' as reported in pg_stat_statements should
give the same value for parallel and non-parallel operations.  We see
that is true for parallel vacuum and previously we have the same
observation for the parallel query. Now, for parallel create index
this doesn't seem to be true as test results by Dilip show that.  We
have two possibilities here (a) there is some bug in Sawada-San's
patch or (b) this is expected behavior for parallel create index.
What do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Peter Geoghegan
Date:
Subject: Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)