Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan vs kernel read ahead
Date
Msg-id CAA4eK1+Cu=2qHP0LWaB4jZB2tf7GJmruxapoPXv72UZmdVvQcw@mail.gmail.com
In response to Re: Parallel Seq Scan vs kernel read ahead  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Parallel Seq Scan vs kernel read ahead
List pgsql-hackers
On Tue, Jul 21, 2020 at 8:06 AM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> Thank you for the advice. I repeated the test as you suggested, averaging 3 runs
> per planned number of workers.
> It still shows similar performance results between Master and Patch V2.
> I wonder why there's no difference, though.
>
> The test on my machine is roughly like this:
>
> createdb test
> psql -d test
> create table t_heap as select generate_series(1, 100000000) i;
> \q
>
> pg_ctl restart
> psql -d test
> SET track_io_timing = on;
> SET max_parallel_workers_per_gather = 0;
> SHOW max_parallel_workers_per_gather;
> EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;
> \timing
> SELECT count(*) from t_heap;
>
> drop table t_heap;
> \q
> dropdb test
> pg_ctl restart
>
> Below are the results. Again, almost no discernible difference between the master and patch.
> Also, the results when max_parallel_workers_per_gather is more than 4 could be inaccurate
> because my machine has only 4 CPUs. Even so, the query planner capped the worker count
> at 6.
>
> Query planner I/O timings (track_io_timing = on), in ms:
> | Worker | I/O read (Master) | I/O read (Patch) | I/O write (Master) | I/O write (Patch) |
> |--------|-------------------|------------------|--------------------|-------------------|
> | 0      | 1,130.777         | 1,250.821        |  1,698.051         |  1,733.439        |
> | 1      | 1,603.016         | 1,660.767        |  2,312.248         |  2,291.661        |
> | 2      | 2,036.269         | 2,107.066        |  2,698.216         |  2,796.893        |
> | 3      | 2,298.811         | 2,307.254        |  5,695.991         |  5,894.183        |
> | 4      | 2,098.642         | 2,135.960        | 23,837.088         | 26,537.158        |
> | 5      | 1,956.536         | 1,997.464        | 45,891.851         | 48,049.338        |
> | 6      | 2,201.816         | 2,219.001        | 61,937.828         | 67,809.486        |
>
> Query planner execution time (ms):
> | Worker | QueryPlanner (Master) | QueryPlanner (Patch) |
> |--------|-----------------------|----------------------|
> | 0      | 40,454.252            | 40,521.578           |
> | 1      | 21,332.067            | 21,205.068           |
> | 2      | 14,266.756            | 14,385.539           |
> | 3      | 11,597.936            | 11,722.055           |
> | 4      | 12,937.468            | 13,439.247           |
> | 5      | 14,383.083            | 14,782.866           |
> | 6      | 14,671.336            | 15,507.581           |
>
> Based on the results above, the I/O latency increases as the number of workers
> increases. Despite that, the query planner execution time is nearly the same
> when 2 or more workers are used (11~14 s). Master and Patch V2 show the same results.
>
> As for buffers, same results are shown per worker (both Master and Patch).
> | Worker | Buffers                                          |
> |--------|--------------------------------------------------|
> | 0      | shared read=442478 dirtied=442478 written=442446 |
> | 1      | shared read=442478 dirtied=442478 written=442414 |
> | 2      | shared read=442478 dirtied=442478 written=442382 |
> | 3      | shared read=442478 dirtied=442478 written=442350 |
> | 4      | shared read=442478 dirtied=442478 written=442318 |
> | 5      | shared read=442478 dirtied=442478 written=442286 |
> | 6      | shared read=442478 dirtied=442478 written=442254 |
>
>
> SQL query execution time (ms):
> | Worker | SQL (Master) | SQL (Patch) |
> |--------|--------------|-------------|
> | 0      | 10,418.606   | 10,377.377  |
> | 1      |  5,427.460   |  5,402.727  |
> | 2      |  3,662.998   |  3,650.277  |
> | 3      |  2,718.837   |  2,692.871  |
> | 4      |  2,759.802   |  2,693.370  |
> | 5      |  2,761.834   |  2,682.590  |
> | 6      |  2,711.434   |  2,726.332  |
>
> The SQL query execution times benefited from the preceding EXPLAIN ANALYZE run,
> so the results are faster. But again, Master and Patched show almost the same results.
> Nonetheless, the execution time is almost constant once
> max_parallel_workers_per_gather is 2 (the default) or above.
>
> I am definitely missing something. I cannot understand why there is no
> I/O difference between Master and Patched (V2). Or has this already been improved
> even without the patch?
>
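For reference, the per-setting averages reported above can be produced with a small shell helper; the three timing values below are made up for illustration, not taken from the tables:

```shell
# Average the three psql \timing results (in ms) for one worker setting.
# The values in "runs" are hypothetical sample timings.
runs="10418.606 10377.377 10400.017"
printf '%s\n' $runs | awk '{ s += $1 } END { printf "%.3f\n", s/NR }'
```

This prints the mean of the three runs to three decimal places, matching the precision psql's \timing reports.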

I don't think it is strange that you are not seeing much difference,
because, as per the initial email by Thomas, this patch is not supposed
to give benefits on all systems.  I think we wanted to check that the
patch does not regress performance in cases where it gives no
benefit.  I think it is okay to run with a higher number of
workers than you have CPUs in the system, as we wanted to check whether such
cases regress, as shown by Soumyadeep above [1].  Can you also try once
with 8 and/or 10 workers?
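Something along these lines should coax the planner into using more workers than the CPU count; the specific numbers are only a sketch to adapt to the machine under test:

```sql
-- Sketch: allow and request more workers than the machine has CPUs.
SET max_parallel_workers = 10;
SET max_parallel_workers_per_gather = 10;
-- The planner derives the worker count from table size unless overridden:
ALTER TABLE t_heap SET (parallel_workers = 10);
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_heap;
```

Note that max_parallel_workers is still bounded by max_worker_processes, so that may need raising (and a restart) as well.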

[1] - https://www.postgresql.org/message-id/CADwEdoqirzK3H8bB%3DxyJ192EZCNwGfcCa_WJ5GHVM7Sv8oenuA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


