RE: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers
From | k.jamison@fujitsu.com |
---|---|
Subject | RE: Parallel Seq Scan vs kernel read ahead |
Date | |
Msg-id | OSBPR01MB2341378F27AD79873177F853EF780@OSBPR01MB2341.jpnprd01.prod.outlook.com |
In response to | Re: Parallel Seq Scan vs kernel read ahead (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Parallel Seq Scan vs kernel read ahead |
List | pgsql-hackers |
On Tuesday, July 21, 2020 12:18 PM, Amit Kapila wrote:
> On Tue, Jul 21, 2020 at 8:06 AM k.jamison@fujitsu.com <k.jamison@fujitsu.com> wrote:
> >
> > Thank you for the advice. I repeated the test as per your advice, taking the
> > average of 3 runs per number of workers planned.
> > It still shows the following similar performance results between Master and Patch V2.
> > I wonder why there's no difference though.
> >
> > The test on my machine is roughly like this:
> >
> > createdb test
> > psql -d test
> > create table t_heap as select generate_series(1, 100000000) i;
> > \q
> >
> > pg_ctl restart
> > psql -d test
> > SET track_io_timing = on;
> > SET max_parallel_workers_per_gather = 0;
> > SHOW max_parallel_workers_per_gather;
> > EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;
> > \timing
> > SELECT count(*) from t_heap;
> >
> > drop table t_heap;
> > \q
> > dropdb test
> > pg_ctl restart
> >
> > Below are the results. Again, almost no discernible difference between the
> > master and patch. Also, the results when max_parallel_workers_per_gather is
> > more than 4 could be inaccurate due to my machine's limitation of only having
> > 4 CPUs. Even so, the query planner capped it at 6 workers.
> >
> > Query Planner I/O Timings (track_io_timing = on) in ms:
> > | Worker | I/O READ (Master) | I/O READ (Patch) | I/O WRITE (Master) | I/O WRITE (Patch) |
> > |--------|-------------------|------------------|--------------------|-------------------|
> > | 0      | 1,130.777         | 1,250.821        | 1,698.051          | 1,733.439         |
> > | 1      | 1,603.016         | 1,660.767        | 2,312.248          | 2,291.661         |
> > | 2      | 2,036.269         | 2,107.066        | 2,698.216          | 2,796.893         |
> > | 3      | 2,298.811         | 2,307.254        | 5,695.991          | 5,894.183         |
> > | 4      | 2,098.642         | 2,135.960        | 23,837.088         | 26,537.158        |
> > | 5      | 1,956.536         | 1,997.464        | 45,891.851         | 48,049.338        |
> > | 6      | 2,201.816         | 2,219.001        | 61,937.828         | 67,809.486        |
> >
> > Query Planner Execution Time (ms):
> > | Worker | QueryPlanner (Master) | QueryPlanner (Patch) |
> > |--------|-----------------------|----------------------|
> > | 0      | 40,454.252            | 40,521.578           |
> > | 1      | 21,332.067            | 21,205.068           |
> > | 2      | 14,266.756            | 14,385.539           |
> > | 3      | 11,597.936            | 11,722.055           |
> > | 4      | 12,937.468            | 13,439.247           |
> > | 5      | 14,383.083            | 14,782.866           |
> > | 6      | 14,671.336            | 15,507.581           |
> >
> > Based on the results above, the I/O latency increases as the number of workers
> > increases. Despite that, the query planner execution time stays almost the same
> > (11~14 s) once 2 or more workers are used, with the same results between Master
> > and Patch V2.
> >
> > As for buffers, the same results are shown per worker (both Master and Patch).
> > | Worker | Buffers                                          |
> > |--------|--------------------------------------------------|
> > | 0      | shared read=442478 dirtied=442478 written=442446 |
> > | 1      | shared read=442478 dirtied=442478 written=442414 |
> > | 2      | shared read=442478 dirtied=442478 written=442382 |
> > | 3      | shared read=442478 dirtied=442478 written=442350 |
> > | 4      | shared read=442478 dirtied=442478 written=442318 |
> > | 5      | shared read=442478 dirtied=442478 written=442286 |
> > | 6      | shared read=442478 dirtied=442478 written=442254 |
> >
> > SQL Query Execution Time (ms):
> > | Worker | SQL (Master) | SQL (Patch) |
> > |--------|--------------|-------------|
> > | 0      | 10,418.606   | 10,377.377  |
> > | 1      | 5,427.460    | 5,402.727   |
> > | 2      | 3,662.998    | 3,650.277   |
> > | 3      | 2,718.837    | 2,692.871   |
> > | 4      | 2,759.802    | 2,693.370   |
> > | 5      | 2,761.834    | 2,682.590   |
> > | 6      | 2,711.434    | 2,726.332   |
> >
> > The SQL query execution time definitely benefitted from the previous run of the
> > query planner, so the results are faster. But again, both Master and Patched have
> > almost the same results. Nonetheless, the execution time is almost consistent when
> > max_parallel_workers_per_gather is 2 (default) and above.
> >
> > I am definitely missing something. Perhaps I just could not understand why there's
> > no I/O difference between the Master and Patched (V2). Or has it already been
> > improved even without this patch?
> >
>
> I don't think it is strange that you are not seeing much difference because, as per
> the initial email by Thomas, this patch is not supposed to give benefits on all
> systems. I think we wanted to check that the patch does not regress performance in
> cases where it doesn't give benefits. I think it might be okay to run with a higher
> number of workers than you have CPUs in the system, as we wanted to check whether
> such cases regress, as shown by Soumyadeep above [1]. Can you once try with
> 8 and/or 10 workers as well?
>
> [1] - https://www.postgresql.org/message-id/CADwEdoqirzK3H8bB%3DxyJ192EZCNwGfcCa_WJ5GHVM7Sv8oenuA%40mail.gmail.com

You are right; kindly excuse me on that part. It only means the patch may or may not
have any benefit on the filesystem I am using, while other filesystems show
significant benefits, as we can see from David's benchmarks.
Following your advice on the regression test case, I increased the number of workers,
but the query planner still capped the workers at 6, so the results from 6 workers
onwards are almost the same. I don't see a significant difference between master and
patched on my machine, as per my test results below.
(Just for reconfirmation)
Query Planner I/O Timings (ms):
| Worker | I/O READ (Master) | I/O READ (Patch) | I/O WRITE (Master) | I/O WRITE (Patch) |
|--------|-------------------|------------------|--------------------|-------------------|
| 0      | 1,130.78          | 1,250.82         | 1,698.05           | 1,733.44          |
| 1      | 1,603.02          | 1,660.77         | 2,312.25           | 2,291.66          |
| 2      | 2,036.27          | 2,107.07         | 2,698.22           | 2,796.89          |
| 3      | 2,298.81          | 2,307.25         | 5,695.99           | 5,894.18          |
| 4      | 2,098.64          | 2,135.96         | 23,837.09          | 26,537.16         |
| 5      | 1,956.54          | 1,997.46         | 45,891.85          | 48,049.34         |
| 6      | 2,201.82          | 2,219.00         | 61,937.83          | 67,809.49         |
| 8      | 2,117.80          | 2,169.67         | 60,671.22          | 68,676.36         |
| 16     | 2,052.73          | 2,134.86         | 60,635.17          | 66,462.82         |
| 32     | 2,036.00          | 2,200.98         | 60,833.92          | 67,702.49         |

Query Planner Execution Time (ms):
| Worker | QueryPlanner (Master) | QueryPlanner (Patch) |
|--------|-----------------------|----------------------|
| 0      | 40,454.25             | 40,521.58            |
| 1      | 21,332.07             | 21,205.07            |
| 2      | 14,266.76             | 14,385.54            |
| 3      | 11,597.94             | 11,722.06            |
| 4      | 12,937.47             | 13,439.25            |
| 5      | 14,383.08             | 14,782.87            |
| 6      | 14,671.34             | 15,507.58            |
| 8      | 14,679.50             | 15,615.69            |
| 16     | 14,474.78             | 15,274.61            |
| 32     | 14,462.11             | 15,470.68            |

| Worker | Buffers                                          |
|--------|--------------------------------------------------|
| 0      | shared read=442478 dirtied=442478 written=442446 |
| 1      | shared read=442478 dirtied=442478 written=442414 |
| 2      | shared read=442478 dirtied=442478 written=442382 |
| 3      | shared read=442478 dirtied=442478 written=442350 |
| 4      | shared read=442478 dirtied=442478 written=442318 |
| 5      | shared read=442478 dirtied=442478 written=442286 |
| 6      | shared read=442478 dirtied=442478 written=442254 |
| 8      | shared read=442478 dirtied=442478 written=442254 |
| 16     | shared read=442478 dirtied=442478 written=442254 |
| 32     | shared read=442478 dirtied=442478 written=442254 |

I also re-ran the query and measured the execution time (ms) with \timing:
| Worker | SQL (Master) | SQL (Patch) |
|--------|--------------|-------------|
| 0      | 15476.458    | 15278.772   |
| 1      | 8292.702     | 8426.435    |
| 2      | 6256.673     | 6232.456    |
| 3      | 6357.217     | 6340.013    |
| 4      | 7591.311     | 7913.881    |
| 5      | 8165.315     | 8070.592    |
| 6      | 8065.578     | 8200.076    |
| 8      | 7988.302     | 8609.138    |
| 16     | 8025.170     | 8469.895    |
| 32     | 8019.393     | 8645.150    |

Again tested on:
XFS filesystem, HDD virtual machine, 8GB RAM
RHEL4, 64-bit, 4 CPUs, Intel Core Processor (Haswell, IBRS)
PostgreSQL 14devel on x86_64-pc-linux-gnu

So I guess the patch simply does not affect the filesystem that I am using, and I
think that's OK.

Kind regards,
Kirk Jamison
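P.S. A minimal, untested sketch of settings that might push the planner past the
6-worker cap on this table: parallel_workers is a per-table storage parameter that
overrides the planner's size-based worker estimate, while the two GUCs below only
raise the session limits. The number of workers actually launched is still bounded
by max_worker_processes (default 8), which requires a server restart to change.

ALTER TABLE t_heap SET (parallel_workers = 10);  -- override the size-based estimate
SET max_parallel_workers = 10;                   -- cluster-wide parallel worker budget (default 8)
SET max_parallel_workers_per_gather = 10;        -- per-Gather limit varied in the test above
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_heap;  -- same measurement as above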