Re: Parallel Seq Scan - Mailing list pgsql-hackers
From: Amit Langote
Subject: Re: Parallel Seq Scan
Date:
Msg-id: 54C0873D.3070001@lab.ntt.co.jp
In response to: Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Parallel Seq Scan
List: pgsql-hackers
On 21-01-2015 PM 09:43, Amit Kapila wrote:
> On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote <amitlangote09@gmail.com>
> wrote:
>> On Wednesday, January 21, 2015, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>>
>>> Does it happen only when parallel_seqscan_degree > max_worker_processes?
>>
>> I have max_worker_processes set to the default of 8 while
>> parallel_seqscan_degree is 4. So, this may be a case different from Thom's.
>>
>
> I think this is due to reason that memory for forming
> tuple in master backend is retained for longer time which
> is causing this statement to take much longer time than
> required. I have fixed the other issue as well reported by
> you in attached patch.
>

Thanks for fixing.

> I think this patch is still not completely ready for general
> purpose testing, however it could be helpful if we can run
> some tests to see in what kind of scenario's it gives benefit
> like in the test you are doing if rather than increasing
> seq_page_cost, you should add an expensive WHERE condition
> so that it should automatically select parallel plan. I think it is better
> to change one of the new parameter's (parallel_setup_cost,
> parallel_startup_cost and cpu_tuple_comm_cost) if you want
> your statement to use parallel plan, like in your example if
> you would have reduced cpu_tuple_comm_cost, it would have
> selected parallel plan, that way we can get some feedback about
> what should be the appropriate default values for the newly added
> parameters. I am already planing to do some tests in that regard,
> however if I get some feedback from other's that would be helpful.
>

Perhaps you are aware or you've postponed working on it, but I see that a
plan executing in a worker does not know about instrumentation. It results
in EXPLAIN ANALYZE showing incorrect figures. For example, compare the
normal seqscan and parallel seqscan below:

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND md5(a::text) LIKE 'ac%';
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..310228.52 rows=16120 width=4) (actual time=0.497..17062.436 rows=39028 loops=1)
   Filter: ((sqrt((a)::double precision) < 3456::double precision) AND (md5((a)::text) ~~ 'ac%'::text))
   Rows Removed by Filter: 9960972
 Planning time: 0.206 ms
 Execution time: 17378.413 ms
(5 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND md5(a::text) LIKE 'ac%';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Parallel Seq Scan on test  (cost=0.00..255486.08 rows=16120 width=4) (actual time=7.329..4906.981 rows=39028 loops=1)
   Filter: ((sqrt((a)::double precision) < 3456::double precision) AND (md5((a)::text) ~~ 'ac%'::text))
   Rows Removed by Filter: 1992710
   Number of Workers: 4
   Number of Blocks Per Worker: 8849
 Planning time: 0.137 ms
 Execution time: 6077.782 ms
(7 rows)

Note the "Rows Removed by Filter". I guess the difference may be because
all the rows filtered by workers were not accounted for. I'm not quite sure,
but since exec_worker_stmt goes the Portal way, QueryDesc.instrument_options
remains unset and hence there are no instrumentation opportunities in a
worker backend. One option may be to pass instrument_options down to
worker_stmt?

By the way, 17s and 6s compare really well in favor of parallel seqscan
above, :)

Thanks,
Amit
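For reference, a session along the lines Amit Kapila suggests above might
look something like the sketch below. It assumes the GUC names introduced by
the patch under discussion (parallel_seqscan_degree, cpu_tuple_comm_cost);
the value used for cpu_tuple_comm_cost is purely illustrative and is only
meant to make the parallel plan cheaper than the plain seqscan:

postgres=# SET parallel_seqscan_degree = 4;
postgres=# SET cpu_tuple_comm_cost = 0.001;  -- illustrative value, lower than the patch's default
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND md5(a::text) LIKE 'ac%';

If the parallel plan wins, the output should show the Parallel Seq Scan node
with "Number of Workers", as in the second plan above; resetting
cpu_tuple_comm_cost should flip the choice back to the ordinary Seq Scan.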