Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Thom Brown |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CAA-aLv52KyeSzV+25QsAoCw-CeQMUOFevgCWcUcXodytahYYhg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel Seq Scan
|
List | pgsql-hackers |
On 1 January 2015 at 10:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Running it again, I get the same issue. This is with parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.>
> This doesn't happen if I set the pgbench scale to 50. I suspect this is a OOM issue. My laptop has 16GB RAM, the table is around 13GB at scale 100, and I don't have swap enabled. But I'm concerned it crashes the whole instance.
>Isn't this a backend crash due to OOM?And after that server will restart automatically.
Yes, I'm fairly sure it is. I guess what I'm confused about is that 8 parallel sequential scans in separate sessions (1 per session) don't cause the server to crash, but in a single session (8 in 1 session), they do.
> I also notice that requesting BUFFERS in a parallel EXPLAIN output yields no such information.
> --
Yeah and the reason for same is that all the work done relatedto BUFFERS is done by backend workers, master backenddoesn't read any pages, so it is not able to accumulate thisinformation.> Is that not possible to report?It is not impossible to report such information, we can develop someway to share such information between master backend and workers.I think we can do this if required once the patch is more stablized.
Ah great, as I think losing such information to this feature would be unfortunate.
Will there be a GUC to influence parallel scan cost? Or does it take into account effective_io_concurrency in the costs?
And will the planner be able to decide whether or not it'll choose to use background workers or not? For example:
# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=89584.00..89584.05 rows=5 width=4) (actual time=228.222..228.224 rows=5 loops=1)
Output: bid
Group Key: pgbench_accounts.bid
Buffers: shared hit=83334
-> Seq Scan on public.pgbench_accounts (cost=0.00..88334.00 rows=500000 width=4) (actual time=0.008..136.522 rows=500000 loops=1)
Output: bid
Buffers: shared hit=83334
Planning time: 0.071 ms
Execution time: 228.265 ms
(9 rows)
And will the planner be able to decide whether or not it'll choose to use background workers or not? For example:
# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=89584.00..89584.05 rows=5 width=4) (actual time=228.222..228.224 rows=5 loops=1)
Output: bid
Group Key: pgbench_accounts.bid
Buffers: shared hit=83334
-> Seq Scan on public.pgbench_accounts (cost=0.00..88334.00 rows=500000 width=4) (actual time=0.008..136.522 rows=500000 loops=1)
Output: bid
Buffers: shared hit=83334
Planning time: 0.071 ms
Execution time: 228.265 ms
(9 rows)
This is a quick plan, but if we tell it that it's allowed 8 background workers:
# set parallel_seqscan_degree = 8;
SET
Time: 0.187 ms
# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12291.75..12291.80 rows=5 width=4) (actual time=603.042..603.042 rows=1 loops=1)
Output: bid
Group Key: pgbench_accounts.bid
-> Parallel Seq Scan on public.pgbench_accounts (cost=0.00..11041.75 rows=500000 width=4) (actual time=2.445..529.284 rows=500000 loops=1)
Output: bid
Number of Workers: 8
Number of Blocks Per Workers: 10416
Planning time: 0.049 ms
Execution time: 663.103 ms
(9 rows)
Time: 663.437 ms
# set parallel_seqscan_degree = 8;
SET
Time: 0.187 ms
# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12291.75..12291.80 rows=5 width=4) (actual time=603.042..603.042 rows=1 loops=1)
Output: bid
Group Key: pgbench_accounts.bid
-> Parallel Seq Scan on public.pgbench_accounts (cost=0.00..11041.75 rows=500000 width=4) (actual time=2.445..529.284 rows=500000 loops=1)
Output: bid
Number of Workers: 8
Number of Blocks Per Workers: 10416
Planning time: 0.049 ms
Execution time: 663.103 ms
(9 rows)
Time: 663.437 ms
It's significantly slower. I'd hope the planner would anticipate this and decide, "I'm just gonna perform a single scan in this instance as it'll be a lot quicker for this simple case." So at the moment parallel_seqscan_degree seems to mean "You *must* use this many threads if you can parallelise." Ideally we'd be saying "can use up to if necessary".
Thanks
Thom
pgsql-hackers by date: