Home > mailing lists

Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	Thom Brown
Subject	Re: Parallel Seq Scan
Date	January 2, 2015 10:40:06
Msg-id	CAA-aLv52KyeSzV+25QsAoCw-CeQMUOFevgCWcUcXodytahYYhg@mail.gmail.com Whole thread
In response to	Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel Seq Scan
List	pgsql-hackers

Tree view

On 1 January 2015 at 10:34, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Running it again, I get the same issue. This is with parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.
>
> This doesn't happen if I set the pgbench scale to 50. I suspect this is a OOM issue. My laptop has 16GB RAM, the table is around 13GB at scale 100, and I don't have swap enabled. But I'm concerned it crashes the whole instance.
>

Isn't this a backend crash due to OOM?
And after that server will restart automatically.

Yes, I'm fairly sure it is. I guess what I'm confused about is that 8 parallel sequential scans in separate sessions (1 per session) don't cause the server to crash, but in a single session (8 in 1 session), they do.

> I also notice that requesting BUFFERS in a parallel EXPLAIN output yields no such information.
> --

Yeah and the reason for same is that all the work done related
to BUFFERS is done by backend workers, master backend
doesn't read any pages, so it is not able to accumulate this
information.

> Is that not possible to report?

It is not impossible to report such information, we can develop some
way to share such information between master backend and workers.
I think we can do this if required once the patch is more stablized.

Ah great, as I think losing such information to this feature would be unfortunate.

Will there be a GUC to influence parallel scan cost? Or does it take into account effective_io_concurrency in the costs?

And will the planner be able to decide whether or not it'll choose to use background workers or not? For example:

# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=89584.00..89584.05 rows=5 width=4) (actual time=228.222..228.224 rows=5 loops=1)
   Output: bid
   Group Key: pgbench_accounts.bid
   Buffers: shared hit=83334
   -> Seq Scan on public.pgbench_accounts (cost=0.00..88334.00 rows=500000 width=4) (actual time=0.008..136.522 rows=500000 loops=1)
         Output: bid
         Buffers: shared hit=83334
Planning time: 0.071 ms
Execution time: 228.265 ms
(9 rows)

This is a quick plan, but if we tell it that it's allowed 8 background workers:

# set parallel_seqscan_degree = 8;
SET
Time: 0.187 ms

# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12291.75..12291.80 rows=5 width=4) (actual time=603.042..603.042 rows=1 loops=1)
   Output: bid
   Group Key: pgbench_accounts.bid
   -> Parallel Seq Scan on public.pgbench_accounts (cost=0.00..11041.75 rows=500000 width=4) (actual time=2.445..529.284 rows=500000 loops=1)
         Output: bid
         Number of Workers: 8
         Number of Blocks Per Workers: 10416
Planning time: 0.049 ms
Execution time: 663.103 ms
(9 rows)

Time: 663.437 ms

It's significantly slower. I'd hope the planner would anticipate this and decide, "I'm just gonna perform a single scan in this instance as it'll be a lot quicker for this simple case." So at the moment parallel_seqscan_degree seems to mean "You *must* use this many threads if you can parallelise." Ideally we'd be saying "can use up to if necessary".

Thanks

Thom

pgsql-hackers by date:

From: Amit Kapila
Date: 02 January 2015, 10:36:29
Subject: Re: Parallel Seq Scan

From: Amit Kapila
Date: 02 January 2015, 11:13:59
Subject: Re: Parallel Seq Scan

Re: Parallel Seq Scan - Mailing list pgsql-hackers

Previous

Next