Home > mailing lists

Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

From	Soumyadeep Chakraborty
Subject	Re: Parallel Seq Scan vs kernel read ahead
Date	June 3, 2020 22:18:31
Msg-id	CAE-ML+-q_MzaHbMYyKA1qv=PzLZz0oKNo4OMPbx9CCUBBKtGmg@mail.gmail.com Whole thread Raw
In response to	Re: Parallel Seq Scan vs kernel read ahead (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
Responses	Re: Parallel Seq Scan vs kernel read ahead
List	pgsql-hackers

Tree view

On Sat, May 23, 2020 at 12:00 AM Robert Haas
<robertmhaas(at)gmail(dot)com> wrote:
> I think there's a significant difference. The idea I remember being
> discussed at the time was to divide the relation into equal parts at
> the very start and give one part to each worker. I think that carries
> a lot of risk of some workers finishing much sooner than others.

Was the idea of work-stealing considered? Here is what I have been
thinking about:

Each worker would be assigned a contiguous chunk of blocks at init time.
Then if a worker is finished with its work, it can inspect other
workers' remaining work and "steal" some of the blocks from the end of
the victim worker's allocation.

Considerations for such a scheme:

1. Victim selection: Who will be the victim worker? It can be selected at
random if nothing else.

2. How many blocks to steal? Stealing half of the victim's remaining
blocks seems to be fair.

3. Stealing threshold: We should disallow stealing if the amount of
remaining work is not enough in the victim worker.

4. Additional parallel state: Representing the chunk of "work". I guess
one variable for the current block and one for the last block in the
chunk allocated. The latter would have to be protected with atomic
fetches as it would be decremented by the stealing worker.

5. Point 4 implies that there might be more atomic fetch operations as
compared to this patch. Idk if that is a lesser evil than the workers
being idle..probably not? A way to salvage that a little would be to
forego atomic fetches when the amount of work remaining is less than the
threshold discussed in 3 as there is no possibility of work stealing then.

Regards,

Soumyadeep

pgsql-hackers by date:

From: Soumyadeep Chakraborty
Date: 03 June 2020, 22:14:59
Subject: Re: Parallel Seq Scan vs kernel read ahead

From: Alvaro Herrera
Date: 03 June 2020, 22:27:12
Subject: Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical ()at walsender.c:2762

Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

Previous

Next