Re: asynchronous and vectorized execution - Mailing list pgsql-hackers

From	Konstantin Knizhnik
Subject	Re: asynchronous and vectorized execution
Date	May 11, 2016 14:17:39
Msg-id	57333EFE.3@postgrespro.ru
In response to	Re: asynchronous and vectorized execution (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: asynchronous and vectorized execution
List	pgsql-hackers


On 11.05.2016 17:00, Robert Haas wrote:
> On Tue, May 10, 2016 at 3:42 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Doesn't this actually mean that we need to have normal job scheduler which
>> is given queue of jobs and having some pool of threads will be able to
>> organize efficient execution of queries? Optimizer can build pipeline
>> (graph) of tasks, which corresponds to execution plan nodes, i.e. SeqScan,
>> Sort, ... Each task is split into several jobs which can be concurrently
>> scheduled by task dispatcher.  So you will not have blocked worker waiting
>> for something and all system resources will be utilized. Such approach with
>> dispatcher allows to implement quotas, priorities,... Also dispatcher can
>> care about NUMA and cache optimizations which is especially critical on
>> modern architectures. One more reference:
>> http://db.in.tum.de/~leis/papers/morsels.pdf
> I read this as a proposal to redesign the entire optimizer and
> executor to use some new kind of plan.  That's not a project I'm
> willing to entertain; it is hard to imagine we could do it in a
> reasonable period of time without introducing bugs and performance
> regressions.  I think there is a great deal of performance benefit
> that we can get by changing things incrementally.
>
Yes, I agree that completely rewriting the optimizer is a huge
project with an unpredictable effect on the performance of some queries.
Changing things incrementally is a good approach, but only if we are
moving in the right direction.
I am still not sure that introducing async operations is a step in the right
direction. Async operations tend to complicate code significantly (since
you have to maintain state yourself). It will be bad if the implementation
of each node has to deal with async state in its own manner.
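
As an illustration of the "maintain state yourself" point, here is a minimal
sketch in Python (not executor code). The names AsyncSumNode, State and step()
are hypothetical, and a None row is assumed to mean "input not ready yet".
Even this trivial node has to keep its partial result and its position in
explicit fields so that it can return early and be resumed later:

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_INPUT = auto()
    DONE = auto()


class AsyncSumNode:
    """Hypothetical plan node that sums rows from a source that may stall."""

    def __init__(self, rows):
        # Assumption: a row of None means "no input available right now".
        self.source = iter(rows)
        self.total = 0  # partial result that must survive between calls
        self.state = State.WAITING_FOR_INPUT

    def step(self):
        """Advance without blocking; return True once the node is finished."""
        if self.state is State.DONE:
            return True
        for row in self.source:
            if row is None:
                # Input not ready: give control back to the caller.
                # Everything needed to resume lives in self.* fields.
                return False
            self.total += row
        self.state = State.DONE
        return True


node = AsyncSumNode([1, None, 2, None, 3])
while not node.step():
    pass  # a real caller would do other useful work here
print(node.total)
```

Every node written this way needs its own variant of the state fields and of
the resume logic, which is the duplication being objected to here.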

My suggestion is to try to provide some generic mechanism for managing
state transitions and to have a scheduler which controls this process. It
should not be the responsibility of the node implementation to organize
asynchronous/parallel execution. Instead, the node should just produce a
set of jobs whose execution is controlled by the scheduler. The first
implementation of the scheduler can be quite simple, but later it can become
smarter: try to bind data to processors and do many other
optimizations.
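
A very simple first version of such a scheduler might look like the sketch
below. It is written in Python for brevity and is only an illustration: Task,
Scheduler and make_jobs are hypothetical names, tasks are assumed to arrive in
dependency order, and each task runs to completion before the next one starts
(no pipelining, no NUMA or cache awareness). The point is that a node only
describes its jobs, while the scheduler alone decides how to run them:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical unit corresponding to one plan node."""
    name: str
    # Given the results of the tasks it depends on, return a list of jobs.
    make_jobs: Callable[[list], List[Callable[[], object]]]
    depends_on: List["Task"] = field(default_factory=list)


class Scheduler:
    """Simplest possible policy: one thread pool, one task at a time."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, tasks):
        results = {}
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            for task in tasks:  # assumed to be in dependency order
                inputs = [r for dep in task.depends_on
                          for r in results[dep.name]]
                jobs = task.make_jobs(inputs)
                # Jobs of one task run concurrently; map() keeps their order.
                results[task.name] = list(pool.map(lambda job: job(), jobs))
        return results


# Usage: a "scan" split into four jobs, followed by a final aggregate.
data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, 100, 25)]

scan = Task("scan", lambda _inputs: [lambda c=c: sum(c) for c in chunks])
agg = Task("agg", lambda partials: [lambda: sum(partials)], depends_on=[scan])

results = Scheduler(workers=4).run([scan, agg])
print(results["agg"][0])
```

Quotas, priorities, or binding data to processors would then be changes
inside Scheduler.run only, without touching the task definitions.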



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
