Thread: [HACKERS] [GSoC] Push-based query executor discussion

[HACKERS] [GSoC] Push-based query executor discussion

From
Arseny Sher
Date:
Hello,

I would like to work on push-based executor [1] during GSoC, so I'm
writing to introduce myself and start the discussion of the project. I
think I should mention beforehand that the subject is my master's
thesis topic, and I have already started working on it. This letter is
not (obviously) a ready proposal but rather initial point to talk over
the concept. Below you can see a short review of the idea, description
of benefits for the community, details, related work and some info
about me.


*Brief review*
The idea is described at the wiki page [1] and in the letter [2]. I
propose to replace current ExecProcNode interface between execution
nodes with function called, say, pushTuple that pushes the ready tuple
to the current node's parent.


*Benefits for the community*
Why would we want this? In general, because Postgres executor is slow
for CPU-bound queries and this approach should accelerate it. [4] and
[5] argue that this model results in better code and data locality,
and that JIT compilation makes the difference even more drastic.

Besides, while working on this, in order to study the effects of model
change I will try to investigate the Postgres executor's performance
in both models extensively. For instance, it is commonly accepted that
current Volcano-style model leads to poor usage of modern CPUs
pipelining abilities and large percent of branch mispredictions. I am
going to see whether, where and when this is true in Postgres;
profiling results should be useful for any further executor
optimizations.


*Project details*
Technically, I am planning to implement this as follows. Common for
all nodes code which needs to be changed is in execMain.c and
execProcnode.c; standard_ExecutorRun in execMain.c now should start
execution of all leaf nodes in proper order instead of pulling tuples
one-by-one from top-level node. By 'proper' order here I mean that
inner nodes will be run first, outer nodes second, so that when the
first tuple from outer side of some node arrives to it, the node
already received all its tuples from the inner side.

How we 'start' execution of a leaf? Recall that now instead of
ExecProcNode we have pushTuple function with following signature:

bool pushTuple(TupleTableSlot *slot, PlanState *node, PlanState *pusher)

'slot' is the tuple we push. 'node' is a receiver of tuple, 'pusher'
is sender of the tuple, its parent is 'node'. We need 'pusher' only to
distinguish inner and outer pushes. This function returns true if
'node' is still accepting tuples after the push, false if not,
e.g. Limit node can return false after required number of tuples were
passed. We also add the convention that when a node has nothing to
push anymore, it calls pushTuple with slot=NULL to let parent know
that it is done. So, to start execution of a leaf,
standard_ExecutorRun basically needs to call pushTuple(NULL, leaf,
NULL) once. Leaf nodes are a special case because pusher=NULL; another
obvious special case is top-level node: it calls pushTuple(slot, NULL,
node), such call will push the slot to the destination
((*dest->receiveSlot) (slot, dest) in current code).

Like ExecProcNode, pushTuple will call the proper implementation, e.g.
pushTupleToLimit. Such implementations will contain the code similar
to its analogue (e.g. ExecLimit), but, very roughly, where we have

return slot;

now, in push model we will have

bool parent_accepts_tuples = pushTuple(slot, node->parent, node);

and then we will continue execution if parent_accepts_tuples is true
or exit if not.

Complex nodes require more complicated modifications to preserve the
correct behaviour and be efficient. The latter leads to some
architectural issues: for example, efficient SeqScan should call
pushTuple from function doing similar to what heapgettups_pagemode
currently does, otherwise, we would need to save/restore its state
(lines, page, etc) every time for each tuple. On the other hand, it is
not nice to call pushTuple from there because currently access level
(heapam.c) knows nothing about PlanStates. Such issues will need to be
addressed and discussed with the community.

Currently, I have a prototype (pretty much WIP) which implements this
model for SeqScan, Limit, Hash and Hashjoin nodes.

Since TPC-H benchmarks are de facto standard to evaluate such things,
I am planning to to use them for testing. BTW, I’ve written a couple
of scripts to automate this job [16], although it seems that everyone
who tests TPC-H ends up with writing his own version.

Now, it is clear that rewriting all nodes with full support in such a
manner is huge work. Besides, we still don't know quantitative profit
of this model.  Because of that, I do not propose any timeline right
now; instead, we should decide which part of this work (if any) is
going to be done in the course of GSoC. Probably, all TPC-H queries
with and without index support is a good initial target, but this
needs to be discussed. Anyway, I don't think that the result will be a
patch ready to be merged into Postgres master straight away, because
it is rather radical change; and it seems that supporting both
executors simultaneously is also a bad idea because many code would be
duplicated in this case.


*Related work*
There are other works aimed for improving executor performance. I can
mention at least four approaches:
* JITing things [6][10][17]
* batched and/or vectorized execution [7][8][9]
* expression evaluation optimizations [10][17]
* slot deforming optimizations [10]

The latter two are orthogonal to the proposed project.
Batched execution and JIT can be applied together, and some study [10]
shows benefits of such combined approach.

While batched execution and push-based execution model can be applied
together too, they seemingly solve more or less same problems -- code
and data locality, avoiding reloading node's state and better use of
modern CPU features.  However, batched execution requires massive
changes to the current code and seems harder to implement; IIRC I have
seen some patches on this at mailing lists, but I am not aware which
work is the most complete as of now. It is not easy to compare these
approaches theoretically; probably, the best way to estimate their
effect is to implement them and run benchmarks.

Relationship between JIT-compilation and push-based execution model is
interesting: paper [5] shows that HyPer system with JIT + push runs 4x
times faster on some queries than JIT + pull. It should be noted that
HyPer uses column-wise storage though.
Full query compiler developed at ISP RAS [6] speeds up query
processing 2-5 times on TPC-H queries and exploits push-based
execution model. However, supporting it requires implementing executor
logic twice: in plain C for usual interpreter and in LLVM API for JIT
compiler. Ideally we would like to write the code once and be able to
use it either with and without JIT compilation.
There is an ongoing work at ISP RAS to make this possible using
automatic runtime code specialization; however, experiments have
showed that specialization of original Postgres C code doesn't give
significant improvement because of Volcano-style model. We expect that
after making a switch to push-based model in Postgres code we will
achieve speedup comparable to full-query JIT using runtime
specialization.


*About me*
My name is Arseny Sher. Currently, I am studying master's degree at
CMC MSU [12] and working at ISP RAS [13]. Earlier I got bachelor's
degree at the same faculty. I started working with Postgres at the end
of October and I love its extensibility and excellent quality of
code. My previous work was mainly connected with big graphs
computation (keywords are Spark, GraphX, Scala, GraphLab). I also did
some scala.js coding for Russian philologists and participated in
project for IMDG performance comparison, doing mostly devops stuff
(Docker, Ansible, Python). Here are my stackoverflow [14] & github
accounts [15].

The idea of this project was born when my colleagues working on JITing
Postgres realized that runtime specialization of C code written in
push-based architecture should be much more efficient than
specializing existing code (see 'Related work' section), and at the
time I’ve decided that I want my thesis to be connected with
PostgreSQL.

I am ready to work full-time this summer and I think that push-based
execution of all TPC-H queries is quite an achievable goal; however, I
haven't yet studied all required nodes in detail and I will make more
exact estimations if the community supports this project.

P.S. Should letters like this go to hackers or students mailing list?
The latter seems more suitable, but it looks rather dead...

____________________________________________________________
References

[1] https://wiki.postgresql.org/wiki/GSoC_2017#Implementing_push-based_query_executor
[2] https://www.postgresql.org/message-id/CAFRJ5K3sDGSn%3DpKgnsobYQX4CMTOU%3D0uJ-vt2kF3t1FsVnTCRQ%40mail.gmail.com
[4] Efficiently Compiling Efficient Query Plans for Modern Hardware,
    http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
[5] Compiling Database Queries into Machine Code,
    http://sites.computer.org/debull/A14mar/p3.pdf
[6] LLVM Cauldron, slides
    http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf
[7]
https://www.postgresql.org/message-id/flat/CA%2BTgmobx8su_bYtAa3DgrqB%2BR7xZG6kHRj0ccMUUshKAQVftww%40mail.gmail.com#CA+Tgmobx8su_bYtAa3DgrqB+R7xZG6kHRj0ccMUUshKAQVftww@mail.gmail.com
[8]
https://www.postgresql.org/message-id/flat/20160624232953.beub22r6yqux4gcp%40alap3.anarazel.de#20160624232953.beub22r6yqux4gcp@alap3.anarazel.de
[9]
https://www.postgresql.org/message-id/flat/50877c0c-fb88-b601-3115-55a8c70d693e%40postgrespro.ru#50877c0c-fb88-b601-3115-55a8c70d693e@postgrespro.ru
[10]
https://www.postgresql.org/message-id/flat/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de#20161206034955.bh33paeralxbtluv@alap3.anarazel.de
[11] Vectorization vs. Compilation in Query Execution
     https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf
[12] https://cs.msu.ru/en
[13] http://www.ispras.ru/en/
[14] http://stackoverflow.com/users/4014587/ars
[15] https://github.com/arssher
[16] https://github.com/arssher/pgtpch
[17]
https://www.postgresql.org/message-id/flat/CADviLuNjQTh99o6E0LTi0Ygks%3DnaW8SXHmgn%3D8P%2BaaBXKXa0pA%40mail.gmail.com#CADviLuNjQTh99o6E0LTi0Ygks=naW8SXHmgn=8P+aaBXKXa0pA@mail.gmail.com

--
Arseny Sher


Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Robert Haas
Date:
On Mon, Mar 6, 2017 at 11:20 AM, Arseny Sher <sher-ars@yandex.ru> wrote:
> I would like to work on push-based executor [1] during GSoC, so I'm
> writing to introduce myself and start the discussion of the project. I
> think I should mention beforehand that the subject is my master's
> thesis topic, and I have already started working on it. This letter is
> not (obviously) a ready proposal but rather initial point to talk over
> the concept. Below you can see a short review of the idea, description
> of benefits for the community, details, related work and some info
> about me.

While I admire your fearlessness, I think the chances of you being
able to bring a project of this type to a successful conclusion are
remote.  Here is what I said about this topic previously:

http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+vKu21nk3prkRjSdbhjubN7qvc8UKuGg@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Arseny Sher
Date:
I will share the actual benchmarks and the code to give it another
chance and give an idea how the result looks like. Currently I have
implemented suggested changes to nodes seqscan, hash, hashjoin, limit,
hashed aggregation and in-memory sort. It allows to test q1, q3, q4, q5,
q10, q12 and q14 queries from TPC-H set. Since my goal was just to
estimate performance benefits, there are several restictions:
* ExecReScan is not supported
* only CMD_SELECT operations currently work
* only forward direction is supported.
SRF, subplanes and parallel execution are not supported either because
corresponding nodes are not yet implemented.


Here you can see the results:

+-----+-----------+---------+----------+
|query|reversed, s|master, s|speedup, %|
+-----+-----------+---------+----------+
|q01  |128.53     |138.94   |8.1       |
+-----+-----------+---------+----------+
|q03  |61.53      |67.29    |9.36      |
+-----+-----------+---------+----------+
|q04  |86.27      |95.95    |11.22     |
+-----+-----------+---------+----------+
|q05  |54.44      |56.82    |4.37      |
+-----+-----------+---------+----------+
|q10  |55.44      |59.88    |8.01      |
+-----+-----------+---------+----------+
|q12  |69.59      |76.65    |10.15     |
+-----+-----------+---------+----------+


'reversed' is Postgres with push-based executor, master' is current
master branch. 24 runs were conducted and median of them was
taken. Speedup in % is (master - reversed) / reversed * 100. Scale of
TPC-H database was 40. We use doubles as floating point types instead of
numerics. Only q1 here is fully supported, meaning that that the planner
would anyway choose this plan, even if all other nodes were
implemented. For other queries planner also uses Index Scan, Nested Loop
Semi Join, bitmap scans, Materialize, which are not yet reversed.

postgresql.conf was

shared_buffers = 128GB
maintenance_work_mem = 1GB
work_mem = 8GB
effective_cache_size = 128GB

max_wal_senders = 0
max_parallel_workers_per_gather = 0  # disable parallelism

# disable not yet implemented nodes
set enable_indexscan TO off;
set enable_indexonlyscan TO off;
set enable_material TO off;
set enable_bitmapscan TO off;
set enable_nestloop TO off;
set enable_sort TO off;

Patches are attached, they apply cleanly on 767ce36ff36747.

While in some places patches introduce kind of ugliness which is
described in commit messages and commits, e.g. heapam.h now must know
about PlanState *, I think in others this approach can make the
architecture a bit cleaner.  Specifically, now we can cleanly separate
logic for handling tuples from inner and outer sides (see hashjoin), and
also separate logic for handling NULL tuples. I haven't yet added the
latter, but the idea is that the node below always knows when it is
done, so it can call its parent function for handling null tuples
directly instead of keeping extra 'if' in generic
execProcNode/pushTuple.

--
Arseny Sher

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Arseny Sher
Date:
> While I admire your fearlessness, I think the chances of you being
> able to bring a project of this type to a successful conclusion are
> remote.  Here is what I said about this topic previously:
>
> http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+vKu21nk3prkRjSdbhjubN7qvc8UKuGg@mail.gmail.com

Well, as I said, I don't pretend that I will support full functionality:
>> instead, we should decide which part of this work (if any) is
>> going to be done in the course of GSoC. Probably, all TPC-H queries
>> with and without index support is a good initial target, but this
>> needs to be discussed.

I think that successfull completion of this project should be a clear
and justified answer to the question "Is this idea is good enough to
work on merging it into the master?", not the production-ready patches
themselves. Nevertheless, of course project success criterion must be
reasonably formalized -- e.g. implement nodes X with features Y, etc.



Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Oleg Bartunov
Date:


On Wed, Mar 22, 2017 at 8:04 PM, Arseny Sher <sher-ars@ispras.ru> wrote:
> While I admire your fearlessness, I think the chances of you being
> able to bring a project of this type to a successful conclusion are
> remote.  Here is what I said about this topic previously:
>
> http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+vKu21nk3prkRjSdbhjubN7qvc8UKuGg@mail.gmail.com

Well, as I said, I don't pretend that I will support full functionality:
>> instead, we should decide which part of this work (if any) is
>> going to be done in the course of GSoC. Probably, all TPC-H queries
>> with and without index support is a good initial target, but this
>> needs to be discussed.

I think that successfull completion of this project should be a clear
and justified answer to the question "Is this idea is good enough to
work on merging it into the master?", not the production-ready patches
themselves. Nevertheless, of course project success criterion must be
reasonably formalized -- e.g. implement nodes X with features Y, etc.

How many GSoC slots and possible students we have ? 

Should we reject this interesting project, which based on several years of research work of academician group in the institute ? May be better help him to reformulate the scope of project and let him work ? I don't know exactly if the results of GSoC project should be committed , but as a research project it's certainly would be useful for the community.
 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GSoC] Push-based query executor discussion

From
sher-ars@ispras.ru
Date:
Oleg Bartunov <obartunov@gmail.com> writes:

> I don't know exactly if the results of GSoC project should be 
> committed,

Technically, they are not required:
https://developers.google.com/open-source/gsoc/faq

> Are mentoring organizations required to use the code produced by
> students?

> No. While we hope that all the code that comes out of this program will
> find a happy home, we don’t require organizations to use the student's'
> code.


--
Arseny Sher




Re: [GSoC] Push-based query executor discussion

From
Arseny Sher
Date:
I have cleaned up the code a bit and added the separation I mentioned in
a previous mail -- now we there are even three functions instead of old
ExecProcNode: one for starting leaf nodes, one for passing tuples and
one for signaling that the node has finished its job. It is all
described in execProcnode.c.

I also rewrote HashJoin without using the explicit state machine. It
seems slightly cleaner to me now...

Here are updated benchmarks:
+-----+-----------+---------+----------+
|query|reversed, s|master, s|speedup, %|
+-----+-----------+---------+----------+
|q01  |108.21     |117.88   |8.94      |
+-----+-----------+---------+----------+
|q03  |55.48      |58.805   |5.99      |
+-----+-----------+---------+----------+
|q04  |78.405     |81.86    |4.41      |
+-----+-----------+---------+----------+
|q05  |49.91      |51.18    |2.54      |
+-----+-----------+---------+----------+
|q10  |49.215     |52.61    |6.9       |
+-----+-----------+---------+----------+
|q12  |63.24      |68.505   |8.33      |
+-----+-----------+---------+----------+
|q14  |33.42      |35.31    |5.66      |
+-----+-----------+---------+----------+

As before, 24 runs were performed, median taken, scale is 40GB,
postgresql.conf is the same.

Patches are rebased, now they apply on 4dd3abe99f50.


--
Arseny Sher

Re: [GSoC] Push-based query executor discussion

From
Arseny Sher
Date:
Time is short, student's application deadline is on 3rd April. I decided
to reformulate the project scope myself. Here is the proposal:

https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing

The main idea is that now there is a formalized goal of the project,
"partial support of all TPC-H queries".

I am also CC'ing people who was mentioned in "Potential Mentors" section
on GSoC wiki page.


--
Arseny Sher



Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Alexander Korotkov
Date:
On Sun, Apr 2, 2017 at 12:13 AM, Arseny Sher <sher-ars@ispras.ru> wrote:
Time is short, student's application deadline is on 3rd April. I decided
to reformulate the project scope myself. Here is the proposal:

https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing

The main idea is that now there is a formalized goal of the project,
"partial support of all TPC-H queries".

I am also CC'ing people who was mentioned in "Potential Mentors" section
on GSoC wiki page.

I'd love to see a comment from Andres Freund who is leading executor performance improvements.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Kevin Grittner
Date:
On Thu, Apr 6, 2017 at 8:11 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

>> https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing

> I'd love to see a comment from Andres Freund who is leading executor
> performance improvements.

Note that the final proposal is here:

https://summerofcode.withgoogle.com/serve/5874530240167936/

Also, I just entered a comment about an important question that I
think needs to be answered right up front.

--
Kevin Grittner


Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Tom Lane
Date:
Kevin Grittner <kgrittn@gmail.com> writes:
> Note that the final proposal is here:
> https://summerofcode.withgoogle.com/serve/5874530240167936/

I'm just getting a blank page at that URL?

            regards, tom lane


Re: [HACKERS] [GSoC] Push-based query executor discussion

From
Kevin Grittner
Date:
Sorry, I didn't notice that this was going to a public list.  That URL
is only available to people who signed up as mentors for PostgreSQL
GSoC participation this year.  Does the link to the draft work for you?

--
Kevin Grittner


Re: [pgsql-students] [HACKERS] [GSoC] Push-based query executor discussion

From
Simon Riggs
Date:
On 22 March 2017 at 14:58, Oleg Bartunov <obartunov@gmail.com> wrote:

> Should we reject this interesting project, which based on several years of
> research work of academician group in the institute ? May be better help him
> to reformulate the scope of project and let him work ? I don't know exactly
> if the results of GSoC project should be committed , but as a research
> project it's certainly would be useful for the community.

+1

Arseny, thank you for your contributions.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services