Thread: [HACKERS] [GSoC] Push-based query executor discussion
Hello, I would like to work on push-based executor [1] during GSoC, so I'm writing to introduce myself and start the discussion of the project. I think I should mention beforehand that the subject is my master's thesis topic, and I have already started working on it. This letter is not (obviously) a ready proposal but rather initial point to talk over the concept. Below you can see a short review of the idea, description of benefits for the community, details, related work and some info about me. *Brief review* The idea is described at the wiki page [1] and in the letter [2]. I propose to replace current ExecProcNode interface between execution nodes with function called, say, pushTuple that pushes the ready tuple to the current node's parent. *Benefits for the community* Why would we want this? In general, because Postgres executor is slow for CPU-bound queries and this approach should accelerate it. [4] and [5] argue that this model results in better code and data locality, and that JIT compilation makes the difference even more drastic. Besides, while working on this, in order to study the effects of model change I will try to investigate the Postgres executor's performance in both models extensively. For instance, it is commonly accepted that current Volcano-style model leads to poor usage of modern CPUs pipelining abilities and large percent of branch mispredictions. I am going to see whether, where and when this is true in Postgres; profiling results should be useful for any further executor optimizations. *Project details* Technically, I am planning to implement this as follows. Common for all nodes code which needs to be changed is in execMain.c and execProcnode.c; standard_ExecutorRun in execMain.c now should start execution of all leaf nodes in proper order instead of pulling tuples one-by-one from top-level node. By 'proper' order here I mean that inner nodes will be run first, outer nodes second, so that when the first tuple from outer side of some node arrives to it, the node already received all its tuples from the inner side. How we 'start' execution of a leaf? Recall that now instead of ExecProcNode we have pushTuple function with following signature: bool pushTuple(TupleTableSlot *slot, PlanState *node, PlanState *pusher) 'slot' is the tuple we push. 'node' is a receiver of tuple, 'pusher' is sender of the tuple, its parent is 'node'. We need 'pusher' only to distinguish inner and outer pushes. This function returns true if 'node' is still accepting tuples after the push, false if not, e.g. Limit node can return false after required number of tuples were passed. We also add the convention that when a node has nothing to push anymore, it calls pushTuple with slot=NULL to let parent know that it is done. So, to start execution of a leaf, standard_ExecutorRun basically needs to call pushTuple(NULL, leaf, NULL) once. Leaf nodes are a special case because pusher=NULL; another obvious special case is top-level node: it calls pushTuple(slot, NULL, node), such call will push the slot to the destination ((*dest->receiveSlot) (slot, dest) in current code). Like ExecProcNode, pushTuple will call the proper implementation, e.g. pushTupleToLimit. Such implementations will contain the code similar to its analogue (e.g. ExecLimit), but, very roughly, where we have return slot; now, in push model we will have bool parent_accepts_tuples = pushTuple(slot, node->parent, node); and then we will continue execution if parent_accepts_tuples is true or exit if not. Complex nodes require more complicated modifications to preserve the correct behaviour and be efficient. The latter leads to some architectural issues: for example, efficient SeqScan should call pushTuple from function doing similar to what heapgettups_pagemode currently does, otherwise, we would need to save/restore its state (lines, page, etc) every time for each tuple. On the other hand, it is not nice to call pushTuple from there because currently access level (heapam.c) knows nothing about PlanStates. Such issues will need to be addressed and discussed with the community. Currently, I have a prototype (pretty much WIP) which implements this model for SeqScan, Limit, Hash and Hashjoin nodes. Since TPC-H benchmarks are de facto standard to evaluate such things, I am planning to to use them for testing. BTW, I’ve written a couple of scripts to automate this job [16], although it seems that everyone who tests TPC-H ends up with writing his own version. Now, it is clear that rewriting all nodes with full support in such a manner is huge work. Besides, we still don't know quantitative profit of this model. Because of that, I do not propose any timeline right now; instead, we should decide which part of this work (if any) is going to be done in the course of GSoC. Probably, all TPC-H queries with and without index support is a good initial target, but this needs to be discussed. Anyway, I don't think that the result will be a patch ready to be merged into Postgres master straight away, because it is rather radical change; and it seems that supporting both executors simultaneously is also a bad idea because many code would be duplicated in this case. *Related work* There are other works aimed for improving executor performance. I can mention at least four approaches: * JITing things [6][10][17] * batched and/or vectorized execution [7][8][9] * expression evaluation optimizations [10][17] * slot deforming optimizations [10] The latter two are orthogonal to the proposed project. Batched execution and JIT can be applied together, and some study [10] shows benefits of such combined approach. While batched execution and push-based execution model can be applied together too, they seemingly solve more or less same problems -- code and data locality, avoiding reloading node's state and better use of modern CPU features. However, batched execution requires massive changes to the current code and seems harder to implement; IIRC I have seen some patches on this at mailing lists, but I am not aware which work is the most complete as of now. It is not easy to compare these approaches theoretically; probably, the best way to estimate their effect is to implement them and run benchmarks. Relationship between JIT-compilation and push-based execution model is interesting: paper [5] shows that HyPer system with JIT + push runs 4x times faster on some queries than JIT + pull. It should be noted that HyPer uses column-wise storage though. Full query compiler developed at ISP RAS [6] speeds up query processing 2-5 times on TPC-H queries and exploits push-based execution model. However, supporting it requires implementing executor logic twice: in plain C for usual interpreter and in LLVM API for JIT compiler. Ideally we would like to write the code once and be able to use it either with and without JIT compilation. There is an ongoing work at ISP RAS to make this possible using automatic runtime code specialization; however, experiments have showed that specialization of original Postgres C code doesn't give significant improvement because of Volcano-style model. We expect that after making a switch to push-based model in Postgres code we will achieve speedup comparable to full-query JIT using runtime specialization. *About me* My name is Arseny Sher. Currently, I am studying master's degree at CMC MSU [12] and working at ISP RAS [13]. Earlier I got bachelor's degree at the same faculty. I started working with Postgres at the end of October and I love its extensibility and excellent quality of code. My previous work was mainly connected with big graphs computation (keywords are Spark, GraphX, Scala, GraphLab). I also did some scala.js coding for Russian philologists and participated in project for IMDG performance comparison, doing mostly devops stuff (Docker, Ansible, Python). Here are my stackoverflow [14] & github accounts [15]. The idea of this project was born when my colleagues working on JITing Postgres realized that runtime specialization of C code written in push-based architecture should be much more efficient than specializing existing code (see 'Related work' section), and at the time I’ve decided that I want my thesis to be connected with PostgreSQL. I am ready to work full-time this summer and I think that push-based execution of all TPC-H queries is quite an achievable goal; however, I haven't yet studied all required nodes in detail and I will make more exact estimations if the community supports this project. P.S. Should letters like this go to hackers or students mailing list? The latter seems more suitable, but it looks rather dead... ____________________________________________________________ References [1] https://wiki.postgresql.org/wiki/GSoC_2017#Implementing_push-based_query_executor [2] https://www.postgresql.org/message-id/CAFRJ5K3sDGSn%3DpKgnsobYQX4CMTOU%3D0uJ-vt2kF3t1FsVnTCRQ%40mail.gmail.com [4] Efficiently Compiling Efficient Query Plans for Modern Hardware, http://www.vldb.org/pvldb/vol4/p539-neumann.pdf [5] Compiling Database Queries into Machine Code, http://sites.computer.org/debull/A14mar/p3.pdf [6] LLVM Cauldron, slides http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf [7] https://www.postgresql.org/message-id/flat/CA%2BTgmobx8su_bYtAa3DgrqB%2BR7xZG6kHRj0ccMUUshKAQVftww%40mail.gmail.com#CA+Tgmobx8su_bYtAa3DgrqB+R7xZG6kHRj0ccMUUshKAQVftww@mail.gmail.com [8] https://www.postgresql.org/message-id/flat/20160624232953.beub22r6yqux4gcp%40alap3.anarazel.de#20160624232953.beub22r6yqux4gcp@alap3.anarazel.de [9] https://www.postgresql.org/message-id/flat/50877c0c-fb88-b601-3115-55a8c70d693e%40postgrespro.ru#50877c0c-fb88-b601-3115-55a8c70d693e@postgrespro.ru [10] https://www.postgresql.org/message-id/flat/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de#20161206034955.bh33paeralxbtluv@alap3.anarazel.de [11] Vectorization vs. Compilation in Query Execution https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf [12] https://cs.msu.ru/en [13] http://www.ispras.ru/en/ [14] http://stackoverflow.com/users/4014587/ars [15] https://github.com/arssher [16] https://github.com/arssher/pgtpch [17] https://www.postgresql.org/message-id/flat/CADviLuNjQTh99o6E0LTi0Ygks%3DnaW8SXHmgn%3D8P%2BaaBXKXa0pA%40mail.gmail.com#CADviLuNjQTh99o6E0LTi0Ygks=naW8SXHmgn=8P+aaBXKXa0pA@mail.gmail.com -- Arseny Sher
On Mon, Mar 6, 2017 at 11:20 AM, Arseny Sher <sher-ars@yandex.ru> wrote: > I would like to work on push-based executor [1] during GSoC, so I'm > writing to introduce myself and start the discussion of the project. I > think I should mention beforehand that the subject is my master's > thesis topic, and I have already started working on it. This letter is > not (obviously) a ready proposal but rather initial point to talk over > the concept. Below you can see a short review of the idea, description > of benefits for the community, details, related work and some info > about me. While I admire your fearlessness, I think the chances of you being able to bring a project of this type to a successful conclusion are remote. Here is what I said about this topic previously: http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+vKu21nk3prkRjSdbhjubN7qvc8UKuGg@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I will share the actual benchmarks and the code to give it another chance and give an idea how the result looks like. Currently I have implemented suggested changes to nodes seqscan, hash, hashjoin, limit, hashed aggregation and in-memory sort. It allows to test q1, q3, q4, q5, q10, q12 and q14 queries from TPC-H set. Since my goal was just to estimate performance benefits, there are several restictions: * ExecReScan is not supported * only CMD_SELECT operations currently work * only forward direction is supported. SRF, subplanes and parallel execution are not supported either because corresponding nodes are not yet implemented. Here you can see the results: +-----+-----------+---------+----------+ |query|reversed, s|master, s|speedup, %| +-----+-----------+---------+----------+ |q01 |128.53 |138.94 |8.1 | +-----+-----------+---------+----------+ |q03 |61.53 |67.29 |9.36 | +-----+-----------+---------+----------+ |q04 |86.27 |95.95 |11.22 | +-----+-----------+---------+----------+ |q05 |54.44 |56.82 |4.37 | +-----+-----------+---------+----------+ |q10 |55.44 |59.88 |8.01 | +-----+-----------+---------+----------+ |q12 |69.59 |76.65 |10.15 | +-----+-----------+---------+----------+ 'reversed' is Postgres with push-based executor, master' is current master branch. 24 runs were conducted and median of them was taken. Speedup in % is (master - reversed) / reversed * 100. Scale of TPC-H database was 40. We use doubles as floating point types instead of numerics. Only q1 here is fully supported, meaning that that the planner would anyway choose this plan, even if all other nodes were implemented. For other queries planner also uses Index Scan, Nested Loop Semi Join, bitmap scans, Materialize, which are not yet reversed. postgresql.conf was shared_buffers = 128GB maintenance_work_mem = 1GB work_mem = 8GB effective_cache_size = 128GB max_wal_senders = 0 max_parallel_workers_per_gather = 0 # disable parallelism # disable not yet implemented nodes set enable_indexscan TO off; set enable_indexonlyscan TO off; set enable_material TO off; set enable_bitmapscan TO off; set enable_nestloop TO off; set enable_sort TO off; Patches are attached, they apply cleanly on 767ce36ff36747. While in some places patches introduce kind of ugliness which is described in commit messages and commits, e.g. heapam.h now must know about PlanState *, I think in others this approach can make the architecture a bit cleaner. Specifically, now we can cleanly separate logic for handling tuples from inner and outer sides (see hashjoin), and also separate logic for handling NULL tuples. I haven't yet added the latter, but the idea is that the node below always knows when it is done, so it can call its parent function for handling null tuples directly instead of keeping extra 'if' in generic execProcNode/pushTuple. -- Arseny Sher -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
- 0001-parent-param-added-to-ExecInitNode-parent-field-adde.patch
- 0002-Node-s-interface-functions-stubbed.patch
- 0003-Base-for-reversed-executor.patch
- 0004-Reversed-SeqScan-implementation.patch
- 0005-Reversed-HashJoin-implementation.patch
- 0006-Reversed-Limit-implementation.patch
- 0007-Reversed-hashed-Agg-implementation.patch
- 0008-Reversed-in-memory-Sort-implementation.patch
> While I admire your fearlessness, I think the chances of you being > able to bring a project of this type to a successful conclusion are > remote. Here is what I said about this topic previously: > > http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+vKu21nk3prkRjSdbhjubN7qvc8UKuGg@mail.gmail.com Well, as I said, I don't pretend that I will support full functionality: >> instead, we should decide which part of this work (if any) is >> going to be done in the course of GSoC. Probably, all TPC-H queries >> with and without index support is a good initial target, but this >> needs to be discussed. I think that successfull completion of this project should be a clear and justified answer to the question "Is this idea is good enough to work on merging it into the master?", not the production-ready patches themselves. Nevertheless, of course project success criterion must be reasonably formalized -- e.g. implement nodes X with features Y, etc.
On Wed, Mar 22, 2017 at 8:04 PM, Arseny Sher <sher-ars@ispras.ru> wrote:
> While I admire your fearlessness, I think the chances of you being
> able to bring a project of this type to a successful conclusion are
> remote. Here is what I said about this topic previously:
>
> http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+ vKu21nk3prkRjSdbhjubN7qvc8UKuG g@mail.gmail.com
Well, as I said, I don't pretend that I will support full functionality:
>> instead, we should decide which part of this work (if any) is
>> going to be done in the course of GSoC. Probably, all TPC-H queries
>> with and without index support is a good initial target, but this
>> needs to be discussed.
I think that successfull completion of this project should be a clear
and justified answer to the question "Is this idea is good enough to
work on merging it into the master?", not the production-ready patches
themselves. Nevertheless, of course project success criterion must be
reasonably formalized -- e.g. implement nodes X with features Y, etc.
How many GSoC slots and possible students we have ?
Should we reject this interesting project, which based on several years of research work of academician group in the institute ? May be better help him to reformulate the scope of project and let him work ? I don't know exactly if the results of GSoC project should be committed , but as a research project it's certainly would be useful for the community.
Should we reject this interesting project, which based on several years of research work of academician group in the institute ? May be better help him to reformulate the scope of project and let him work ? I don't know exactly if the results of GSoC project should be committed , but as a research project it's certainly would be useful for the community.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Oleg Bartunov <obartunov@gmail.com> writes: > I don't know exactly if the results of GSoC project should be > committed, Technically, they are not required: https://developers.google.com/open-source/gsoc/faq > Are mentoring organizations required to use the code produced by > students? > No. While we hope that all the code that comes out of this program will > find a happy home, we don’t require organizations to use the student's' > code. -- Arseny Sher
I have cleaned up the code a bit and added the separation I mentioned in a previous mail -- now we there are even three functions instead of old ExecProcNode: one for starting leaf nodes, one for passing tuples and one for signaling that the node has finished its job. It is all described in execProcnode.c. I also rewrote HashJoin without using the explicit state machine. It seems slightly cleaner to me now... Here are updated benchmarks: +-----+-----------+---------+----------+ |query|reversed, s|master, s|speedup, %| +-----+-----------+---------+----------+ |q01 |108.21 |117.88 |8.94 | +-----+-----------+---------+----------+ |q03 |55.48 |58.805 |5.99 | +-----+-----------+---------+----------+ |q04 |78.405 |81.86 |4.41 | +-----+-----------+---------+----------+ |q05 |49.91 |51.18 |2.54 | +-----+-----------+---------+----------+ |q10 |49.215 |52.61 |6.9 | +-----+-----------+---------+----------+ |q12 |63.24 |68.505 |8.33 | +-----+-----------+---------+----------+ |q14 |33.42 |35.31 |5.66 | +-----+-----------+---------+----------+ As before, 24 runs were performed, median taken, scale is 40GB, postgresql.conf is the same. Patches are rebased, now they apply on 4dd3abe99f50. -- Arseny Sher
Time is short, student's application deadline is on 3rd April. I decided to reformulate the project scope myself. Here is the proposal: https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing The main idea is that now there is a formalized goal of the project, "partial support of all TPC-H queries". I am also CC'ing people who was mentioned in "Potential Mentors" section on GSoC wiki page. -- Arseny Sher
On Sun, Apr 2, 2017 at 12:13 AM, Arseny Sher <sher-ars@ispras.ru> wrote:
I'd love to see a comment from Andres Freund who is leading executor performance improvements.
Time is short, student's application deadline is on 3rd April. I decided
to reformulate the project scope myself. Here is the proposal:
https://docs.google.com/document/d/ 1dvBETE6IJA9AcXd11XJNPsF_ VPcDhSjy7rlsxj262l8/edit?usp= sharing
The main idea is that now there is a formalized goal of the project,
"partial support of all TPC-H queries".
I am also CC'ing people who was mentioned in "Potential Mentors" section
on GSoC wiki page.
The Russian Postgres Company
On Thu, Apr 6, 2017 at 8:11 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: >> https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing > I'd love to see a comment from Andres Freund who is leading executor > performance improvements. Note that the final proposal is here: https://summerofcode.withgoogle.com/serve/5874530240167936/ Also, I just entered a comment about an important question that I think needs to be answered right up front. -- Kevin Grittner
Kevin Grittner <kgrittn@gmail.com> writes: > Note that the final proposal is here: > https://summerofcode.withgoogle.com/serve/5874530240167936/ I'm just getting a blank page at that URL? regards, tom lane
Sorry, I didn't notice that this was going to a public list. That URL is only available to people who signed up as mentors for PostgreSQL GSoC participation this year. Does the link to the draft work for you? -- Kevin Grittner
On 22 March 2017 at 14:58, Oleg Bartunov <obartunov@gmail.com> wrote: > Should we reject this interesting project, which based on several years of > research work of academician group in the institute ? May be better help him > to reformulate the scope of project and let him work ? I don't know exactly > if the results of GSoC project should be committed , but as a research > project it's certainly would be useful for the community. +1 Arseny, thank you for your contributions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services