Thread: WIP: avoiding tuple construction/deconstruction overhead
Attached is the current state of a patch to reduce the overhead of passing tuple data up through many levels of plan nodes. It's not tested enough to apply yet, but I thought I'd put it out for comment. It seems to get about a factor of 4 speedup on Miroslav's nested-joins example (above and beyond what we got from Atsushi Ogawa's patch). The basic point of the patch is to allow a TupleTableSlot to contain a "virtual" tuple instead of a regular heap tuple. The virtual tuple is just an array of Datums, with any pass-by-reference Datums pointing at original storage (either a lower-level slot or expression result storage). This representation is essentially the raw output of ExecProject. This not only avoids the overhead of forming the data into a tuple (heap_formtuple) but also saves cycles when extracting the data at the next level up, since we can just grab the Datums directly. (This behavior builds on and shares code with Ogawa's patch to cache extracted Datums in TupleTableSlots. When a slot contains a physical tuple, the same Datum arrays cache any Datums extracted from it.) Since a slot may or may not contain a regular tuple, you can't just grab slot->val anymore; there are new API functions ExecCopySlotTuple() and ExecFetchSlotTuple() (the former when you want to make your own copy, the latter when you don't). These force construction of a real tuple if the slot is virtual. I also made an ExecCopySlot() convenience routine for the common case of copying one slot's contents into another slot. A related API modification is to change tuple receivers (DestReceivers) to receive a TupleTableSlot instead of separate tuple and tuple descriptor parameters. This makes it possible to avoid an unnecessary tuple construction/deconstruction at the final output phase as well. It also turned out to be useful to make a short-circuit path for ExecProject when the targetlist is entirely simple Vars. This only requires copying Datums from lower to upper slots, and we can implement it that way instead of going through ExecEvalExpr. Finally, I have made some progress towards making the tuple access routines consistently use "bool isNull" arrays as null markers, instead of the char 'n' or ' ' convention that was previously used in some but not all contexts. I don't think we can retire heap_formtuple or heap_modifytuple for a long time, if ever, but we can deprecate them in favor of the parallel new routines with the bool interface. Comments? regards, tom lane
Attachment
Tom Lane wrote: > Attached is the current state of a patch to reduce the overhead of > passing tuple data up through many levels of plan nodes. It is a good idea. I think that this patch improves performance of the whole executor. I have three comments. (1)We can improve compare_heap() by using TableTupleSlot instead of HeapTuple. Please see attached patch. (2)In ExecStoreTuple(), we can omit initialization of slot->tts_nvalid. If slot->tts_isempty is false, tts_nvalid is initialized by ExecClearTuple(). If it is true, tts_nvalid is always zero. (3)There is a description of slot->val in comment of execTuple.c. This had better revise it. > Finally, I have made some progress towards making the tuple access > routines consistently use "bool isNull" arrays as null markers, instead > of the char 'n' or ' ' convention that was previously used in some but > not all contexts. I agree. I think that this progress improves readability. regards, --- Atsushi Ogawa
Attachment
a_ogawa <a_ogawa@hi-ho.ne.jp> writes: > (1)We can improve compare_heap() by using TableTupleSlot instead of > HeapTuple. Please see attached patch. Did you measure any performance improvement from that? I considered it but thought it would likely be a wash or a loss, because in most cases only one attribute will be pulled from a tuple during comparetup_heap. slot_getattr cannot improve on heap_getattr in that case, and is quite likely to be slower. > (3)There is a description of slot->val in comment of execTuple.c. > This had better revise it. Drat, how'd I miss that? Thanks. regards, tom lane
Tom Lane wrote: > a_ogawa <a_ogawa@hi-ho.ne.jp> writes: > > (1)We can improve compare_heap() by using TableTupleSlot instead of > > HeapTuple. Please see attached patch. > > Did you measure any performance improvement from that? I considered it > but thought it would likely be a wash or a loss, because in most cases > only one attribute will be pulled from a tuple during comparetup_heap. > slot_getattr cannot improve on heap_getattr in that case, and is quite > likely to be slower. I measured performance of heap_getattr and slot_getattr in comparetup_heap. I made the table which had ten varchar attributes, and registered data for tests. (Attached file includes SQL doing this.) I carried out the following tests. (case 1) test1: select * from sort_test order by v1 limit 100; test2: select * from sort_test order by v1, v2 limit 100; test3: select * from sort_test order by v1, v2, v3 limit 100; test4: select * from sort_test order by v1, v2, v3, v4 limit 100; test5: select * from sort_test order by v1, v2, v3, v4, v5 limit 100; result: test1 test2 test3 test4 test5 ----------------------------------------------------------------------- heap_getattr 2.149s 2.602s 3.204s 3.830s 4.159s slot_getattr 2.523s 3.422s 3.977s 4.453s 4.721s (case 2) test1: select * from sort_test order by v10 limit 100; test2: select * from sort_test order by v10, v9 limit 100; test3: select * from sort_test order by v10, v9, v8 limit 100; test4: select * from sort_test order by v10, v9, v8, v7 limit 100; test5: select * from sort_test order by v10, v9, v8, v7, v6 limit 100; result: test1 test2 test3 test4 test5 ----------------------------------------------------------------------- heap_getattr 3.654s 5.549s 6.575s 7.367s 7.870s slot_getattr 4.027s 4.930s 5.249s 5.555s 5.756s (case 3) test1: select * from sort_test order by v5 limit 100; test2: select * from sort_test order by v5, v6 limit 100; test3: select * from sort_test order by v5, v6, v7 limit 100; test4: select * from sort_test order by v5, v6, v7, v8 limit 100; test5: select * from sort_test order by v5, v6, v7, v8, v9 limit 100; result: test1 test2 test3 test4 test5 ----------------------------------------------------------------------- heap_getattr 2.657s 4.207s 5.194s 6.179s 6.662s slot_getattr 3.126s 4.233s 4.806s 5.271s 5.557s In most cases, heap_getattr is fast. When the following conditions occurred, slot_getattr is fast. (1)Tuple have varlen attributes. (2)Sort key have more than two attributes. (3)A position of a sort key is far from the head of tuple. (4)As for the data of a sort key, there be many repetition. Actually it will be rare that these conditions are occurred. Thinking from a result, I think that we had better continue using heap_getattr in comparetup_heap. regards, --- Atsushi Ogawa
Attachment
I am very excited there has been so much reduction in tuple processing overhead in the past few weeks. This is always and area I thought needed improvement, and its great to see it. We will certainly have some big performance improvements in 8.1 because we already have several (e.g. SMP) and we have many more months to go. --------------------------------------------------------------------------- a_ogawa wrote: > > Tom Lane wrote: > > a_ogawa <a_ogawa@hi-ho.ne.jp> writes: > > > (1)We can improve compare_heap() by using TableTupleSlot instead of > > > HeapTuple. Please see attached patch. > > > > Did you measure any performance improvement from that? I considered it > > but thought it would likely be a wash or a loss, because in most cases > > only one attribute will be pulled from a tuple during comparetup_heap. > > slot_getattr cannot improve on heap_getattr in that case, and is quite > > likely to be slower. > > I measured performance of heap_getattr and slot_getattr in > comparetup_heap. > > I made the table which had ten varchar attributes, and registered > data for tests. (Attached file includes SQL doing this.) > I carried out the following tests. > > (case 1) > test1: select * from sort_test order by v1 limit 100; > test2: select * from sort_test order by v1, v2 limit 100; > test3: select * from sort_test order by v1, v2, v3 limit 100; > test4: select * from sort_test order by v1, v2, v3, v4 limit 100; > test5: select * from sort_test order by v1, v2, v3, v4, v5 limit 100; > > result: test1 test2 test3 test4 test5 > ----------------------------------------------------------------------- > heap_getattr 2.149s 2.602s 3.204s 3.830s 4.159s > slot_getattr 2.523s 3.422s 3.977s 4.453s 4.721s > > (case 2) > test1: select * from sort_test order by v10 limit 100; > test2: select * from sort_test order by v10, v9 limit 100; > test3: select * from sort_test order by v10, v9, v8 limit 100; > test4: select * from sort_test order by v10, v9, v8, v7 limit 100; > test5: select * from sort_test order by v10, v9, v8, v7, v6 limit 100; > > result: test1 test2 test3 test4 test5 > ----------------------------------------------------------------------- > heap_getattr 3.654s 5.549s 6.575s 7.367s 7.870s > slot_getattr 4.027s 4.930s 5.249s 5.555s 5.756s > > (case 3) > test1: select * from sort_test order by v5 limit 100; > test2: select * from sort_test order by v5, v6 limit 100; > test3: select * from sort_test order by v5, v6, v7 limit 100; > test4: select * from sort_test order by v5, v6, v7, v8 limit 100; > test5: select * from sort_test order by v5, v6, v7, v8, v9 limit 100; > > result: test1 test2 test3 test4 test5 > ----------------------------------------------------------------------- > heap_getattr 2.657s 4.207s 5.194s 6.179s 6.662s > slot_getattr 3.126s 4.233s 4.806s 5.271s 5.557s > > In most cases, heap_getattr is fast. > When the following conditions occurred, slot_getattr is fast. > (1)Tuple have varlen attributes. > (2)Sort key have more than two attributes. > (3)A position of a sort key is far from the head of tuple. > (4)As for the data of a sort key, there be many repetition. > Actually it will be rare that these conditions are occurred. > > Thinking from a result, I think that we had better continue using > heap_getattr in comparetup_heap. > > regards, > > --- Atsushi Ogawa [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073