Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From: Andy Fan
Subject: Re: Parallel CREATE INDEX for GIN indexes
Msg-id: 87o777jrcc.fsf@163.com
In response to: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Parallel CREATE INDEX for GIN indexes
List: pgsql-hackers
Andy Fan <zhihuifan1213@163.com> writes:

I just realized that all my recent replies went to the sender only, probably because I upgraded my email client and the shortcut changed silently. Resending the latest one only...

>>> Suppose RBTree's output is:
>>>
>>> batch-1 at RBTree:
>>> 1  [tid1, tid8, tid100]
>>> 2  [tid1, tid9, tid800]
>>> ...
>>> 78 [tid23, tid99, tid800]
>>>
>>> batch-2 at RBTree:
>>> 1  [tid1001, tid1203, tid1991]
>>> ...
>>> 97 [tid1023, tid1099, tid1800]
>>>
>>> Since all the tuples in each batch (1, 2, ..., 78) are already sorted, we
>>> can just flush them into the tuplesort as a 'run' *without any sort*.
>>> However, this way it is possible to produce more 'runs' than your patch
>>> does.
>>>
>>
>> Oh! Now I think I understand what you were proposing - you're saying
>> that when dumping the RBTree to the tuplesort, we could tell the
>> tuplesort this batch of tuples is already sorted, and tuplesort might
>> skip some of the work when doing the sort later.
>>
>> I guess that's true, and maybe it'd be useful elsewhere, but I still think
>> this could be left as a future improvement. Allowing it seems far from
>> trivial, and it's not quite clear whether it'd be a win (it might interfere
>> with the existing sort code in unexpected ways).
>
> Yes, I agree that can be done later, and I now think Matthias's
> proposal is more promising.
>
>>> New way: the number of batches depends on the RBTree's batch size.
>>> Existing way: the number of batches depends on the size of work_mem in
>>> tuplesort. Usually the new way causes more runs, which is harmful for
>>> mergeruns, so I can't say whether it is an improvement, and I did not
>>> include it in my previous patch.
>>>
>>> However, case 1 sounds like a good candidate for this method.
>>>
>>> Tuples from state->bs_worker_state after the perform_sort and ctid
>>> merge:
>>>
>>> 1  [tid1, tid8, tid100, tid1001, tid1203, tid1991]
>>> 2  [tid1, tid9, tid800]
>>> 78 [tid23, tid99, tid800]
>>> 97 [tid1023, tid1099, tid1800]
>>>
>>> Then when we move tuples to bs_sort_state: a) we don't need to sort at
>>> all; b) we can merge all of them into a single run, which is good for
>>> mergeruns on the leader as well. That's what I did in the previous patch.
>>>
>>
>> I'm sorry, I don't understand what you're proposing. Could you maybe
>> elaborate in more detail?
>
> After we call "tuplesort_performsort(state->bs_worker_sort);" in
> _gin_process_worker_data, all the tuples in bs_worker_sort are already
> sorted, and in the same function _gin_process_worker_data we have this
> code:
>
>     while ((tup = tuplesort_getgintuple(worker_sort, &tuplen, true)) != NULL)
>     {
>         ....(1)
>
>         tuplesort_putgintuple(state->bs_sortstate, ntup, ntuplen);
>     }
>
> and later we call 'tuplesort_performsort(state->bs_sortstate);'. Even
> though there is some CTID merge activity in '....(1)', the tuples are
> still ordered, so the sorts (in both tuplesort_putgintuple and
> tuplesort_performsort) are unnecessary. What's more, each
> 'flush-memory-to-disk' in tuplesort creates a 'sorted run', and in this
> case we actually need only one run, since all the input tuples in the
> worker are sorted. The reduction of sorted runs in the worker helps the
> leader's final mergeruns. The 'sorted-run' benefit doesn't exist for
> case 1 (RBTree -> worker_state).
>
> If Matthias's proposal is adopted, my optimization will no longer be
> useful; Matthias's proposal looks like a more natural and efficient way.

--
Best Regards
Andy Fan
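The single-run argument above can be sketched with a toy model: if the worker's stream is already sorted by (key, first TID), then the CTID merge at '....(1)' only concatenates the TID lists of adjacent equal-key tuples, so the output stream stays ordered and could in principle be emitted as one sorted run. This is a minimal, hypothetical illustration using simplified types, not PostgreSQL's actual GinTuple or tuplesort API; it also assumes (as in the example data above) that equal-key TID lists from later batches do not overlap earlier ones.

```c
#include <assert.h>

/* Hypothetical simplified GIN tuple: a key plus a sorted TID list. */
typedef struct
{
    int key;
    int tids[8];
    int ntids;
} GinTup;

/*
 * Merge a stream already sorted by (key, first tid): adjacent tuples with
 * the same key get their TID lists concatenated.  Assuming equal-key lists
 * arrive in non-overlapping TID order (as in the batch-1/batch-2 example),
 * concatenation keeps each list sorted and the output stays ordered by
 * key, so no further sort is needed and the whole stream forms one run.
 */
static int
merge_sorted_stream(const GinTup *in, int nin, GinTup *out)
{
    int nout = 0;

    for (int i = 0; i < nin; i++)
    {
        if (nout > 0 && out[nout - 1].key == in[i].key)
        {
            /* same key as previous output tuple: append its TIDs */
            GinTup *prev = &out[nout - 1];

            for (int j = 0; j < in[i].ntids; j++)
                prev->tids[prev->ntids++] = in[i].tids[j];
        }
        else
            out[nout++] = in[i];
    }
    return nout;
}
```

Running this over the five tuples in the example (key 1 appearing once per batch) yields four output tuples, with key 1 carrying the six merged TIDs, and the output still ordered by key.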