Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Parallel CREATE INDEX for GIN indexes |
Date | |
Msg-id | 87jzjes2ia.fsf@163.com Whole thread Raw |
In response to | Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Parallel CREATE INDEX for GIN indexes
|
List | pgsql-hackers |
Hi Tomas, I have completed my first round of review, generally it looks good to me, more testing need to be done in the next days. Here are some tiny comments from my side, just FYI. 1. Comments about GinBuildState.bs_leader looks not good for me. /* * bs_leader is only present when a parallel index build is performed, and * only in the leader process. (Actually, only the leader process has a * GinBuildState.) */ GinLeader *bs_leader; In the worker function _gin_parallel_build_main: initGinState(&buildstate.ginstate, indexRel); is called, and the following members in workers at least: buildstate.funcCtx, buildstate.accum and so on. So is the comment "only the leader process has a GinBuildState" correct? 2. progress argument is not used? _gin_parallel_scan_and_build(GinBuildState *state, GinShared *ginshared, Sharedsort *sharedsort, Relation heap, Relation index, int sortmem, bool progress) 3. In function tuplesort_begin_index_gin, comments about nKeys takes me some time to think about why 1 is correct(rather than IndexRelationGetNumberOfKeyAttributes) and what does the "only the index key" means. base->nKeys = 1; /* Only the index key */ finally I think it is because gin index stores each attribute value into an individual index entry for a multi-column index, so each index entry has only 1 key. So we can comment it as the following? "Gin Index stores the value of each attribute into different index entry for mutli-column index, so each index entry has only 1 key all the time." This probably makes it easier to understand. 4. GinBuffer: The comment "Similar purpose to BuildAccumulator, but much simpler." makes me think why do we need a simpler but similar structure, After some thoughts, they are similar at accumulating TIDs only. GinBuffer is designed for "same key value" (hence GinBufferCanAddKey). so IMO, the first comment is good enough and the 2 comments introduce confuses for green hand and is potential to remove it. /* * State used to combine accumulate TIDs from multiple GinTuples for the same * key value. * * XXX Similar purpose to BuildAccumulator, but much simpler. */ typedef struct GinBuffer 5. GinBuffer: ginMergeItemPointers always allocate new memory for the new items and hence we have to pfree old memory each time. However it is not necessary in some places, for example the new items can be appended to Buffer->items (and this should be a common case). So could we pre-allocate some spaces for items and reduce the number of pfree/palloc and save some TID items copy in the desired case? 6. GinTuple.ItemPointerData first; /* first TID in the array */ is ItemPointerData.ip_blkid good enough for its purpose? If so, we can save the memory for OffsetNumber for each GinTuple. Item 5) and 6) needs some coding and testing. If it is OK to do, I'd like to take it as an exercise in this area. (also including the item 1~4.) -- Best Regards Andy Fan
pgsql-hackers by date: