Home > mailing lists

Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Parallel CREATE INDEX for GIN indexes
Date	May 28, 2024 09:29:48
Msg-id	87jzjes2ia.fsf@163.com Whole thread Raw
In response to	Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Parallel CREATE INDEX for GIN indexes
List	pgsql-hackers

Tree view

Hi Tomas,

I have completed my first round of review, generally it looks good to
me, more testing need to be done in the next days. Here are some tiny
comments from my side, just FYI. 

1. Comments about GinBuildState.bs_leader looks not good for me. 
   
    /*
     * bs_leader is only present when a parallel index build is performed, and
     * only in the leader process. (Actually, only the leader process has a
     * GinBuildState.)
     */
    GinLeader  *bs_leader;

In the worker function _gin_parallel_build_main:
initGinState(&buildstate.ginstate, indexRel); is called, and the
following members in workers at least: buildstate.funcCtx,
buildstate.accum and so on.  So is the comment "only the leader process
has a GinBuildState" correct?

2. progress argument is not used?
_gin_parallel_scan_and_build(GinBuildState *state,
                             GinShared *ginshared, Sharedsort *sharedsort,
                             Relation heap, Relation index,
                             int sortmem, bool progress)


3. In function tuplesort_begin_index_gin, comments about nKeys takes me
some time to think about why 1 is correct(rather than
IndexRelationGetNumberOfKeyAttributes) and what does the "only the index
key" means. 

base->nKeys = 1;            /* Only the index key */

finally I think it is because gin index stores each attribute value into
an individual index entry for a multi-column index, so each index entry
has only 1 key.  So we can comment it as the following? 

"Gin Index stores the value of each attribute into different index entry
for mutli-column index, so each index entry has only 1 key all the
time." This probably makes it easier to understand.


4. GinBuffer: The comment "Similar purpose to BuildAccumulator, but much
simpler." makes me think why do we need a simpler but
similar structure, After some thoughts, they are similar at accumulating
TIDs only. GinBuffer is designed for "same key value" (hence
GinBufferCanAddKey). so IMO, the first comment is good enough and the 2 
comments introduce confuses for green hand and is potential to remove
it. 

/*
 * State used to combine accumulate TIDs from multiple GinTuples for the same
 * key value.
 *
 * XXX Similar purpose to BuildAccumulator, but much simpler.
 */
typedef struct GinBuffer


5. GinBuffer: ginMergeItemPointers always allocate new memory for the
new items and hence we have to pfree old memory each time. However it is
not necessary in some places, for example the new items can be appended
to Buffer->items (and this should be a common case). So could we
pre-allocate some spaces for items and reduce the number of pfree/palloc
and save some TID items copy in the desired case?

6. GinTuple.ItemPointerData first;        /* first TID in the array */

is ItemPointerData.ip_blkid good enough for its purpose?  If so, we can
save the memory for OffsetNumber for each GinTuple.

Item 5) and 6) needs some coding and testing. If it is OK to do, I'd
like to take it as an exercise in this area. (also including the item
1~4.)


-- 
Best Regards
Andy Fan

pgsql-hackers by date:

From: Nisha Moond
Date: 28 May 2024, 09:17:16
Subject: Re: Conflict Detection and Resolution

From: Andy Fan
Date: 28 May 2024, 09:47:17
Subject: Re: why memoize is not used for correlated subquery

Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

Previous

Next