Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel CREATE INDEX for GIN indexes
Date
Msg-id 03abcca0-47b2-4bc1-be05-6c1a3f1c5511@enterprisedb.com
In response to Re: Parallel CREATE INDEX for GIN indexes  (Andy Fan <zhihuifan1213@163.com>)
Responses Re: Parallel CREATE INDEX for GIN indexes
List pgsql-hackers
Hi Andy,

Thanks for the review. Here's an updated patch series, addressing most
of the points you've raised. I've kept the changes in "fixup" patches
for now; they should eventually be merged into 0001.

More detailed responses below.

On 5/28/24 11:29, Andy Fan wrote:
> 
> Hi Tomas,
> 
> I have completed my first round of review, generally it looks good to
> me, more testing need to be done in the next days. Here are some tiny
> comments from my side, just FYI. 
> 
> 1. The comment about GinBuildState.bs_leader doesn't look right to me.
>    
>     /*
>      * bs_leader is only present when a parallel index build is performed, and
>      * only in the leader process. (Actually, only the leader process has a
>      * GinBuildState.)
>      */
>     GinLeader  *bs_leader;
> 
> In the worker function _gin_parallel_build_main,
> initGinState(&buildstate.ginstate, indexRel); is called, and the workers
> set up at least the following members: buildstate.funcCtx,
> buildstate.accum and so on.  So is the comment "only the leader process
> has a GinBuildState" correct?
> 

Yeah, this is misleading. I don't remember exactly what my reasoning for
this wording was, so I've removed the comment.
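
For context, the workers really do build their own GinBuildState, much
like the serial build. Here's a simplified sketch of the per-worker
setup, modeled on the existing ginBuild() in gininsert.c (the actual
_gin_parallel_build_main in the patch also attaches to the DSM segment
and the shared tuplesort, which I'm leaving out here):

GinBuildState buildstate;

initGinState(&buildstate.ginstate, indexRel);   /* per-worker GinState */
buildstate.indtuples = 0;

/* per-worker memory contexts, same as in the serial build */
buildstate.tmpCtx = AllocSetContextCreate(CurrentMemoryContext,
                                          "Gin build temporary context",
                                          ALLOCSET_DEFAULT_SIZES);
buildstate.funcCtx = AllocSetContextCreate(CurrentMemoryContext,
                                           "Gin build temporary context for user-defined function",
                                           ALLOCSET_DEFAULT_SIZES);

/* per-worker BuildAccumulator, so workers accumulate TIDs locally too */
buildstate.accum.ginstate = &buildstate.ginstate;
ginInitBA(&buildstate.accum);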

> 2. progress argument is not used?
> _gin_parallel_scan_and_build(GinBuildState *state,
>                              GinShared *ginshared, Sharedsort *sharedsort,
>                              Relation heap, Relation index,
>                              int sortmem, bool progress)
> 

I've modified the code to use the progress flag, but now that I look at
it I'm a bit unsure I understand the purpose of this. I've modeled this
after what the btree does, and I see that there are two places calling
_bt_parallel_scan_and_sort:

1) _bt_leader_participate_as_worker: progress=true

2) _bt_parallel_build_main: progress=false

Isn't that a bit weird? AFAIU the progress will be updated only by the
leader, but will that progress be correct? And doesn't that mean that if
the leader does not participate as a worker, the progress won't be
updated at all?

FWIW, the parallel BRIN code has the same issue - it does not use the
progress flag in _brin_parallel_scan_and_build.
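
For reference, this is roughly how the flag ends up being consumed,
modeled on _bt_parallel_scan_and_sort in nbtsort.c. This is only a
sketch - the callback name and the ParallelTableScanFromGinShared()
accessor are made-up placeholders, not necessarily what the patch uses:

static double
_gin_parallel_scan_sketch(GinBuildState *state, GinShared *ginshared,
                          Relation heap, Relation index, bool progress)
{
    IndexInfo  *indexInfo = BuildIndexInfo(index);
    TableScanDesc scan;
    double      reltuples;

    scan = table_beginscan_parallel(heap,
                                    ParallelTableScanFromGinShared(ginshared));

    /*
     * Only the caller passing progress=true drives the block-scan progress
     * counters; with progress=false the scan runs the same way but reports
     * nothing.
     */
    reltuples = table_index_build_scan(heap, index, indexInfo,
                                       true,        /* allow_sync */
                                       progress,    /* report progress? */
                                       ginBuildCallbackParallel,
                                       (void *) state, scan);

    return reltuples;
}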

> 
> 3. In function tuplesort_begin_index_gin, the comment about nKeys took me
> some time: why is 1 correct (rather than
> IndexRelationGetNumberOfKeyAttributes), and what does "only the index
> key" mean?
>
> base->nKeys = 1;            /* Only the index key */
>
> Eventually I concluded it is because a GIN index stores each attribute
> value as an individual index entry for a multi-column index, so each
> index entry has only 1 key.  So could we comment it as the following?
>
> "A GIN index stores the value of each attribute as a separate index entry
> for a multi-column index, so each index entry always has only 1 key."
> This probably makes it easier to understand.
> 

OK, I see what you mean. The other tuplesort_begin_ functions nearby
have similar comments, but you're right that GIN is a bit special in that
it "splits" multi-column indexes into individual index entries. I've
added a comment that (hopefully) clarifies this.
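
Roughly along these lines (a sketch of possible wording, not the exact
comment I added):

/*
 * GIN creates a separate index entry for each attribute of a multi-column
 * index, so a GinTuple always carries exactly one key -- hence nKeys is 1
 * rather than IndexRelationGetNumberOfKeyAttributes(indexRel).
 */
base->nKeys = 1;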

> 
> 4. GinBuffer: The comment "Similar purpose to BuildAccumulator, but much
> simpler." made me wonder why we need a simpler but similar structure.
> After some thought, I see they are similar only in accumulating TIDs;
> GinBuffer is designed for a single "same key value" (hence
> GinBufferCanAddKey). So IMO the first comment is good enough, and the
> second one may confuse newcomers, so it could potentially be removed.
> 
> /*
>  * State used to combine accumulate TIDs from multiple GinTuples for the same
>  * key value.
>  *
>  * XXX Similar purpose to BuildAccumulator, but much simpler.
>  */
> typedef struct GinBuffer
> 

I've updated the comment to explain the differences a bit more clearly.
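
The gist is something like this (illustrative sketch only - the field
list below is made up for the example, it's not the actual struct
definition from the patch):

/*
 * GinBuffer accumulates TIDs from multiple GinTuples that all have the
 * same key value, before the merged TID list is written into the index.
 *
 * It is similar to BuildAccumulator only in that both accumulate TIDs, but
 * it's much simpler: BuildAccumulator collects TIDs for many different keys
 * at once, while GinBuffer only ever holds one key and relies on the input
 * being sorted by key (see GinBufferCanAddKey).
 */
typedef struct GinBuffer
{
    OffsetNumber    attnum;     /* attribute the key belongs to */
    Datum           key;        /* current key value */
    ItemPointerData *items;     /* TIDs accumulated for this key */
    int             nitems;     /* number of TIDs in items */
} GinBuffer;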

> 
> 5. GinBuffer: ginMergeItemPointers always allocates new memory for the
> new items, and hence we have to pfree the old memory each time. However,
> that is not necessary in some cases, for example when the new items can
> simply be appended to Buffer->items (and this should be a common case).
> So could we pre-allocate some space for items, to reduce the number of
> pfree/palloc calls and save some copying of TID items in the common case?
> 

Perhaps, but that seems rather independent of this patch.

Also, I'm not sure how much this optimization would matter in practice.
The merge should happen fairly rarely, only when we decide to store the
TIDs into the index. And then it's also subject to the caching built into
the memory contexts, which limits the malloc costs. We'll still pay for
the memcpy, of course.

Anyway, it's an optimization that would affect existing callers of
ginMergeItemPointers. I don't plan to tweak this in this patch.
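
Just to make the idea concrete, the append-instead-of-merge path could
look roughly like this (a sketch of the suggestion only, not something
the patch does; items/nitems/maxitems are assumed fields, and items is
assumed to be palloc'd up front):

if (buffer->nitems == 0 ||
    ItemPointerCompare(&buffer->items[buffer->nitems - 1], &newitems[0]) < 0)
{
    /* grow the pre-allocated array in power-of-2 steps if needed */
    if (buffer->nitems + nnew > buffer->maxitems)
    {
        buffer->maxitems = pg_nextpower2_32(buffer->nitems + nnew);
        buffer->items = repalloc(buffer->items,
                                 buffer->maxitems * sizeof(ItemPointerData));
    }

    /* all new TIDs sort after the buffered ones, so just append them */
    memcpy(&buffer->items[buffer->nitems], newitems,
           nnew * sizeof(ItemPointerData));
    buffer->nitems += nnew;
}
else
{
    /* overlapping ranges: fall back to the generic merge, which allocates */
    int         nmerged;
    ItemPointer merged = ginMergeItemPointers(buffer->items, buffer->nitems,
                                              newitems, nnew, &nmerged);

    pfree(buffer->items);
    buffer->items = merged;
    buffer->nitems = nmerged;
}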


> 6. GinTuple.ItemPointerData first;        /* first TID in the array */
> 
> would ItemPointerData.ip_blkid be good enough for this purpose?  If so,
> we could save the memory for the OffsetNumber in each GinTuple.
>
> Items 5) and 6) need some coding and testing. If that is OK, I'd like
> to take them as an exercise in this area (along with items 1~4).
> 

It might save 2 bytes in the struct, but that's negligible compared to
the overall memory usage (we only keep one GinTuple, but many TIDs and
so on), and we allocate the space in a power-of-2 pattern anyway (which
means the 2 bytes won't matter).

Moreover, using just the block number would make it harder to compare
the TIDs (now we can just call ItemPointerCompare).
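
To spell that out (illustration only, the comparator name is made up):

/*
 * With a full ItemPointerData stored in GinTuple.first, ordering two
 * GinTuples with the same key is a single call.  If only ip_blkid were
 * kept, TIDs on the same block would compare as equal and we'd need
 * extra tie-breaking logic.
 */
static int
compare_first_tid(const GinTuple *a, const GinTuple *b)
{
    return ItemPointerCompare((ItemPointer) &a->first,
                              (ItemPointer) &b->first);
}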


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
