Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Parallel CREATE INDEX for GIN indexes
Date
Msg-id 87ikvkgdcb.fsf@163.com
Whole thread Raw
In response to Re: Parallel CREATE INDEX for GIN indexes  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Tomas Vondra <tomas@vondra.me> writes:

Hi Tomas,

> Yeah. I think we have agreement on 0001-0007.

Yes, the design of 0001-0007 looks good to me and because of the
existing compexitity, I want to foucs on this part for now. I am doing
code review from yesterday, and now my work is done.  Just some small
questions: 

1. In GinBufferStoreTuple,

    /*
     * Check if the last TID in the current list is frozen. This is the case
     * when merging non-overlapping lists, e.g. in each parallel worker.
     */
    if ((buffer->nitems > 0) &&
        (ItemPointerCompare(&buffer->items[buffer->nitems - 1], &tup->first) == 0))
        buffer->nfrozen = buffer->nitems;

should we do (ItemPointerCompare(&buffer->items[buffer->nitems - 1],
&tup->first) "<=" 0), rather than "=="? 

2. Given the "non-overlap" case should be the major case
GinBufferStoreTuple , does it deserve a fastpath for it before calling
ginMergeItemPointers since ginMergeItemPointers have a unconditionally
memory allocation directly, and later we pfree it?

new = ginMergeItemPointers(&buffer->items[buffer->nfrozen], /* first unfronzen */
                   (buffer->nitems - buffer->nfrozen),    /* num of unfrozen */
               items, tup->nitems, &nnew);


3. The following comment in index_build is out-of-date now :)

    /*
     * Determine worker process details for parallel CREATE INDEX.  Currently,
     * only btree has support for parallel builds.
     *

4. Comments - Buffer is not empty and it's storing "a different key"
looks wrong to me. the key may be same and we just need to flush them
because of memory usage. There is the same issue in both
_gin_process_worker_data and _gin_parallel_merge.  

        if (GinBufferShouldTrim(buffer, tup))
        {
            Assert(buffer->nfrozen > 0);

            state->buildStats.nTrims++;

            /*
             * Buffer is not empty and it's storing a different key - flush
             * the data into the insert, and start a new entry for current
             * GinTuple.
             */
            AssertCheckItemPointers(buffer, true);

I also run valgrind testing with some testcase, no memory issue is
found. 

> I'm a bit torn about 0008, I have not expected changing tuplesort like
> this when I started working 
> on the patch, but I can't deny it's a massive speedup for some cases
> (where the patch doesn't help otherwise). But then in other cases it
> doesn't help at all, and 0010 helps.

Yes, I'd like to see these improvements both 0008 and 0010 as a
dedicated improvement. 

-- 
Best Regards
Andy Fan




pgsql-hackers by date:

Previous
From: Tender Wang
Date:
Subject: Re: Eager aggregation, take 3
Next
From: shveta malik
Date:
Subject: Re: Conflict Detection and Resolution