Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel CREATE INDEX for GIN indexes
Date
Msg-id d759ee8f-1fc8-4ca2-b192-e28405e9a202@enterprisedb.com
Whole thread Raw
In response to Re: Parallel CREATE INDEX for GIN indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 5/2/24 20:22, Tomas Vondra wrote:
>> 
>>> For some of the opclasses it can regress (like the jsonb_path_ops). I
>>> don't think that's a major issue. Or more precisely, I'm not surprised
>>> by it. It'd be nice to be able to disable the parallel builds in these
>>> cases somehow, but I haven't thought about that.
>>
>> Do you know why it regresses?
>>
> 
> No, but one thing that stands out is that the index is much smaller than
> the other columns/opclasses, and the compression does not save much
> (only about 5% for both phases). So I assume it's the overhead of
> writing writing and reading a bunch of GB of data without really gaining
> much from doing that.
> 

I finally got to look into this regression, but I think I must have done
something wrong before because I can't reproduce it. This is the timings
I get now, if I rerun the benchmark:

     workers       trgm   tsvector     jsonb  jsonb (hash)
    -------------------------------------------------------
           0       1225        404       104            56
           1        772        180        57            60
           2        549        143        47            52
           3        426        127        43            50
           4        364        116        40            48
           5        323        111        38            46
           6        292        111        37            45

and the speedup, relative to serial build:


     workers       trgm   tsvector      jsonb  jsonb (hash)
    --------------------------------------------------------
           1        63%        45%        54%          108%
           2        45%        35%        45%           94%
           3        35%        31%        41%           89%
           4        30%        29%        38%           86%
           5        26%        28%        37%           83%
           6        24%        28%        35%           81%

So there's a small regression for the jsonb_path_ops opclass, but only
with one worker. After that, it gets a bit faster than serial build.
While not a great speedup, it's far better than the earlier results that
showed maybe 40% regression.

I don't know what I did wrong before - maybe I had a build with an extra
debug info or something like that? No idea why would that affect only
one of the opclasses. But this time I made doubly sure the results are
correct etc.

Anyway, I'm fairly happy with these results. I don't think it's
surprising there are cases where parallel build does not help much.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: First draft of PG 17 release notes
Next
From: Bruce Momjian
Date:
Subject: Re: First draft of PG 17 release notes