On 5/2/24 20:22, Tomas Vondra wrote:
>>
>>> For some of the opclasses it can regress (like the jsonb_path_ops). I
>>> don't think that's a major issue. Or more precisely, I'm not surprised
>>> by it. It'd be nice to be able to disable the parallel builds in these
>>> cases somehow, but I haven't thought about that.
>>
>> Do you know why it regresses?
>>
>
> No, but one thing that stands out is that the index is much smaller than
> the other columns/opclasses, and the compression does not save much
> (only about 5% for both phases). So I assume it's the overhead of
> writing and reading a bunch of GB of data without really gaining
> much from doing that.
>

I finally got to look into this regression, but I think I must have done
something wrong before, because I can't reproduce it. These are the timings
I get now when I rerun the benchmark:

workers    trgm   tsvector   jsonb   jsonb (hash)
-------------------------------------------------
      0    1225        404     104             56
      1     772        180      57             60
      2     549        143      47             52
      3     426        127      43             50
      4     364        116      40             48
      5     323        111      38             46
      6     292        111      37             45

and the build time as a percentage of the serial build (lower is faster):

workers    trgm   tsvector   jsonb   jsonb (hash)
-------------------------------------------------
      1     63%        45%     54%           108%
      2     45%        35%     45%            94%
      3     35%        31%     41%            89%
      4     30%        29%     38%            86%
      5     26%        28%     37%            83%
      6     24%        28%     35%            81%

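For anyone who wants to redo the arithmetic, here is a minimal Python sketch
that derives the relative numbers from the timings table. The dictionary
layout and the function name are just my illustration, not part of the
benchmark scripts, and because the timings shown are already rounded, a few
recomputed cells can differ by a point or so from the percentages above.

```python
# Timings copied from the first table; index 0 is the serial build
# (0 workers), indexes 1..6 are builds with 1..6 workers.
timings = {
    "trgm":         [1225, 772, 549, 426, 364, 323, 292],
    "tsvector":     [404, 180, 143, 127, 116, 111, 111],
    "jsonb":        [104, 57, 47, 43, 40, 38, 37],
    "jsonb (hash)": [56, 60, 52, 50, 48, 46, 45],
}


def relative_to_serial(values):
    """Return each parallel timing as a whole percentage of the serial one."""
    serial = values[0]
    return [round(100 * v / serial) for v in values[1:]]


# Hypothetical helper structure: opclass name -> percentages for 1..6 workers.
relative = {name: relative_to_serial(v) for name, v in timings.items()}

print("workers  " + "  ".join(f"{name:>12}" for name in timings))
for workers in range(1, 7):
    cells = "  ".join(f"{relative[name][workers - 1]:>11d}%" for name in timings)
    print(f"{workers:>7d}  {cells}")
```

A value above 100 means the parallel build took longer than the serial one,
which in this data happens only for jsonb (hash) with a single worker.
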
So there's a small regression for the jsonb_path_ops opclass, but only
with one worker. After that, it gets a bit faster than the serial build.
While not a great speedup, it's far better than the earlier results, which
showed a regression of maybe 40%.

I don't know what I did wrong before - maybe I had a build with extra
debug info or something like that? No idea why that would affect only
one of the opclasses. But this time I made doubly sure the results are
correct.

Anyway, I'm fairly happy with these results. I don't think it's
surprising there are cases where parallel build does not help much.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company