Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Parallel CREATE INDEX for BRIN indexes
Date
Msg-id CAEze2WhCSKQDhP3nyARWH5Zf+B-5rtDteA7d4QSDa-BuSSkSPA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel CREATE INDEX for BRIN indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Parallel CREATE INDEX for BRIN indexes
List pgsql-hackers
On Thu, 23 Nov 2023 at 14:35, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 11/23/23 13:33, Matthias van de Meent wrote:
>> The union operator may leak (lots of) memory, so I think it makes
>> sense to keep a context around that can be reset after we've extracted
>> the merge result.
>>
>
> But does the current code actually achieve that? It does create a "brin
> union" context, but then it only does this:
>
>     /* Use our own memory context to avoid retail pfree */
>     cxt = AllocSetContextCreate(CurrentMemoryContext,
>                                 "brin union",
>                                 ALLOCSET_DEFAULT_SIZES);
>     oldcxt = MemoryContextSwitchTo(cxt);
>     db = brin_deform_tuple(bdesc, b, NULL);
>     MemoryContextSwitchTo(oldcxt);
>
> Surely that does not limit the amount of memory used by the actual union
> functions in any way?

Oh, yes, of course. For some reason I thought that covered the calls
to the union operator function too, but it indeed only covers
deserialization. I do think it is still worthwhile to not do the
create/delete cycle, but won't hold the patch back for that.

>>> However, I don't think the number of union_tuples calls is likely to be
>>> very high, especially for large tables. Because we split the table into
>>> 2048 chunks, and then cap the chunk size by 8192. For large tables
>>> (where this matters) we're likely close to 8192.
>>
>> I agree that the merging part of the index creation is the last part,
>> and usually has no high impact on the total performance of the reindex
>> operation, but in memory-constrained environments releasing and then
>> requesting the same chunk of memory over and over again just isn't
>> great.
>
> OK, I'll take a look at the scratch context you suggested.
>
> My point however was we won't actually do that very often, because on
> large tables the BRIN ranges are likely smaller than the parallel scan
> chunk size, so few overlaps. OTOH if the table is small, or if the BRIN
> ranges are large, there'll be few of them.

That's true, so maybe I'm concerned about something that amounts to
only marginal gains.

I noticed that the v4 patch doesn't yet update the documentation in
indexam.sgml with am->amcanbuildparallel.
Once that is included and reviewed I think this will be ready, unless
you want to address any of my comments upthread (that I marked with
'not in this patch') in this patch.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: "Tristan Partin"
Date:
Subject: Re: SSL tests fail on OpenSSL v3.2.0
Next
From: Tom Lane
Date:
Subject: Re: SSL tests fail on OpenSSL v3.2.0