Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Msg-id: b1fb43f8-e9a0-9e34-fbce-590463520506@enterprisedb.com
In response to: Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses: Re: Parallel CREATE INDEX for BRIN indexes
List: pgsql-hackers
On 11/29/23 15:42, Matthias van de Meent wrote:
> On Tue, 28 Nov 2023 at 18:59, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 11/28/23 16:39, Matthias van de Meent wrote:
>>> On Thu, 23 Nov 2023 at 14:35, Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> On 11/23/23 13:33, Matthias van de Meent wrote:
>>>>> The union operator may leak (lots of) memory, so I think it makes
>>>>> sense to keep a context around that can be reset after we've
>>>>> extracted the merge result.
>>>>>
>>>>
>>>> But does the current code actually achieve that? It does create a
>>>> "brin union" context, but then it only does this:
>>>>
>>>>     /* Use our own memory context to avoid retail pfree */
>>>>     cxt = AllocSetContextCreate(CurrentMemoryContext,
>>>>                                 "brin union",
>>>>                                 ALLOCSET_DEFAULT_SIZES);
>>>>     oldcxt = MemoryContextSwitchTo(cxt);
>>>>     db = brin_deform_tuple(bdesc, b, NULL);
>>>>     MemoryContextSwitchTo(oldcxt);
>>>>
>>>> Surely that does not limit the amount of memory used by the actual
>>>> union functions in any way?
>>>
>>> Oh, yes, of course. For some reason I thought that covered the calls
>>> to the union operator function too, but it indeed only covers
>>> deserialization. I do think it is still worthwhile to not do the
>>> create/delete cycle, but I won't hold the patch back for that.
>>>
>>
>> I think the union_tuples() changes are better left for a separate
>> patch.
>>
>>>>>> However, I don't think the number of union_tuples() calls is
>>>>>> likely to be very high, especially for large tables. Because we
>>>>>> split the table into 2048 chunks, and then cap the chunk size at
>>>>>> 8192. For large tables (where this matters) we're likely close
>>>>>> to 8192.
>>>>>
>>>>> I agree that the merging part of the index creation is the last
>>>>> part, and usually has no high impact on the total performance of
>>>>> the reindex operation, but in memory-constrained environments
>>>>> releasing and then requesting the same chunk of memory over and
>>>>> over again just isn't great.
>>>>
>>>> OK, I'll take a look at the scratch context you suggested.
>>>>
>>>> My point however was that we won't actually do that very often,
>>>> because on large tables the BRIN ranges are likely smaller than the
>>>> parallel scan chunk size, so few overlaps. OTOH if the table is
>>>> small, or if the BRIN ranges are large, there'll be few of them.
>>>
>>> That's true, so maybe I'm concerned about something that amounts to
>>> only marginal gains.
>>>
>>
>> However, after thinking about this a bit more, I think we actually do
>> need to do something about the memory management when merging tuples.
>> AFAIK the general assumption was that union_tuples() only runs for a
>> single range, and then the whole context gets freed.
>
> Correct, but it also is (or should be) assumed that union_tuples()
> will be called several times in the same context to fix repeated
> concurrent updates. Presumably that only happens rarely, but it's
> something that should be kept in mind regardless.
>

In theory, yes. But union_tuples() is used only in summarize_range(),
and that only processes a single page range.

>> But the way the merging was implemented, it all runs in a single
>> context. And while a single union_tuples() call may not need a lot
>> of memory, in total it may be annoying. I just added a palloc(1MB)
>> into union_tuples() and ended up with ~160MB allocated in the
>> PortalContext on just a 2GB table. In practice the memory will grow
>> more slowly, but it's still not great :-/
>>
>> The attached 0003 patch adds a memory context that's reset after
>> producing a merged BRIN tuple for each page range.
>
> Looks good.
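To make the pattern explicit, 0003 does roughly the following. This is
only a sketch with made-up names (merge_page_ranges_sketch, nranges),
not the actual patch code:

    #include "postgres.h"
    #include "utils/memutils.h"

    static void
    merge_page_ranges_sketch(int nranges)
    {
        MemoryContext rangeCxt,
                      oldCxt;

        /* scratch context for all per-range allocations */
        rangeCxt = AllocSetContextCreate(CurrentMemoryContext,
                                         "brin merge scratch",
                                         ALLOCSET_DEFAULT_SIZES);
        oldCxt = MemoryContextSwitchTo(rangeCxt);

        for (int i = 0; i < nranges; i++)
        {
            /*
             * brin_deform_tuple() and the opclass union support
             * procedure allocate in rangeCxt here. Once the merged
             * tuple for this page range has been written out, all of
             * it is discarded in one go, so memory use stays bounded
             * by a single range.
             */
            MemoryContextReset(rangeCxt);
        }

        MemoryContextSwitchTo(oldCxt);
        MemoryContextDelete(rangeCxt);
    }

That way the create/delete cycle happens only once, and each range pays
just for a cheap reset.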
> This also made me think a bit more about how we're working with the
> tuples. With your latest patch, we always deserialize and re-serialize
> the sorted BRIN tuples, just in case the next tuple will also be a
> BRIN tuple of the same page range. Could we save some of that
> deserialization time by optimistically expecting that we're not going
> to need to merge the tuple, and only keeping a local copy of it?
> See attached 0002; this saves some cycles in common cases.
>

Good idea!
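To sketch the shape of that optimization (with made-up helpers
next_brin_tuple() and emit_index_tuple(), not the actual 0002 code):
the previous tuple stays in serialized form, and we only deform when
the next tuple from the sort covers the same page range.

    #include "postgres.h"
    #include "access/brin_tuple.h"

    /* made-up stand-ins for the sort input and the index output */
    extern BrinTuple *next_brin_tuple(void *sortstate, Size *len);
    extern void emit_index_tuple(BrinTuple *tup, Size len);

    static void
    merge_sorted_tuples_sketch(BrinDesc *bdesc, void *sortstate)
    {
        BrinTuple  *prev = NULL;
        Size        prevsz = 0;
        BrinTuple  *cur;
        Size        cursz;

        while ((cur = next_brin_tuple(sortstate, &cursz)) != NULL)
        {
            if (prev != NULL && prev->bt_blkno == cur->bt_blkno)
            {
                /* same page range: only now pay for deserialization */
                BrinMemTuple *a = brin_deform_tuple(bdesc, prev, NULL);
                BrinMemTuple *b = brin_deform_tuple(bdesc, cur, NULL);

                /* ... union b into a, re-serialize result into prev ... */
            }
            else
            {
                /* common case: pass the raw tuple through untouched */
                if (prev != NULL)
                    emit_index_tuple(prev, prevsz);
                prev = cur;
                prevsz = cursz;
            }
        }

        if (prev != NULL)
            emit_index_tuple(prev, prevsz);
    }

In the common case (no overlapping ranges) a tuple is never deformed at
all.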
> The v20231128 version of the patchset (as squashed, attached v5-0001)
> looks good to me.
>

Cool. I'll put this through a bit more stress testing, and then I'll
get it pushed.

Thanks for the reviews!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company