Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Parallel CREATE INDEX for BRIN indexes |
Date | |
Msg-id | CAEze2WjugXkONAe9RZSgoVipHX5m6c0LnDZ=FVgWs25Fk-zX2A@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel CREATE INDEX for BRIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Parallel CREATE INDEX for BRIN indexes
|
List | pgsql-hackers |
On Wed, 8 Nov 2023 at 12:03, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > here's an updated patch, addressing the review comments, and reworking > how the work is divided between the workers & leader etc. > > 0001 is just v2, rebased to current master > > 0002 and 0003 address most of the issues, in particular it > > - removes the unnecessary spool > - fixes bs_reltuples type to double > - a couple comments are reworded to be clearer > - changes loop/condition in brinbuildCallbackParallel > - removes asserts added for debugging > - fixes cast in comparetup_index_brin > - 0003 then simplifies comparetup_index_brin > - I haven't inlined the tuplesort_puttuple_common parameter > (didn't seem worth it) OK, thanks > 0004 Reworks how the work is divided between workers and combined by the > leader. It undoes the tableam.c changes that attempted to divide the > relation into chunks matching the BRIN ranges, and instead merges the > results in the leader (using the BRIN "union" function). That's OK. > I haven't done any indentation fixes yet. > > I did fairly extensive testing, using pageinspect to compare indexes > built with/without parallelism. More testing is needed, but it seems to > work fine (with other opclasses and so on). After code-only review, here are some comments: > +++ b/src/backend/access/brin/brin.c > [...] > +/* Magic numbers for parallel state sharing */ > +#define PARALLEL_KEY_BRIN_SHARED UINT64CONST(0xA000000000000001) > +#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002) These shm keys use the same constants also in use in access/nbtree/nbtsort.c. While this shouldn't be an issue in normal operations, I'd prefer if we didn't actively introduce conflicting identifiers when we still have significant amounts of unused values remaining. > +#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000003) This is the fourth definition of a PARALLEL%_KEY_QUERY_TEXT, the others being in access/nbtree/nbtsort.c (value 0xA000000000000004, one more than brin's), backend/executor/execParallel.c (0xE000000000000008), and PARALLEL_VACUUM_KEY_QUERY_TEXT (0x3) (though I've not checked that their uses are exactly the same, I'd expect at least btree to match mostly, if not fully, 1:1). I think we could probably benefit from a less ad-hoc sharing of query texts. I don't think that needs to happen specifically in this patch, but I think it's something to keep in mind in future efforts. > +_brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state) > [...] > + BrinSpool *spool = state->bs_spool; > [...] > + if (!state) > + return; I think the assignment to spool should be moved to below this condition, as _brin_begin_parallel calls this with state=NULL when it can't launch parallel workers, which will cause issues here. > + state->bs_numtuples = brinshared->indtuples; My IDE complains about bs_numtuples being an integer. This is a pre-existing issue, but still valid: we can hit an overflow on tables with pages_per_range=1 and relsize >= 2^31 pages. Extremely unlikely, but problematic nonetheless. > + * FIXME This probably needs some memory management fixes - we're reading > + * tuples from the tuplesort, we're allocating an emty tuple, and so on. > + * Probably better to release this memory. This should probably be resolved. I also noticed that this is likely to execute `union_tuples` many times when pages_per_range is coprime with the parallel table scan's block stride (or when we for other reasons have many tuples returned for each range); and this `union_tuples` internally allocates and deletes its own memory context for its deserialization of the 'b' tuple. I think we should just pass a scratch context instead, so that we don't have the overhead of continously creating then deleting the same memory context. > +++ b/src/backend/catalog/index.c > [...] > - indexRelation->rd_rel->relam == BTREE_AM_OID) > + (indexRelation->rd_rel->relam == BTREE_AM_OID || > + indexRelation->rd_rel->relam == BRIN_AM_OID)) I think this needs some more effort. I imagine a new IndexAmRoutine->amcanbuildparallel is more appropriate than this hard-coded list - external indexes may want to utilize the parallel index creation planner facility, too. Some notes: As a project PostgreSQL seems to be trying to move away from hardcoding heap into everything in favour of the more AM-agnostic 'table'. I suggest replacing all mentions of "heap" in the arguments with "table", to reduce the work future maintainers need to do to fix this. Even when this AM is mostly targetted towards the heap AM, other AMs (such as those I've heard of that were developed internally at EDB) use the same block-addressing that heap does, and should thus be compatible with BRIN. Thus, "heap" is not a useful name here. There are 2 new mentions of "tuplestore" in the patch, while the structure used is tuplesort: one on form_and_spill_tuple, and one on brinbuildCallbackParallel. Please update those comments. That's it for code review. I'll do some performance comparisons and testing soon, too. Kind regards, Matthias van de Meent Neon (https://neon.tech)
pgsql-hackers by date: