Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Parallel CREATE INDEX for GIN indexes |
Date | |
Msg-id | CAEze2WjB1vpxtvKuWVEThSaB-v4+8H0EXsOB=yLAv8pLcrQuKw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Parallel CREATE INDEX for GIN indexes
Re: Parallel CREATE INDEX for GIN indexes |
List | pgsql-hackers |
On Mon, 24 Jun 2024 at 02:58, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Here's a bit more cleaned up version, clarifying a lot of comments, > removing a bunch of obsolete comments, or comments speculating about > possible solutions, that sort of thing. I've also removed couple more > XXX comments, etc. > > The main change however is that the sorting no longer relies on memcmp() > to compare the values. I did that because it was enough for the initial > WIP patches, and it worked till now - but the comments explained this > may not be a good idea if the data type allows the same value to have > multiple binary representations, or something like that. > > I don't have a practical example to show an issue, but I guess if using > memcmp() was safe we'd be doing it in a bunch of places already, and > AFAIK we're not. And even if it happened to be OK, this is a probably > not the place where to start doing it. I think one such example would be the values '5.00'::jsonb and '5'::jsonb when indexed using GIN's jsonb_ops, though I'm not sure if they're treated as having the same value inside the opclass' ordering. > So I've switched this to use the regular data-type comparisons, with > SortSupport etc. There's a bit more cleanup remaining and testing > needed, but I'm not aware of any bugs. A review of patch 0001: --- > src/backend/access/gin/gininsert.c | 1449 +++++++++++++++++++- The nbtree code has `nbtsort.c` for its sort- and (parallel) build state handling, which is exclusively used during index creation. As the changes here seem to be largely related to bulk insertion, how much effort would it be to split the bulk insertion code path into a separate file? I noticed that new fields in GinBuildState do get to have a bs_*-prefix, but none of the other added or previous fields of the modified structs in gininsert.c have such prefixes. Could this be unified? > +/* Magic numbers for parallel state sharing */ > +#define PARALLEL_KEY_GIN_SHARED UINT64CONST(0xB000000000000001) > ... These overlap with BRIN's keys; can we make them unique while we're at it? > + * mutex protects all fields before heapdesc. I can't find the field that this `heapdesc` might refer to. > +_gin_begin_parallel(GinBuildState *buildstate, Relation heap, Relation index, > ... > + if (!isconcurrent) > + snapshot = SnapshotAny; > + else > + snapshot = RegisterSnapshot(GetTransactionSnapshot()); grumble: I know this is required from the index with the current APIs, but I'm kind of annoyed that each index AM has to construct the table scan and snapshot in their own code. I mean, this shouldn't be meaningfully different across AMs, so every AM implementing this same code makes me feel like we've got the wrong abstraction. I'm not asking you to change this, but it's one more case where I'm annoyed by the state of the system, but not quite enough yet to change it. --- > +++ b/src/backend/utils/sort/tuplesortvariants.c I was thinking some more about merging tuples inside the tuplesort. I realized that this could be implemented by allowing buffering of tuple writes in writetup. This would require adding a flush operation at the end of mergeonerun to store the final unflushed tuple on the tape, but that shouldn't be too expensive. This buffering, when implemented through e.g. a GinBuffer in TuplesortPublic->arg, could allow us to merge the TID lists of same-valued GIN tuples while they're getting stored and re-sorted, thus reducing the temporary space usage of the tuplesort by some amount with limited overhead for other non-deduplicating tuplesorts. I've not yet spent the time to get this to work though, but I'm fairly sure it'd use less temporary space than the current approach with the 2 tuplesorts, and could have lower overall CPU overhead as well because the number of sortable items gets reduced much earlier in the process. --- > +++ b/src/include/access/gin_tuple.h > + typedef struct GinTuple I think this needs some more care: currently, each GinTuple is at least 36 bytes in size on 64-bit systems. By using int instead of Size (no normal indexable tuple can be larger than MaxAllocSize), and packing the fields better we can shave off 10 bytes; or 12 bytes if GinTuple.keylen is further adjusted to (u)int16: a key needs to fit on a page, so we can probably safely assume that the key size fits in (u)int16. Kind regards, Matthias van de Meent Neon (https://neon.tech)
pgsql-hackers by date: