Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Parallel CREATE INDEX for GIN indexes
Date
Msg-id CAEze2WiGDhZD7DH=7k3jkYef=_-m-vL4NCeW06qpT0RWVsK4wA@mail.gmail.com
In response to Re: Parallel CREATE INDEX for GIN indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Sun, 7 Jul 2024, 18:11 Tomas Vondra, <tomas.vondra@enterprisedb.com> wrote:
>
> On 7/3/24 20:36, Matthias van de Meent wrote:
>> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>> So I've switched this to use the regular data-type comparisons, with
>>> SortSupport etc. There's a bit more cleanup remaining and testing
>>> needed, but I'm not aware of any bugs.
>>
>> A review of patch 0001:
>>
>> ---
>>
>>> src/backend/access/gin/gininsert.c         | 1449 +++++++++++++++++++-
>>
>> The nbtree code has `nbtsort.c` for its sort- and (parallel) build
>> state handling, which is exclusively used during index creation. As
>> the changes here seem to be largely related to bulk insertion, how
>> much effort would it be to split the bulk insertion code path into a
>> separate file?
>>
>
> Hmmm. I haven't tried doing that, but I guess it's doable. I assume we'd
> want to do the move first, because it involves pre-existing code, and
> then do the patch on top of that.
>
> But what would be the benefit of doing that? I'm not sure doing it just
> to make it look more like btree code is really worth it. Do you expect
> the result to be clearer?

It was mostly a consideration of file size and separation of concerns.
The sorted build path is quite different from the unsorted build,
after all.

>> I noticed that the new fields in GinBuildState get a bs_* prefix, but
>> none of the other added or pre-existing fields of the modified structs
>> in gininsert.c have such a prefix. Could this be unified?
>>
>
> Yeah, these are inconsistencies from copying the infrastructure code to
> make the parallel builds work, etc.
>
>>> +/* Magic numbers for parallel state sharing */
>>> +#define PARALLEL_KEY_GIN_SHARED            UINT64CONST(0xB000000000000001)
>>> ...
>>
>> These overlap with BRIN's keys; can we make them unique while we're at it?
>>
>
> We could, and I recall we had a similar discussion in the parallel BRIN
> thread, right? But I'm somewhat unsure why we would actually want/need
> these keys to be unique. Surely we don't need to mix those keys in a
> single shm segment, right? So it seems more like an aesthetic thing. Or
> is there some policy to have unique values for these keys?

Uniqueness would be mostly useful for debugging shared memory issues,
but indeed, in a correctly working system we wouldn't have to worry
about parallel state key type confusion.
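
For example (just a sketch, not a value from the patch: assuming nbtree
keeps its 0xA... prefix and BRIN its 0xB..., GIN could claim a fresh,
unused prefix):

    /* hypothetical GIN-specific prefix, distinct from nbtree's
     * 0xA... and BRIN's 0xB... parallel keys */
    #define PARALLEL_KEY_GIN_SHARED    UINT64CONST(0xC000000000000001)
    ...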

>> ---
>>
>>> +++ b/src/include/access/gin_tuple.h
>>> + typedef struct GinTuple
>>
>> I think this needs some more care: currently, each GinTuple is at
>> least 36 bytes in size on 64-bit systems. By using int instead of Size
>> (no normal indexable tuple can be larger than MaxAllocSize), and
>> packing the fields better we can shave off 10 bytes; or 12 bytes if
>> GinTuple.keylen is further adjusted to (u)int16: a key needs to fit on
>> a page, so we can probably safely assume that the key size fits in
>> (u)int16.
>
> Yeah, I guess using int64 is a bit excessive - you're right about that.
> I'm not sure this is necessarily about "indexable tuples" (GinTuple is
> not indexed, it's more an intermediate representation).

Yes, but even if the GinTuple itself isn't stored on disk in the
index, a GinTuple's key *is* part of the primary GIN btree's keys
somewhere down the line, and thus must fit on a page somewhere. That's
what I was referring to.
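
To illustrate the kind of packing I had in mind (a sketch only: the
field list is my reading of the discussion rather than the patch
verbatim, and alignment may shift things a bit):

    typedef struct GinTuple
    {
        int          tuplen;    /* length of the whole tuple */
        OffsetNumber attrnum;   /* attnum of the index key */
        uint16       keylen;    /* bytes in data[] used by the key */
        int16        typlen;    /* typlen of the key's type */
        bool         typbyval;  /* typbyval of the key's type */
        signed char  category;  /* normal key, or a NULL category */
        int          nitems;    /* number of TIDs following the key */
        char         data[FLEXIBLE_ARRAY_MEMBER];
    } GinTuple;

With int lengths and a uint16 keylen, the fixed part packs well below
the current 36 bytes, somewhere in the 16-24 byte range depending on
trailing padding and any fields I've missed.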

> But if we can make it smaller, that probably can't hurt.
>
> I don't have a great intuition on how beneficial this might be. For
> cases with many TIDs per index key, it probably won't matter much. But
> if there's many keys (so that GinTuples store only very few TIDs), it
> might make a difference.

This will indeed matter most when small TID lists are common. I
suspect it's not uncommon to find ourselves in such a situation when
users have a low maintenance_work_mem (and thus don't have much space
to buffer and combine index tuples before they're flushed), or when
the generated index keys can't be stored in the available memory (as
in my example: I didn't tune m_w_m at all, and the table I tested had
~15GB of data).
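
To put rough numbers on it (back-of-the-envelope, assuming the current
36-byte fixed part, a 24-byte packed one, and a 6-byte ItemPointerData):
a GinTuple carrying a single TID would shrink from ~42 to ~30 bytes,
almost 30% less data to buffer and sort, while with hundreds of TIDs
per tuple the saving disappears in the noise.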

> I'll try to measure the impact on the same "realistic" cases I used for
> the earlier steps.

That would be greatly appreciated, thanks!


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


