Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Parallel CREATE INDEX for GIN indexes
Date
Msg-id CAEze2WiGDhZD7DH=7k3jkYef=_-m-vL4NCeW06qpT0RWVsK4wA@mail.gmail.com
In response to Re: Parallel CREATE INDEX for GIN indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Sun, 7 Jul 2024, 18:11 Tomas Vondra, <tomas.vondra@enterprisedb.com> wrote:
>
> On 7/3/24 20:36, Matthias van de Meent wrote:
>> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>> So I've switched this to use the regular data-type comparisons, with
>>> SortSupport etc. There's a bit more cleanup remaining and testing
>>> needed, but I'm not aware of any bugs.
>>
>> A review of patch 0001:
>>
>> ---
>>
>>> src/backend/access/gin/gininsert.c         | 1449 +++++++++++++++++++-
>>
>> The nbtree code has `nbtsort.c` for its sort- and (parallel) build
>> state handling, which is exclusively used during index creation. As
>> the changes here seem to be largely related to bulk insertion, how
>> much effort would it be to split the bulk insertion code path into a
>> separate file?
>>
>
> Hmmm. I haven't tried doing that, but I guess it's doable. I assume we'd
> want to do the move first, because it involves pre-existing code, and
> then do the patch on top of that.
>
> But what would be the benefit of doing that? I'm not sure doing it just
> to make it look more like btree code is really worth it. Do you expect
> the result to be clearer?

It was mostly a consideration of file size and separation of concerns.
The sorted build path is quite different from the unsorted build,
after all.

>> I noticed that the new fields in GinBuildState get a bs_* prefix, but
>> none of the other added or pre-existing fields of the modified structs
>> in gininsert.c have such a prefix. Could this be unified?
>>
>
> Yeah, these are inconsistencies from copying the infrastructure code to
> make the parallel builds work, etc.
>
>>> +/* Magic numbers for parallel state sharing */
>>> +#define PARALLEL_KEY_GIN_SHARED            UINT64CONST(0xB000000000000001)
>>> ...
>>
>> These overlap with BRIN's keys; can we make them unique while we're at it?
>>
>
> We could, and I recall we had a similar discussion in the parallel BRIN
> thread, right? But I'm somewhat unsure why we would actually want/need
> these keys to be unique. Surely we don't need to mix those keys in a
> single shm segment, right? So it seems more like an aesthetic thing. Or
> is there some policy to have unique values for these keys?

Uniqueness would be mostly useful for debugging shared memory issues,
but indeed, in a correctly working system we wouldn't have to worry
about parallel state key type confusion.
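
For example (just a sketch, not a value from the patch: assuming nbtree
keeps its 0xA... prefix and BRIN its 0xB..., GIN could claim a fresh,
unused prefix):

    /* hypothetical GIN-specific prefix, distinct from nbtree's
     * 0xA... and BRIN's 0xB... parallel keys */
    #define PARALLEL_KEY_GIN_SHARED    UINT64CONST(0xC000000000000001)
    ...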

>> ---
>>
>>> +++ b/src/include/access/gin_tuple.h
>>> + typedef struct GinTuple
>>
>> I think this needs some more care: currently, each GinTuple is at
>> least 36 bytes in size on 64-bit systems. By using int instead of Size
>> (no normal indexable tuple can be larger than MaxAllocSize), and
>> packing the fields better we can shave off 10 bytes; or 12 bytes if
>> GinTuple.keylen is further adjusted to (u)int16: a key needs to fit on
>> a page, so we can probably safely assume that the key size fits in
>> (u)int16.
>
> Yeah, I guess using int64 is a bit excessive - you're right about that.
> I'm not sure this is necessarily about "indexable tuples" (GinTuple is
> not indexed, it's more an intermediate representation).

Yes, but even if the GinTuple itself isn't stored on disk in the
index, a GinTuple's key *is* part of the primary GIN btree's keys
somewhere down the line, and thus must fit on a page somewhere. That's
what I was referring to.
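
To illustrate the kind of packing I had in mind (a sketch only: the
field list is my reading of the discussion rather than the patch
verbatim, and alignment may shift things a bit):

    typedef struct GinTuple
    {
        int          tuplen;    /* length of the whole tuple */
        OffsetNumber attrnum;   /* attnum of the index key */
        uint16       keylen;    /* bytes in data[] used by the key */
        int16        typlen;    /* typlen of the key's type */
        bool         typbyval;  /* typbyval of the key's type */
        signed char  category;  /* normal key, or a NULL category */
        int          nitems;    /* number of TIDs following the key */
        char         data[FLEXIBLE_ARRAY_MEMBER];
    } GinTuple;

With int lengths and a uint16 keylen, the fixed part packs well below
the current 36 bytes, somewhere in the 16-24 byte range depending on
trailing padding and any fields I've missed.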

> But if we can make it smaller, that probably can't hurt.
>
> I don't have a great intuition on how beneficial this might be. For
> cases with many TIDs per index key, it probably won't matter much. But
> if there's many keys (so that GinTuples store only very few TIDs), it
> might make a difference.

This will indeed matter most when small TID lists are common. I
suspect it's not uncommon to find ourselves in such a situation when
users have a low maintenance_work_mem (and thus don't have much space
to buffer and combine index tuples before they're flushed), or when
the generated index keys can't be stored in the available memory (as
in my example: I didn't tune m_w_m at all, and the table I tested had
~15GB of data).
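
To put rough numbers on it (back-of-the-envelope, assuming the current
36-byte fixed part, a 24-byte packed one, and a 6-byte ItemPointerData):
a GinTuple carrying a single TID would shrink from ~42 to ~30 bytes,
almost 30% less data to buffer and sort, while with hundreds of TIDs
per tuple the saving disappears in the noise.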

> I'll try to measure the impact on the same "realistic" cases I used for
> the earlier steps.

That would be greatly appreciated, thanks!


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


