Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Parallel CREATE INDEX for GIN indexes
Date
Msg-id CAEze2WhBNrsK0nPowTi3A4Oh8WbNLFXemeCHnb1x6hMi7zjMaA@mail.gmail.com
In response to Re: Parallel CREATE INDEX for GIN indexes  (Andy Fan <zhihuifan1213@163.com>)
List pgsql-hackers
On Tue, 9 Jul 2024 at 03:18, Andy Fan <zhihuifan1213@163.com> wrote:
>> and later we called 'tuplesort_performsort(state->bs_sortstate);'.  Even
>> if we have some CTID merge activity in '....(1)', the tuples are still
>> ordered, so the sorts (in both 'tuplesort_putgintuple' and
>> 'tuplesort_performsort') are not necessary. What's more, each
>> 'flush-memory-to-disk' in tuplesort creates a 'sorted-run', and in
>> this case we actually need only 1 run, since all the input tuples
>> in the worker are sorted. The reduction of 'sort-runs' in the worker will be
>> helpful to the leader's final mergeruns. The 'sorted-run' benefit doesn't
>> exist for case-1 (RBTree -> worker_state).
>>
>> If Matthias's proposal is adopted, my optimization will not be useful
>> anymore, and Matthias's proposal looks like a more natural and efficient
>> way.

I think they might be complementary. I don't think it's reasonable to
expect GIN's BuildAccumulator to buffer all the index tuples at the
same time (as I mentioned upthread: we are or should be limited by
work memory), but the BuildAccumulator will do a much better job at
combining tuples than the in-memory sort + merge-write done by
Tuplesort (because BA will use (much?) less memory for the same number
of stored values). So, the idea of making BuildAccumulator responsible
for providing the initial sorted runs does resonate with me, and can
also be worth pursuing.
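
To make that idea concrete, here is a minimal sketch in Python. It is a toy
model, not PostgreSQL code: build_sorted_runs, flush and max_buffered_tids are
hypothetical names, and a plain dict stands in for BuildAccumulator's RBTree.
The point is only that an accumulator which groups TIDs per key can emit an
already-sorted, already-combined run each time it reaches its memory budget.

```python
from collections import defaultdict


def flush(buffer):
    """Turn the in-memory buffer into one sorted run: keys ascending,
    and each key's TID list ascending."""
    return [(key, sorted(tids)) for key, tids in sorted(buffer.items())]


def build_sorted_runs(pairs, max_buffered_tids):
    """Toy accumulator (assumption: memory is measured in buffered TIDs).

    Groups TIDs by key and emits a sorted run whenever the budget is hit,
    so no separate sort step is needed for the run afterwards.
    """
    runs = []
    buffer = defaultdict(list)
    buffered = 0
    for key, tid in pairs:
        buffer[key].append(tid)  # combine: one entry per key, many TIDs
        buffered += 1
        if buffered >= max_buffered_tids:
            runs.append(flush(buffer))
            buffer = defaultdict(list)
            buffered = 0
    if buffered:
        runs.append(flush(buffer))
    return runs


# Hypothetical input: (key, (block, offset)) pairs in heap-scan order.
pairs = [("b", (1, 1)), ("a", (1, 1)), ("b", (1, 2)),
         ("a", (2, 1)), ("c", (2, 1))]
runs = build_sorted_runs(pairs, max_buffered_tids=3)
```

With a budget of three TIDs, the five input pairs end up in two runs, and
within each run every key appears exactly once.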

I think it would indeed save time otherwise spent on comparisons if tuples
can be merged before they're first spilled to disk, while we already
know which tuples form a sorted run. Afterwards, only
the phases where we merge sorted runs from disk would require my
buffered write approach that merges Gin tuples.
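
As a rough illustration of that later phase, the sketch below merges several
sorted runs and combines entries that share a key while writing the output.
Again this is a toy model in Python under stated assumptions: merge_runs is a
hypothetical name, runs are in-memory lists of (key, sorted TID list) rather
than on-disk tapes, and it does not reflect the actual patch.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_runs(runs):
    """Merge sorted runs of (key, sorted_tids) entries.

    Entries with the same key coming from different runs are combined
    into a single entry whose TID list is itself merged in order.
    """
    merged = heapq.merge(*runs, key=itemgetter(0))
    out = []
    for key, group in groupby(merged, key=itemgetter(0)):
        # Each per-run TID list is already sorted, so a k-way merge suffices.
        tids = list(heapq.merge(*(tid_list for _, tid_list in group)))
        out.append((key, tids))
    return out


# Two hypothetical runs, each sorted by key with one entry per key.
run_1 = [("a", [(1, 1)]), ("b", [(1, 1), (1, 2)])]
run_2 = [("a", [(2, 1)]), ("c", [(2, 1)])]
result = merge_runs([run_1, run_2])
```

Here the two entries for key "a" are folded into one, which is the kind of
combining that only has to happen once runs from disk are merged.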

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)