Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel CREATE INDEX for GIN indexes
Msg-id 3b011125-7489-4ecb-8973-bbe6f00cbf1b@enterprisedb.com
In response to Re: Parallel CREATE INDEX for GIN indexes  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers

On 7/8/24 11:45, Matthias van de Meent wrote:
> On Sun, 7 Jul 2024, 18:26 Tomas Vondra, <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/5/24 21:45, Matthias van de Meent wrote:
>>> On Wed, 3 Jul 2024 at 20:36, Matthias van de Meent
>>> <boekewurm+postgres@gmail.com> wrote:
>>>>
>>>> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>> So I've switched this to use the regular data-type comparisons, with
>>>>> SortSupport etc. There's a bit more cleanup remaining and testing
>>>>> needed, but I'm not aware of any bugs.
>>>> ---
>>>>> +++ b/src/backend/utils/sort/tuplesortvariants.c
>>>>
>>>> I was thinking some more about merging tuples inside the tuplesort. I
>>>> realized that this could be implemented by allowing buffering of tuple
>>>> writes in writetup. This would require adding a flush operation at the
>>>> end of mergeonerun to store the final unflushed tuple on the tape, but
>>>> that shouldn't be too expensive. This buffering, when implemented
>>>> through e.g. a GinBuffer in TuplesortPublic->arg, could allow us to
>>>> merge the TID lists of same-valued GIN tuples while they're getting
>>>> stored and re-sorted, thus reducing the temporary space usage of the
>>>> tuplesort by some amount with limited overhead for other
>>>> non-deduplicating tuplesorts.
>>>>
>>>> I've not yet spent the time to get this to work though, but I'm fairly
>>>> sure it'd use less temporary space than the current approach with the
>>>> 2 tuplesorts, and could have lower overall CPU overhead as well
>>>> because the number of sortable items gets reduced much earlier in the
>>>> process.
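
[Editor's note: the buffering idea sketched above can be illustrated roughly as follows. This is a hedged sketch, not the actual tuplesort code — `WriteBuffer`, `writetup_buffered`, and `flush_pending` are hypothetical names, and TIDs are simplified to plain ints.]

```c
#include <string.h>

/* Simplified stand-in for a GIN tuple: a key plus a sorted TID list. */
typedef struct SimpleGinTuple {
    int key;
    int ntids;
    int tids[8];
} SimpleGinTuple;

/* Buffer holding the last tuple not yet written to the "tape". */
typedef struct WriteBuffer {
    SimpleGinTuple pending;
    int has_pending;
    int nwritten;           /* tuples actually emitted to the tape */
} WriteBuffer;

/* Emit the pending tuple (stand-in for writing to the sort tape).
 * In the described design this would be called once more at the end
 * of mergeonerun, to store the final unflushed tuple. */
static void flush_pending(WriteBuffer *buf)
{
    if (buf->has_pending) {
        buf->nwritten++;
        buf->has_pending = 0;
    }
}

/* Buffered "writetup": tuples arrive in key order during a merge, so
 * when consecutive tuples share a key their TID lists can be combined
 * in the buffer; otherwise the previous tuple is flushed first. */
static void writetup_buffered(WriteBuffer *buf, const SimpleGinTuple *tup)
{
    if (buf->has_pending && buf->pending.key == tup->key) {
        /* Same key: append TIDs (both lists sorted, disjoint ranges). */
        memcpy(buf->pending.tids + buf->pending.ntids,
               tup->tids, tup->ntids * sizeof(int));
        buf->pending.ntids += tup->ntids;
        return;
    }
    flush_pending(buf);
    buf->pending = *tup;
    buf->has_pending = 1;
}
```

The point of the sketch: tuples that share a key never reach the tape separately, which is where the temporary-space saving comes from.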
>>>
>>> I've now spent some time on this. Attached is the original patchset, plus
>>> 2 incremental patches, the first of which implements the above design
>>> (patch no. 8).
>>>
>>> Local tests show it's significantly faster: for the below test case
>>> I've seen reindex time reduced from 777455ms to 626217ms, or ~20%
>>> improvement.
>>>
>>> After applying the 'reduce the size of GinTuple' patch, index creation
>>> time is down to 551514ms, or about 29% faster total. This all was
>>> tested with a fresh stock postgres configuration.
>>>
>>> """
>>> CREATE UNLOGGED TABLE testdata
>>> AS SELECT sha256(i::text::bytea)::text
>>>     FROM generate_series(1, 15000000) i;
>>> CREATE INDEX ON testdata USING gin (sha256 gin_trgm_ops);
>>> """
>>>
>>
>> Those results look really promising. I certainly would not have expected
>> such improvements - 20-30% speedup on top of the already parallel run
>> seems great. I'll do a bit more testing to see how much this helps for
>> the "more realistic" data sets.
>>
>> I can't say I 100% understand how the new stuff in tuplesortvariants.c
>> works, but that's mostly because my knowledge of tuplesort internals is
>> fairly limited (and I managed to survive without that until now).
>>
>> What makes me a bit unsure/uneasy is that this seems to "inject" custom
>> code fairly deep into the tuplesort logic. I'm not quite sure if this is
>> a good idea ...
> 
> I thought it was still fairly high-level: it adds (what seems to me) a
> natural extension to the pre-existing "write a tuple to the tape" API,
> allowing the Tuplesort (TS) implementation to further optimize its
> ordered tape writes through buffering (and combining) of tuple writes.
> While it does remove the current 1:1 relation of TS tape writes to
> tape reads for the GIN case, there is AFAIK no code in TS that relies
> on that 1:1 relation.
> 
> As to the GIN TS code itself: yes, it's more complicated, mainly
> because it uses several optimizations to reduce unnecessary
> allocations and (de)serializations of GinTuples, and I'm aware of even
> more such optimizations that can be added at some point.
> 
> As an example: I suspect the use of GinBuffer in writetup_index_gin to
> be a significant resource drain in my patch because it lacks
> "freezing" in the tuplesort buffer. When no duplicate key values are
> encountered, the code is nearly optimal (except for a full tuple copy
> to get the data into the GinBuffer's memory context), but if more than
> one GinTuple has the same key in the merge phase, we deserialize both
> tuples' posting lists and merge the two. I suspect that merge to be
> more expensive than operating on the compressed posting lists of the
> GinTuples themselves, so that's something I think could be improved. I
> suspect/guess it could save another 10% in select cases, and will
> definitely reduce the memory footprint of the buffer.
> 
> Another thing that can be optimized is the current approach of
> inserting data into the index: I think it's kind of wasteful to
> decompress and later re-compress the posting lists once we start
> storing the tuples on disk.
> 
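[Editor's note: the merge step described above — deserializing both tuples' posting lists and combining them — amounts to a plain sorted-list merge. A minimal illustration, with TIDs simplified to ints and `merge_tid_lists` a hypothetical name; since a given (key, TID) pair occurs only once during index build, the two lists can be assumed disjoint and no deduplication is needed.]

```c
#include <stddef.h>

/* Illustrative sketch (not the actual GIN code): merge two sorted,
 * disjoint TID lists into one sorted list.  dst must have room for
 * na + nb entries; returns the merged length. */
static int merge_tid_lists(const int *a, int na,
                           const int *b, int nb, int *dst)
{
    int i = 0, j = 0, n = 0;

    /* Standard two-pointer merge: take the smaller head each step. */
    while (i < na && j < nb)
        dst[n++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na)
        dst[n++] = a[i++];
    while (j < nb)
        dst[n++] = b[j++];
    return n;
}
```

The suggestion in the mail is that performing this merge on the compressed posting-list representation directly would avoid the decompress/recompress round trip around it.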

I need to experiment with this a bit more, to better understand the
behavior and pros/cons. But one thing that's not clear to me is why
would this be better than simply increasing the amount of memory for the
initial BuildAccumulator buffer ...

Wouldn't that have pretty much the same effect?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


