Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Msg-id: c49dfd75-0a40-7feb-1018-54740e9a91db@enterprisedb.com
In response to: Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses: Re: Parallel CREATE INDEX for BRIN indexes
List: pgsql-hackers
On 11/29/23 23:59, Matthias van de Meent wrote:
> On Wed, 29 Nov 2023 at 21:56, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 11/29/23 21:30, Matthias van de Meent wrote:
>>> On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> I did try to measure how much it actually saves, but none of the
>>>> tests I did actually found measurable improvement. So I'm tempted
>>>> to just not include this part, and accept that we may deserialize
>>>> some of the tuples unnecessarily.
>>>>
>>>> Did you actually observe measurable improvements in some cases?
>>>
>>> The improvements would mostly stem from brin indexes with multiple
>>> (potentially compressed) by-ref types, as they go through more
>>> complex and expensive code to deserialize, requiring separate
>>> palloc() and memcpy() calls each.
>>> For single-column and by-value types the improvements are expected
>>> to be negligible, because there is no meaningful difference between
>>> copying a single by-ref value and copying its container; the
>>> additional work done for each tuple is marginal for those.
>>>
>>> For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
>>> (sha256((id+1)::text::bytea)::text),
>>> (sha256((id+2)::text::bytea)::text), ...) instrumented with 0003 I
>>> measured a difference of 10x less time spent in the main loop of
>>> _brin_end_parallel, from ~30ms to 3ms when dealing with 55k 1-block
>>> ranges. It's not a lot, but worth at least something, I guess?
>>>
>>
>> It is something, but I can't really convince myself it's worth the
>> extra code complexity. It's a somewhat extreme example, and the
>> parallelism certainly saves much more than this.
>
> True.
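For concreteness, the index shape used in the measurement quoted above could be reproduced roughly as follows. This is a sketch, assuming a table `t` with a single bigint column `id`; only three of the eight expression columns are spelled out:

```sql
-- Hypothetical reconstruction of the benchmark index: every key is a
-- by-ref text value derived from sha256(), so deserializing each range
-- needs a separate palloc()/memcpy() per column.
CREATE TABLE t (id bigint);

CREATE INDEX t_brin_sha256 ON t USING brin (
    (sha256((id)::text::bytea)::text),
    (sha256((id + 1)::text::bytea)::text),
    (sha256((id + 2)::text::bytea)::text)
    -- ... repeated up to (id + 7) for the full 8-column variant
) WITH (pages_per_range = 1);  -- 1-block ranges, as in the ~55k-range test
```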
> For this, I usually keep in mind that the docs on multi-column
> indexes still indicate to use 1 N-column brin index over N 1-column
> brin indexes (assuming the same storage parameters), so multi-column
> BRIN indexes should not be considered to be uncommon:
>
> "The only reason to have multiple BRIN indexes instead of one
> multicolumn BRIN index on a single table is to have a different
> pages_per_range storage parameter."
>
> Note that most of the time in my example index is spent in creating
> the actual tuples due to the use of hashing for data generation; for
> index or plain to-text formatting the improvement is much more
> pronounced: If I use an 8-column index (id::text, id, ...), index
> creation takes ~500ms with 4+ workers. Of this, deforming takes some
> 20ms, though when skipping the deforming step (i.e., with my patch)
> it takes ~3.5ms. That's a 3% shaved off the build time when the
> index shape is beneficial.
>

That's all true, and while 3.5% is not something to ignore, my POV is
that the parallelism speeds this up from ~2000ms to ~500ms. Yes, it
would be great to shave off the extra 1% (relative to the original
duration). But I don't have a great idea how to code that in a way
that is readable, and I don't want to stall the patch indefinitely
because of a comparatively small improvement.

Therefore I propose we get the simpler code committed and leave this
as a future improvement.

>>> The attached patch fixes the issue that you called out.
>>> It also further updates _brin_end_parallel: the final 'write empty
>>> tuples' loop is never hit and is thus removed, because if there
>>> were any tuples in the spool we'd have filled the empty ranges at
>>> the end of the main loop, and if there were no tuples in the spool
>>> then the memtuple would still be at its original initialized value
>>> of 0 thus resulting in a constant false condition. I also updated
>>> some comments.
>>>
>>
>> Ah, right.
>> I'll take a look tomorrow, but I guess I didn't realize we insert
>> the empty ranges in the main loop, because we're already looking
>> at the *next* summary.
>
> Yes, merging adds some significant complexity here. I don't think we
> can easily get around that though...
>
>> But I think the idea was to insert empty ranges if there's a chunk
>> of empty ranges at the end of the table, after the last tuple the
>> index build reads. But I'm not sure that can actually happen ...
>
> This would be trivial to construct with partial indexes; e.g. WHERE
> (my_pk IS NULL) would consist of exclusively empty ranges.
> I don't see a lot of value in partial BRIN indexes, but I may be
> overlooking something.
>

Oh, I haven't even thought about partial BRIN indexes! I'm sure for
those it's even more important to actually fill in the empty ranges,
otherwise we end up scanning the whole supposedly filtered-out part
of the table. I'll do some testing with that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
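[For reference, the partial-index scenario described in the exchange above could be constructed along these lines; this is a sketch, and the table and column names are hypothetical:]

```sql
-- A partial BRIN index whose predicate can never match: my_pk is a
-- primary key and therefore NOT NULL, so every range the index covers
-- is empty, exercising the 'fill in empty ranges' code path.
CREATE TABLE events (
    my_pk   bigint PRIMARY KEY,  -- NOT NULL, so the predicate below is never true
    payload text
);

CREATE INDEX events_brin_empty ON events USING brin (my_pk)
    WHERE (my_pk IS NULL);
```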