Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Msg-id: c49dfd75-0a40-7feb-1018-54740e9a91db@enterprisedb.com
In response to: Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses: Re: Parallel CREATE INDEX for BRIN indexes
List: pgsql-hackers
On 11/29/23 23:59, Matthias van de Meent wrote:
> On Wed, 29 Nov 2023 at 21:56, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 11/29/23 21:30, Matthias van de Meent wrote:
>>> On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> I did try to measure how much it actually saves, but none of the
>>>> tests I did actually found measurable improvement. So I'm tempted
>>>> to just not include this part, and accept that we may deserialize
>>>> some of the tuples unnecessarily.
>>>>
>>>> Did you actually observe measurable improvements in some cases?
>>>
>>> The improvements would mostly stem from brin indexes with multiple
>>> (potentially compressed) by-ref types, as they go through more
>>> complex and expensive code to deserialize, requiring separate
>>> palloc() and memcpy() calls each.
>>> For single-column and by-value types the improvements are expected
>>> to be negligible, because there is no meaningful difference between
>>> copying a single by-ref value and copying its container; the
>>> additional work done for each tuple is marginal for those.
>>>
>>> For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
>>> (sha256((id+1)::text::bytea)::text),
>>> (sha256((id+2)::text::bytea)::text), ...) instrumented with 0003 I
>>> measured a difference of 10x less time spent in the main loop of
>>> _brin_end_parallel, from ~30ms to 3ms when dealing with 55k 1-block
>>> ranges. It's not a lot, but worth at least something, I guess?
>>>
>>
>> It is something, but I can't really convince myself it's worth the
>> extra code complexity. It's a somewhat extreme example, and the
>> parallelism certainly saves much more than this.
>
> True.
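For concreteness, the index shape used in the measurement quoted above could be reproduced roughly as follows. This is a sketch, assuming a table `t` with a single bigint column `id`; only three of the eight expression columns are spelled out:

```sql
-- Hypothetical reconstruction of the benchmark index: every key is a
-- by-ref text value derived from sha256(), so deserializing each range
-- needs a separate palloc()/memcpy() per column.
CREATE TABLE t (id bigint);

CREATE INDEX t_brin_sha256 ON t USING brin (
    (sha256((id)::text::bytea)::text),
    (sha256((id + 1)::text::bytea)::text),
    (sha256((id + 2)::text::bytea)::text)
    -- ... repeated up to (id + 7) for the full 8-column variant
) WITH (pages_per_range = 1);  -- 1-block ranges, as in the ~55k-range test
```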
> For this, I usually keep in mind that the docs on multi-column
> indexes still indicate to use 1 N-column brin index over N 1-column
> brin indexes (assuming the same storage parameters), so multi-column
> BRIN indexes should not be considered to be uncommon:
>
> "The only reason to have multiple BRIN indexes instead of one
> multicolumn BRIN index on a single table is to have a different
> pages_per_range storage parameter."
>
> Note that most of the time in my example index is spent in creating
> the actual tuples due to the use of hashing for data generation; for
> index or plain to-text formatting the improvement is much more
> pronounced: If I use an 8-column index (id::text, id, ...), index
> creation takes ~500ms with 4+ workers. Of this, deforming takes some
> 20ms, though when skipping the deforming step (i.e., with my patch)
> it takes ~3.5ms. That's a 3% shaved off the build time when the
> index shape is beneficial.
>

That's all true, and while 3.5% is not something to ignore, my POV is
that the parallelism speeds this up from ~2000ms to ~500ms. Yes, it
would be great to shave off the extra 1% (relative to the original
duration). But I don't have a great idea how to code that in a way
that is readable, and I don't want to stall the patch indefinitely
because of a comparatively small improvement.

Therefore I propose we get the simpler code committed and leave this
as a future improvement.

>>> The attached patch fixes the issue that you called out.
>>> It also further updates _brin_end_parallel: the final 'write empty
>>> tuples' loop is never hit and is thus removed, because if there
>>> were any tuples in the spool we'd have filled the empty ranges at
>>> the end of the main loop, and if there were no tuples in the spool
>>> then the memtuple would still be at its original initialized value
>>> of 0 thus resulting in a constant false condition. I also updated
>>> some comments.
>>>
>>
>> Ah, right.
>> I'll take a look tomorrow, but I guess I didn't realize we insert
>> the empty ranges in the main loop, because we're already looking
>> at the *next* summary.
>
> Yes, merging adds some significant complexity here. I don't think we
> can easily get around that though...
>
>> But I think the idea was to insert empty ranges if there's a chunk
>> of empty ranges at the end of the table, after the last tuple the
>> index build reads. But I'm not sure that can actually happen ...
>
> This would be trivial to construct with partial indexes; e.g. WHERE
> (my_pk IS NULL) would consist of exclusively empty ranges.
> I don't see a lot of value in partial BRIN indexes, but I may be
> overlooking something.
>

Oh, I haven't even thought about partial BRIN indexes! I'm sure for
those it's even more important to actually fill in the empty ranges,
otherwise we end up scanning the whole supposedly filtered-out part
of the table. I'll do some testing with that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
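[For reference, the partial-index scenario described in the exchange above could be constructed along these lines; this is a sketch, and the table and column names are hypothetical:]

```sql
-- A partial BRIN index whose predicate can never match: my_pk is a
-- primary key and therefore NOT NULL, so every range the index covers
-- is empty, exercising the 'fill in empty ranges' code path.
CREATE TABLE events (
    my_pk   bigint PRIMARY KEY,  -- NOT NULL, so the predicate below is never true
    payload text
);

CREATE INDEX events_brin_empty ON events USING brin (my_pk)
    WHERE (my_pk IS NULL);
```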