Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Msg-id: c990c65c-93e9-f9c3-6792-02f929ce308f@enterprisedb.com
In response to: Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses: Re: Parallel CREATE INDEX for BRIN indexes
List: pgsql-hackers
On 11/29/23 21:30, Matthias van de Meent wrote:
> On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 11/29/23 15:52, Tomas Vondra wrote:
>>>> ...
>>>>
>>>> This also made me think a bit more about how we're working with the
>>>> tuples. With your latest patch, we always deserialize and
>>>> re-serialize the sorted brin tuples, just in case the next tuple
>>>> will also be a BRIN tuple of the same page range. Could we save
>>>> some of that deserialization time by optimistically expecting that
>>>> we're not going to need to merge the tuple, and only storing a
>>>> local copy of it? See attached 0002; this saves some cycles in
>>>> common cases.
>>>>
>>>
>>> Good idea!
>>>
>>
>> FWIW there's a bug in this part of the optimization:
>>
>> ------------------
>> +        if (memtuple == NULL)
>> +            memtuple = brin_deform_tuple(state->bs_bdesc, btup,
>> +                                         memtup_holder);
>> +
>>          union_tuples(state->bs_bdesc, memtuple, btup);
>>          continue;
>> ------------------
>>
>> The deforming should use prevbtup, otherwise union_tuples() just
>> combines two copies of the same tuple.
>
> Good point. There were some more issues as well, fixes are attached.
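To make that concrete, the corrected path would look roughly like this
(a paraphrase using the names from the snippet above, not the actual
patch): the lazy deform has to start from prevbtup, so the current
tuple gets merged into its predecessor rather than into a copy of
itself.

------------------
        /*
         * Sketch of the corrected merge path (inside the loop over the
         * sorted tuples; paraphrased, not the actual patch). prevbtup
         * is the previously seen BRIN tuple for this page range; deform
         * it lazily, only once a merge turns out to be necessary.
         */
        if (memtuple == NULL)
            memtuple = brin_deform_tuple(state->bs_bdesc, prevbtup,
                                         memtup_holder);

        /* merge the current tuple's summary into the previous one */
        union_tuples(state->bs_bdesc, memtuple, btup);
        continue;
------------------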
>> Which however brings me to the bigger issue with this - my stress
>> test found this issue pretty quickly, but then I spent quite a bit of
>> time trying to find what went wrong. I find this reworked code pretty
>> hard to understand, and not necessarily because of how it's written.
>> The problem is that the same loop tries to juggle multiple pieces of
>> information with different lifespans, and so on. I find it really
>> hard to reason about how it behaves ...
>
> Yeah, it'd be nice if we had a peek option for sortsupport, that'd
> improve context handling.
>
>> I did try to measure how much it actually saves, but none of the
>> tests I did actually found a measurable improvement. So I'm tempted
>> to just not include this part, and accept that we may deserialize
>> some of the tuples unnecessarily.
>>
>> Did you actually observe measurable improvements in some cases?
>
> The improvements would mostly stem from brin indexes with multiple
> (potentially compressed) by-ref types, as they go through more complex
> and expensive code to deserialize, requiring separate palloc() and
> memcpy() calls each.
> For single-column and by-value types the improvements are expected to
> be negligible, because there is no meaningful difference between
> copying a single by-ref value and copying its container; the
> additional work done for each tuple is marginal for those.
>
> For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
> (sha256((id+1)::text::bytea)::text),
> (sha256((id+2)::text::bytea)::text), ...) instrumented with 0003 I
> measured a difference of 10x less time spent in the main loop of
> _brin_end_parallel, from ~30ms to 3ms when dealing with 55k 1-block
> ranges. It's not a lot, but worth at least something, I guess?
>

It is something, but I can't really convince myself it's worth the
extra code complexity. It's a somewhat extreme example, and the
parallelism certainly saves much more than this.

> The attached patch fixes the issue that you called out.
> It also further updates _brin_end_parallel: the final 'write empty
> tuples' loop is never hit and is thus removed, because if there were
> any tuples in the spool we'd have filled the empty ranges at the end
> of the main loop, and if there were no tuples in the spool then the
> memtuple would still be at its original initialized value of 0, thus
> resulting in a constant-false condition. I also updated some comments.
>

Ah, right. I'll take a look tomorrow, but I guess I didn't realize we
insert the empty ranges in the main loop, because we're already looking
at the *next* summary. But I think the idea was to insert empty ranges
if there's a chunk of empty ranges at the end of the table, after the
last tuple the index build reads. But I'm not sure that can actually
happen ...
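To restate my understanding, the gap filling in the main loop has
roughly this shape (an illustrative sketch, not the actual patch;
fill_empty_range is a stand-in for whatever helper writes out an
all-empty summary):

------------------
    /*
     * Illustrative sketch of the main merge loop's gap filling (not
     * the actual patch). Page ranges between the previous summary and
     * the current one got no tuples from any worker, so they get
     * summarized as empty here, while draining the sorted spool.
     */
    while ((btup = tuplesort_getbrintuple(spool->sortstate,
                                          &tuplen, true)) != NULL)
    {
        BlockNumber blkno;

        /* fill any ranges skipped between the two summaries */
        for (blkno = prevblkno + pagesPerRange;
             blkno < btup->bt_blkno;
             blkno += pagesPerRange)
            fill_empty_range(state, blkno);     /* stand-in helper */

        /* ... deform/merge/insert btup as discussed above ... */

        prevblkno = btup->bt_blkno;
    }
------------------

So a chunk of empty ranges after the last spooled tuple would never be
visited by this loop, which is exactly the trailing case I'm not sure
can happen.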
regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company