Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Parallel CREATE INDEX for BRIN indexes
Date
Msg-id CAEze2Whg43uK9g3CT_qWxWa2PjtcOU_eqTuxjBOOfNuzsPGAMA@mail.gmail.com
In response to Re: Parallel CREATE INDEX for BRIN indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Parallel CREATE INDEX for BRIN indexes
List pgsql-hackers
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 7/4/23 23:53, Matthias van de Meent wrote:
> > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Hi,
> >>
> >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> >> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> >> (i.e. copied from that and massaged a bit to call brin stuff).
> >
> > Nice work.
> >
> >> In both cases _brin_end_parallel then reads the summaries from worker
> >> files, and adds them into the index. In 0001 this is fairly simple,
> >> although we could do one more improvement and sort the ranges by range
> >> start to make the index nicer (and possibly a bit more efficient). This
> >> should be simple, because the per-worker results are already sorted like
> >> that (so a merge sort in _brin_end_parallel would be enough).
> >
> > I see that you manually built the passing and sorting of tuples
> > between workers, but can't we use the parallel tuplesort
> > infrastructure for that? It already has similar features in place and
> > improves code commonality.
> >
>
> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
> do, and the little I knew I managed to forget since I wrote this patch.
> Which similar features do you have in mind?

I was referring to its ability to emit a single sorted run of tuples
at the leader backend from data gathered in parallel worker backends.
It manages the sort state, on-disk runs etc., so you don't have to
manage that yourself.
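
As a rough illustration of that idea (this is not PostgreSQL code, just
a sketch of the leader-side merge), assume each worker hands back a run
that is already sorted by range start. The function name
merge_worker_runs and the (range_start, summary) tuple layout are
hypothetical, chosen only for this example:

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a BRIN range summary:
# (first block number of the range, opaque summary payload).
Summary = Tuple[int, str]


def merge_worker_runs(runs: Iterable[Iterable[Summary]]) -> Iterator[Summary]:
    """Merge per-worker runs into one stream ordered by range start.

    Each input run must already be sorted by range start; the merge
    is lazy and only keeps one pending item per run in memory.
    """
    return heapq.merge(*runs, key=lambda summary: summary[0])


# Three workers, each producing a run sorted by range start.
worker_runs: List[List[Summary]] = [
    [(0, "w0"), (384, "w0"), (512, "w0")],
    [(128, "w1"), (256, "w1")],
    [(640, "w2")],
]

# The "leader" consumes a single sorted run.
merged = list(merge_worker_runs(worker_runs))
for range_start, origin in merged:
    print(range_start, origin)
```

The point is only that the leader sees one ordered stream and never
deals with the per-worker storage itself; in the real thing that part
would be handled by the parallel tuplesort machinery.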

Adding a new storage format for what is effectively a logical tape
(logtape.{c,h}) and merging it manually seems like a lot of changes
when that functionality is already available, standardized and
optimized in sortsupport. It also adds one more place that has to be
reviewed by hand for disk-related changes like TDE.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)


