Re: _mdfd_getseg can be expensive - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: _mdfd_getseg can be expensive
Date
Msg-id CAM3SWZR3ScU+NhNnL-sEsyDadV0DfyoyKS+Wu0rBdbq2E2M3_g@mail.gmail.com
In response to Re: _mdfd_getseg can be expensive  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jun 30, 2016 at 7:08 PM, Andres Freund <andres@anarazel.de> wrote:
> If you have a big enough index (maybe ~150GB+), sure. Before that,
> probably not.
>
> It's usually pretty easy to see in cpu profiles whether this issue
> exists.

I think that this is a contributing factor to why merging in parallel
CREATE INDEX becomes much more CPU bound when building very large
indexes, which Corey Huinker has benchmarked using an advance copy of
the patch. He has shown cases that are sped up by 3.6x when 8 parallel
workers are used (compared to a serial CREATE INDEX), but a case with
an index of several hundred gigabytes only sees a speedup of about
1.5x. (This bottleneck affects serial CREATE INDEX merging just as
much as parallel, since that part isn't parallelized, but it's far
more noticeable with parallel CREATE INDEX simply because merging in
the leader becomes a huge bottleneck.)
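
As a rough back-of-envelope reading of those two figures, one can model
the build with Amdahl's law and ask what non-parallelizable share of the
serial runtime each speedup would imply. This is only a sketch under
stated assumptions, none of which come from the benchmark itself: the 8
workers are treated as 8 equal units of parallelism, the leader's own
participation is ignored, and the function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, workers):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def implied_serial_fraction(speedup, workers):
    """Invert Amdahl's law: the serial fraction that yields `speedup`.

    Derived from 1/S = s + (1 - s)/N, so s = (N/S - 1) / (N - 1).
    """
    return (workers / speedup - 1.0) / (workers - 1.0)


if __name__ == "__main__":
    # Speedups quoted in the message; 8 workers is assumed for both cases.
    for label, speedup in (("3.6x case", 3.6), ("1.5x case", 1.5)):
        s = implied_serial_fraction(speedup, 8)
        print(f"{label}: implied serial fraction ~ {s:.2f}")
```

Under this simplified model the implied serial share goes from roughly
0.17 in the 3.6x case to roughly 0.62 in the 1.5x case, which fits the
picture of an unparallelized merge dominating the larger build. It says
nothing about which code is responsible for that serial time.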

Those two cases were not exactly comparable, perhaps in several other
ways, but even so my sense is that this can be at least partially
explained by md.c bottlenecks. This is something that we'll need to
confirm through profiling. Hopefully it's just this one bottleneck.

--
Peter Geoghegan


