Re: Building multiple indexes concurrently - Mailing list pgsql-performance

From Andres Freund
Subject Re: Building multiple indexes concurrently
Date
Msg-id 201003172024.10538.andres@anarazel.de
Whole thread Raw
In response to Re: Building multiple indexes concurrently  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: Building multiple indexes concurrently
List pgsql-performance
On Wednesday 17 March 2010 19:44:56 Greg Smith wrote:
> Rob Wultsch wrote:
> > On Wed, Mar 17, 2010 at 7:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> No, it's not optimistic in the least, at least not since we implemented
> >> synchronized seqscans (in 8.3 or thereabouts).
> >
> > Where can I find details about this in the documentation?
>
> It's a behind the scenes optimization so it's not really documented on
> the user side very well as far as I know; easy to forget it's even there
> as I did this morning.
> http://j-davis.com/postgresql/syncscan/syncscan.pdf is a presentation
> covering it, and http://j-davis.com/postgresql/83v82_scans.html is also
> helpful.
>
> While my pessimism on this part may have been overwrought, note the
> message interleaved on the list today with this discussion from Bob
> Lunney discussing the other issue I brought up:  "When using 8-way
> parallel restore against a six-disk RAID 10 group I found that table and
> index scan performance dropped by about 10x.  I/O performance was
> restored by either clustering the tables one at a time, or by dropping
> and restoring them one at a time.  The only reason I can come up with
> for this behavior is file fragmentation and increased seek times."  Now,
> Bob's situation may very well involve a heavy dose of table
> fragmentation from multiple active loading processes rather than index
> fragmentation, but this class of problem is common when trying to do too
> many things at the same time.  I'd hate to see you chase a short-term
> optimization (reduce total index built time) at the expense of long-term
> overhead (resulting indexes are not as efficient to scan).
I find it way much easier to believe such issues exist on a tables in
constrast to indexes. The likelihood to get sequential accesses on an index is
small enough on a big table to make it unlikely to matter much.

Whats your theory to make it matter much?

Andres

pgsql-performance by date:

Previous
From: Justin Pitts
Date:
Subject: Re: Testing FusionIO
Next
From: Kenny Gorman
Date:
Subject: Re: Testing FusionIO