Home > mailing lists

Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Parallel Seq Scan
Date	January 28, 2015 15:39:07
Msg-id	CA+TgmoYM4WT2sQ4W9LVRV30272hetha3-f+XKn1F4hLo63VMJA@mail.gmail.com Whole thread
In response to	Re: Parallel Seq Scan (Thom Brown <thom@linux.com>)
Responses	Re: Parallel Seq Scan
List	pgsql-hackers

Tree view

On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown <thom@linux.com> wrote:
> On 28 January 2015 at 14:03, Robert Haas <robertmhaas@gmail.com> wrote:
>> The problem here, as I see it, is that we're flying blind.  If there's
>> just one spindle, I think it's got to be right to read the relation
>> sequentially.  But if there are multiple spindles, it might not be,
>> but it seems hard to predict what we should do.  We don't know what
>> the RAID chunk size is or how many spindles there are, so any guess as
>> to how to chunk up the relation and divide up the work between workers
>> is just a shot in the dark.
>
> Can't the planner take effective_io_concurrency into account?

Maybe.  It's answering a somewhat the right question -- to tell us how
many parallel I/O channels we think we've got.  But I'm not quite sure
what the to do with that information in this case.  I mean, if we've
got effective_io_concurrency = 6, does that mean it's right to start
scans in 6 arbitrary places in the relation and hope that keeps all
the drives busy?  That seems like throwing darts at the wall.  We have
no idea which parts are on which underlying devices.  Or maybe it mean
we should prefetch 24MB, on the assumption that the RAID stripe is
4MB?  That's definitely blind guesswork.

Considering the email Amit just sent, it looks like on this machine,
regardless of what algorithm we used, the scan took between 3 minutes
and 5.5 minutes, and most of them took between 4 minutes and 5.5
minutes.  The results aren't very predictable, more workers don't
necessarily help, and it's not really clear that any algorithm we've
tried is clearly better than any other.  I experimented with
prefetching a bit yesterday, too, and it was pretty much the same.
Some settings made it slightly faster.  Others made it slower.  Whee!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Tom Lane
Date: 28 January 2015, 15:35:52
Subject: Re: Misaligned BufferDescriptors causing major performance problems on AMD

From: Robert Haas
Date: 28 January 2015, 15:39:52
Subject: Re: Misaligned BufferDescriptors causing major performance problems on AMD

Re: Parallel Seq Scan - Mailing list pgsql-hackers

Previous

Next