Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Daniel Bausch
Subject Re: Parallel Seq Scan
Date
Msg-id 8761bqrrjh.fsf@gelnhausen.dvs.informatik.tu-darmstadt.de
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:

> On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown <thom@linux.com> wrote:
>> On 28 January 2015 at 14:03, Robert Haas <robertmhaas@gmail.com> wrote:
>>> The problem here, as I see it, is that we're flying blind.  If there's
>>> just one spindle, I think it's got to be right to read the relation
>>> sequentially.  But if there are multiple spindles, it might not be,
>>> but it seems hard to predict what we should do.  We don't know what
>>> the RAID chunk size is or how many spindles there are, so any guess as
>>> to how to chunk up the relation and divide up the work between workers
>>> is just a shot in the dark.
>>
>> Can't the planner take effective_io_concurrency into account?
>
> Maybe.  It's answering somewhat the right question -- telling us how
> many parallel I/O channels we think we've got.  But I'm not quite sure
> what to do with that information in this case.  I mean, if we've
> got effective_io_concurrency = 6, does that mean it's right to start
> scans in 6 arbitrary places in the relation and hope that keeps all
> the drives busy?  That seems like throwing darts at the wall.  We have
> no idea which parts are on which underlying devices.  Or maybe it means
> we should prefetch 24MB, on the assumption that the RAID stripe is
> 4MB?  That's definitely blind guesswork.
>
> Considering the email Amit just sent, it looks like on this machine,
> regardless of what algorithm we used, the scan took between 3 minutes
> and 5.5 minutes, and most of them took between 4 minutes and 5.5
> minutes.  The results aren't very predictable, more workers don't
> necessarily help, and it's not really clear that any algorithm we've
> tried is clearly better than any other.  I experimented with
> prefetching a bit yesterday, too, and it was pretty much the same.
> Some settings made it slightly faster.  Others made it slower.  Whee!

I researched this topic quite some time ago.  One notable fact is
that active prefetching disables automatic readahead prefetching (by
the Linux kernel), which can occur at granularities larger than 8K.
Automatic readahead prefetching occurs when consecutive addresses are
read, which may happen in a seqscan but also by "accident" through an
indexscan in correlated cases.
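
For illustration, here is roughly what such fine-grained explicit
prefetching looks like on Linux (a standalone sketch, not PostgreSQL
code; PostgreSQL issues the same advice through PrefetchBuffer() when
built with USE_PREFETCH):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Ask the kernel to read one 8K block into the page cache soon. */
static void
prefetch_block(int fd, long blocknum)
{
    int rc = posix_fadvise(fd, (off_t) blocknum * BLCKSZ, BLCKSZ,
                           POSIX_FADV_WILLNEED);

    /* posix_fadvise reports errors via its return value, not errno */
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
}

int
main(int argc, char **argv)
{
    int  fd;
    long b;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* Advising block by block like this is exactly the access pattern
     * that can keep the kernel's sequential readahead from kicking in. */
    for (b = 0; b < 16; b++)
        prefetch_block(fd, b);

    close(fd);
    return 0;
}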

My conclusion was to NOT prefetch seqscans, because the OS does well
enough without advice.  Prefetching indexscan heap accesses, on the
other hand, is very valuable, but you need to detect the accidental
sequential accesses so as not to hurt your performance in correlated
cases.
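
The detection itself can be as simple as remembering the previous heap
block and the length of the current consecutive run.  A rough sketch of
the idea (the names and the threshold are made up for illustration, not
taken from PostgreSQL):

#include <stdbool.h>

typedef unsigned int BlockNumber;   /* as in PostgreSQL's block.h */

typedef struct PrefetchState
{
    BlockNumber last_block;         /* previous heap block fetched */
    int         seq_run;            /* length of current consecutive run */
} PrefetchState;

/*
 * Decide whether to issue explicit prefetch advice for 'block'.  After a
 * few consecutive blocks in a row we stop advising and let the kernel's
 * automatic readahead take over; a random jump re-enables explicit
 * prefetching.  The threshold of 4 is arbitrary.
 */
static bool
should_prefetch(PrefetchState *st, BlockNumber block)
{
    if (block == st->last_block + 1)
        st->seq_run++;
    else
        st->seq_run = 0;

    st->last_block = block;

    return st->seq_run < 4;
}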

In general, I suggest not focusing only on HDDs with their single
spindle.  A single SATA SSD can handle up to 32 (31 on Linux) requests
in parallel (without RAID or anything else).  The difference in
throughput is extreme for this type of storage device.  While a single
spinning HDD can only gain up to ~20% from NCQ, a SATA SSD can easily
gain up to 700%.

+1 for using effective_io_concurrency to tune for this, since
prefetching random addresses is effectively a type of parallel I/O.
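
If I remember correctly, the bitmap heap scan code already derives its
prefetch depth from that setting; the mapping is roughly n * H(n),
since keeping n devices busy needs about that many pages in flight.  A
simplified version of that computation (modeled on what I recall from
the GUC code, so treat the details as approximate):

/*
 * Derive a target number of pages to keep in flight from
 * effective_io_concurrency: with n independent devices, roughly
 * n * H(n) requests are needed, H(n) being the n-th harmonic number.
 */
static int
prefetch_pages_for(int io_concurrency)
{
    double pages = 0.0;
    int    i;

    for (i = 1; i <= io_concurrency; i++)
        pages += (double) io_concurrency / (double) i;

    return (int) pages;     /* 1 -> 1, 6 -> 14, 32 -> 129 */
}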

Regards,
Daniel
--
MSc. Daniel Bausch
Research Assistant (Computer Science)
Technische Universität Darmstadt
http://www.dvs.tu-darmstadt.de/staff/dbausch


