Re: Parallel Seq Scan - Mailing list pgsql-hackers
From: Daniel Bausch
Subject: Re: Parallel Seq Scan
Date:
Msg-id: 8761bqrrjh.fsf@gelnhausen.dvs.informatik.tu-darmstadt.de
In response to: Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Parallel Seq Scan
List: pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:

> On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown <thom@linux.com> wrote:
>> On 28 January 2015 at 14:03, Robert Haas <robertmhaas@gmail.com> wrote:
>>> The problem here, as I see it, is that we're flying blind. If there's
>>> just one spindle, I think it's got to be right to read the relation
>>> sequentially. But if there are multiple spindles, it might not be,
>>> but it seems hard to predict what we should do. We don't know what
>>> the RAID chunk size is or how many spindles there are, so any guess as
>>> to how to chunk up the relation and divide up the work between workers
>>> is just a shot in the dark.
>>
>> Can't the planner take effective_io_concurrency into account?
>
> Maybe. It's answering somewhat the right question -- to tell us how
> many parallel I/O channels we think we've got. But I'm not quite sure
> what to do with that information in this case. I mean, if we've
> got effective_io_concurrency = 6, does that mean it's right to start
> scans in 6 arbitrary places in the relation and hope that keeps all
> the drives busy? That seems like throwing darts at the wall. We have
> no idea which parts are on which underlying devices. Or maybe it means
> we should prefetch 24MB, on the assumption that the RAID stripe is
> 4MB? That's definitely blind guesswork.
>
> Considering the email Amit just sent, it looks like on this machine,
> regardless of what algorithm we used, the scan took between 3 minutes
> and 5.5 minutes, and most of them took between 4 minutes and 5.5
> minutes. The results aren't very predictable, more workers don't
> necessarily help, and it's not really clear that any algorithm we've
> tried is clearly better than any other. I experimented with
> prefetching a bit yesterday, too, and it was pretty much the same.
> Some settings made it slightly faster. Others made it slower. Whee!

I researched this topic quite a while ago. One notable fact is that
explicit prefetching disables the automatic readahead prefetching done
by the Linux kernel, which can occur at larger granularities than 8K.
Automatic readahead prefetching occurs when consecutive addresses are
read, which happens during a seqscan but can also happen by "accident"
through an indexscan in correlated cases.

My conclusion was to NOT prefetch seqscans, because the OS does well
enough without advice. Prefetching indexscan heap accesses is very
valuable, though, but you need to detect the accidental sequential
accesses so as not to hurt performance in correlated cases (see the
sketch after this message).

As a general hint: do not focus only on HDDs with their single spindle.
A single SATA SSD scales up to 32 (31 on Linux) requests in parallel,
without RAID or anything else. The difference in throughput is extreme
for this type of storage device: while a single spinning HDD can only
gain up to ~20% via NCQ, a SATA SSD can easily gain up to 700%.

+1 for using effective_io_concurrency to tune for this, since
prefetching random addresses is effectively a type of parallel I/O.

Regards,
Daniel
--
MSc. Daniel Bausch
Research Assistant (Computer Science)
Technische Universität Darmstadt
http://www.dvs.tu-darmstadt.de/staff/dbausch
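For illustration, a minimal sketch of the detection heuristic Daniel
describes: track the heap block numbers an indexscan fetches and stop
issuing posix_fadvise() advice once a run of consecutive blocks suggests
a correlated (sequential) pattern, so the kernel's own readahead stays
active. This is not PostgreSQL code; the function names, the struct, and
the run-length threshold are all hypothetical.

    #define _POSIX_C_SOURCE 200112L /* for posix_fadvise() */
    #include <fcntl.h>

    /* Hypothetical: consecutive blocks seen before we assume a
     * correlated pattern and leave the work to kernel readahead. */
    #define SEQ_RUN_THRESHOLD 8
    #define BLOCK_SIZE 8192

    typedef struct PrefetchState
    {
        long last_block; /* heap block fetched previously, -1 at start */
        int  seq_run;    /* length of the current consecutive run */
    } PrefetchState;

    static void
    maybe_prefetch(int fd, long block, PrefetchState *st)
    {
        /* Detect "accidental" sequential access. */
        if (block == st->last_block + 1)
            st->seq_run++;
        else
            st->seq_run = 0;
        st->last_block = block;

        /*
         * Random pattern: issue explicit advice.  Within a long
         * sequential run we stay silent, because explicit advice
         * would disable the kernel's larger-granularity readahead.
         */
        if (st->seq_run < SEQ_RUN_THRESHOLD)
            (void) posix_fadvise(fd, (off_t) block * BLOCK_SIZE,
                                 BLOCK_SIZE, POSIX_FADV_WILLNEED);
    }

In a correlated case the advice stops after the first few blocks and OS
readahead takes over; for truly random indexscan accesses every block is
advised, which is where the large SSD gains (queue depth up to 32,
tunable via effective_io_concurrency) would come from.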