There's random access and then there's random access - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | There's random access and then there's random access |
Date | |
Msg-id | 87ir3hgrsa.fsf@oxford.xeocode.com |
Responses | Re: There's random access and then there's random access ("Douglas McNaught" <doug@mcnaught.org>) |
| Re: There's random access and then there's random access (Tom Lane <tgl@sss.pgh.pa.us>) |
| Re: There's random access and then there's random access (Gregory Stark <stark@enterprisedb.com>) |
| Re: There's random access and then there's random access (Gregory Stark <stark@enterprisedb.com>) |
List | pgsql-hackers |
Recently there was a post on -performance about a particular case where Postgres doesn't make very good use of the I/O system: when you fetch many records spread throughout a table in random order. http://archives.postgresql.org/pgsql-performance/2007-12/msg00005.php

Currently Postgres reads each record as needed and processes it. This means that even if you have a large raid array you get no benefit from it, since you're limited by the latency of each individual request. The raid array might let you run more queries simultaneously, but it doesn't improve the response time of a single query.

But in most cases, including the use case in the email message above, we can do substantially better. We can arrange to issue all the read requests without blocking, then process the blocks either as they come in or in the order we want, blocking until each one is actually satisfied. Handling them as they come in is in theory more efficient, but either way I would expect a speedup nearly equal to the number of drives in the array. Even on a single drive it should slightly improve performance, since it allows us to do some CPU work while the I/O requests are pending.

The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've run tests with a synthetic benchmark which generates a large file and then reads a random selection of blocks from within it, using either synchronous reads as we do now or one of those interfaces. I saw impressive speed gains on a machine with only three drives in a raid array. I did this a while ago so I don't have the results handy; I'll rerun the tests and post them.

I think this will be easiest to do for bitmap index scans. Since we gather up all the pages we'll need before starting the heap scan, we can easily skim through them, issue posix_fadvises for at least a certain number of blocks ahead of the actual read point, and then proceed with the rest of the scan unchanged.
For regular index scans I'm not sure how easy it will be to beat them into doing this, but I suspect it might not be too hard to at least prefetch the tuples in the page-at-a-time buffer. That's probably safer too, since for such scans we're more likely not to read all the results anyway; there could be a LIMIT or some other node above us that stops the scan early.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!