Re: Prefetch - Mailing list pgsql-performance
| From | Matt Olson |
|---|---|
| Subject | Re: Prefetch |
| Date | |
| Msg-id | 200505101136.31759.molson@oceanconsulting.com |
| In response to | Re: Prefetch (Greg Stark <gsstark@mit.edu>) |
| List | pgsql-performance |
My postgres binaries and WAL are on a separate disk from the RAID array. The table I'm doing the selects from is probably about 4GB in size, with 18-20 million records. No concurrent or dependent inserts or deletes are going on.

Tom's point and your points about optimizing the application are well taken. I know my approach is suboptimal and prone to getting caught by latency issues (seek times, cache hit rates, etc.). However, the question of prefetch in my mind is all about eliminating latencies, so I thought my problem would be a good one for discussing prefetching.

The two approaches I'm in the process of testing are Rod and Greg's suggestion of using 'CLUSTER', and, for the sake of not letting a good idea get away, the parallel query approach Tom suggested (sketches of the CLUSTER and partitioning ideas follow the quoted message below). I'll report back to the list what I find and maybe do some _rough_ benchmarking. This is a production app, so I can't get too much in the way of the daily batches.

--
Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

On Tuesday 10 May 2005 11:13 am, Greg Stark wrote:
> Matt Olson writes:
> > I've done other things that make sense, like using indexes, playing with
> > the planner constants and turning up the postgres cache buffers.
> >
> > Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields
> > no apparent difference in database performance. The random nature of the
> > I/O drops disk reads down to about 1MB/sec for the array. A linear table
> > scan can easily yield 70-80MB/sec on this system. Total table size is
> > usually around 1GB and with indexes should be able to fit completely in
> > main memory.
>
> Actually, forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible. I would think about whether you can structure your
> data such that you can use sequential scans. That might mean partitioning
> your raw data into separate tables and then accessing only the partitions
> that are relevant to the query.
>
> In your application that might be hard. It sounds like you would need more
> or less one table per stock ticker, which would really be hard to manage.
>
> One thing you might look into is using the CLUSTER command. But postgres
> doesn't maintain the cluster ordering, so it would require periodically
> rerunning it.
>
> I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s
> even for completely random reads. Is it possible you're seeing something
> else interfering? Do you have INSERT/UPDATE/DELETE transactions happening
> concurrently with this select scan? If so, you should strongly look into
> separating the transaction log from the data files.
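A minimal sketch of the CLUSTER approach Rod and Greg suggested. The table and index names (daily_prices, daily_prices_ticker_date_idx) are hypothetical stand-ins for the actual schema; the point is that CLUSTER physically rewrites the table in index order, turning per-ticker reads into mostly sequential I/O, but PostgreSQL does not maintain that ordering afterward, so it has to be re-run periodically.

```sql
-- Hypothetical schema standing in for the ~4GB, 18-20 million row table.
-- An index on (ticker, trade_date) matches the per-ticker access pattern.
CREATE INDEX daily_prices_ticker_date_idx
    ON daily_prices (ticker, trade_date);

-- Rewrite the table in index order so reads for one ticker become
-- (mostly) sequential instead of random seeks across the array.
CLUSTER daily_prices_ticker_date_idx ON daily_prices;

-- The ordering decays as new rows are appended, so re-cluster
-- periodically, e.g. after each daily batch load:
CLUSTER daily_prices;   -- reuses the index recorded by the previous CLUSTER
ANALYZE daily_prices;   -- refresh planner statistics after the rewrite
```

Note that CLUSTER takes an exclusive lock and rewrites the whole table, so on a production system it needs to fit into a maintenance window, such as right after the daily batch.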
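Greg's partitioning suggestion can also be sketched with table inheritance, which is how partitioning was done in PostgreSQL of that era. The table names and the split by ticker range are hypothetical; automatic pruning when querying the parent (constraint exclusion) only arrived in later releases, so queries would be pointed at the relevant child table directly.

```sql
-- Parent table; children hold disjoint ticker ranges.  Per-ticker tables
-- would be unmanageable, as Greg notes, but coarser partitions still keep
-- related rows physically together and sequentially scannable.
CREATE TABLE daily_prices (
    ticker      text    NOT NULL,
    trade_date  date    NOT NULL,
    close_price numeric
);

CREATE TABLE daily_prices_a_m (
    CHECK (ticker >= 'A' AND ticker < 'N')
) INHERITS (daily_prices);

CREATE TABLE daily_prices_n_z (
    CHECK (ticker >= 'N')
) INHERITS (daily_prices);

-- Query the child that holds the relevant tickers; it is a much smaller
-- table that can be read with a sequential scan.
SELECT * FROM daily_prices_a_m WHERE ticker = 'IBM';
```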