Thread: Prefetch
I wanted to get some opinions about row prefetching.  AFAIK, there is no
prefetching done by PostgreSQL; all prefetching is delegated to the operating
system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data.  Each day has
around 30,000 records (tickers).  A typical operation is to get the 200-day
simple moving average (of price) for each ticker and write the result to a
summary table.  In running this process (Perl/DBI), it is typical to see
70-80% I/O wait time with postgres running at about 8-9%.  If I run the next
day's date, the postgres cache and file cache are now populated with 199 days
of the needed data, postgres runs at 80-90% of CPU and total run time is
greatly reduced.  My conclusion is that this is a high cache hit rate in
action.

I've done other things that make sense, like using indexes, playing with the
planner constants and turning up the postgres cache buffers.

Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no
apparent difference in database performance.  The random nature of the I/O
drops disk reads down to about 1MB/sec for the array.  A linear table scan
can easily yield 70-80MB/sec on this system.  Total table size is usually
around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
there been serious consideration of implementing something like a prefetch
subsystem?  Does anyone have any opinions as to why this would be a bad idea
for postgres?

Postgres is great for a multiuser environment and OLTP applications.
However, in this setup, a data warehouse, the observed performance is not
what I would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/
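(For context, the daily batch described above boils down to roughly the
following per-ticker pass.  This is only a sketch: the names daily_prices,
sma_summary, ticker, trade_date and close_price are illustrative, not Matt's
actual schema, and the date window is a shortcut for "the last 200 trading
days".)

    -- Hypothetical per-ticker pass: one index-driven query per symbol,
    -- repeated ~30,000 times from Perl/DBI.  The repeated index probes
    -- are what produce the random I/O pattern described above.
    INSERT INTO sma_summary (ticker, as_of_date, sma_200)
    SELECT 'FOO', DATE '2005-05-10', avg(close_price)
      FROM daily_prices
     WHERE ticker = 'FOO'
       AND trade_date >  DATE '2005-05-10' - 200
       AND trade_date <= DATE '2005-05-10';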
Matt Olson <molson@oceanconsulting.com> writes:
> Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
> there been serious consideration of implementing something like a prefetch
> subsystem?

No.

> Does anyone have any opinions as to why this would be a bad idea for
> postgres?

We know even less than the OS does about disk layout, and not a lot more than
it about what our next request will be.  (If we're doing a seqscan, then of
course that's not true, but I would expect the OS to be able to figure that
one out and do readahead.)

You haven't shown us your problem queries, but I think that conventional
query tuning would be a more appropriate answer.  In particular I wonder
whether you shouldn't be looking at ways to calculate multiple aggregates
in parallel.

			regards, tom lane
> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.

After you load the new day's data try running CLUSTER on the structure using
a key of (stockID, date) -- probably your primary key.

This should significantly reduce the amount of IO required for your
calculations involving a few stocks over a period of time.

--
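(As a concrete sketch, using the illustrative daily_prices table from the
earlier example and assuming its primary-key index is named
daily_prices_pkey -- the real index name will differ.  The syntax shown is
the 7.4-era form of CLUSTER.)

    -- Rewrite the table in index order so that all rows for one ticker are
    -- physically adjacent; subsequent index scans become mostly sequential I/O.
    CLUSTER daily_prices_pkey ON daily_prices;

    -- CLUSTER takes an exclusive lock and is not maintained automatically,
    -- so it has to be re-run after each daily load (or periodically).
    ANALYZE daily_prices;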
Matt Olson <molson@oceanconsulting.com> writes:

> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.
>
> Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no
> apparent difference in database performance.  The random nature of the I/O
> drops disk reads down to about 1MB/sec for the array.  A linear table scan
> can easily yield 70-80MB/sec on this system.  Total table size is usually
> around 1GB and with indexes should be able to fit completely in main memory.

Actually forcing things to use indexes is the wrong direction to go if you're
trying to process lots of data and want to stream it off disk as rapidly as
possible.  I would think about whether you can structure your data such that
you can use sequential scans.  That might mean partitioning your raw data
into separate tables and then accessing only the partitions that are relevant
to the query.

In your application that might be hard.  It sounds like you would need more
or less one table per stock ticker, which would really be hard to manage.

One thing you might look into is using the CLUSTER command.  But postgres
doesn't maintain the cluster ordering so it would require periodically
rerunning it.

I'm a bit surprised by your 1MB/s rate.  I would expect to see about 10MB/s
even for completely random reads.  Is it possible you're seeing something
else interfering?  Do you have INSERT/UPDATE/DELETE transactions happening
concurrently with this select scan?  If so you should strongly look into
separating the transaction log from the data files.

--
greg
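(For what it's worth, one way to get that effect on this vintage of Postgres,
which has no built-in partitioning, is a table per time period -- splitting
the earlier sketch's single daily_prices table into per-year pieces.  All
names below are illustrative.  The batch job would query the one or two
partition tables covering the trailing window directly; the UNION ALL view
just preserves a single logical table for ad-hoc queries.)

    -- One table per year (or month); the daily load only touches the
    -- current partition, and a 200-day window only needs to scan one or
    -- two of these tables instead of the whole history.
    CREATE TABLE prices_2004 (
        ticker      text    NOT NULL,
        trade_date  date    NOT NULL,
        close_price numeric NOT NULL,
        PRIMARY KEY (ticker, trade_date)
    );

    CREATE TABLE prices_2005 (
        ticker      text    NOT NULL,
        trade_date  date    NOT NULL,
        close_price numeric NOT NULL,
        PRIMARY KEY (ticker, trade_date)
    );

    -- Single logical view over all partitions, for convenience.
    CREATE VIEW daily_prices AS
        SELECT * FROM prices_2004
        UNION ALL
        SELECT * FROM prices_2005;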
My postgres binaries and WAL are on a separate disk from the raid array.  The
table I'm doing the selects from is probably about 4GB in size and 18-20
million records.  No concurrent or dependent inserts or deletes are going on.

Tom's point and your points about optimizing the application are well taken.
I know my approach is suboptimal and prone to getting caught by latency
issues (seek times, cache hit rates, etc.).  However, the question of
prefetch in my mind is all about eliminating latencies, so I thought my
problem would be good for the sake of discussing prefetching.

The two approaches I'm in the process of testing are Rod and Greg's
suggestion of using 'CLUSTER', and, for the sake of not letting a good idea
get away, I'll probably spend time on doing a parallel query approach which
Tom suggested.  I'll report back to the list what I find and maybe do some
_rough_ benchmarking.  This is a production app, so I can't get too much in
the way of the daily batches.

--
Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

On Tuesday 10 May 2005 11:13 am, Greg Stark wrote:
> Matt Olson writes:
> > I've done other things that make sense, like using indexes, playing with
> > the planner constants and turning up the postgres cache buffers.
> >
> > Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields
> > no apparent difference in database performance.  The random nature of the
> > I/O drops disk reads down to about 1MB/sec for the array.  A linear table
> > scan can easily yield 70-80MB/sec on this system.  Total table size is
> > usually around 1GB and with indexes should be able to fit completely in
> > main memory.
>
> Actually forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible.  I would think about whether you can structure your
> data such that you can use sequential scans.  That might mean partitioning
> your raw data into separate tables and then accessing only the partitions
> that are relevant to the query.
>
> In your application that might be hard.  It sounds like you would need more
> or less one table per stock ticker, which would really be hard to manage.
>
> One thing you might look into is using the CLUSTER command.  But postgres
> doesn't maintain the cluster ordering so it would require periodically
> rerunning it.
>
> I'm a bit surprised by your 1MB/s rate.  I would expect to see about 10MB/s
> even for completely random reads.  Is it possible you're seeing something
> else interfering?  Do you have INSERT/UPDATE/DELETE transactions happening
> concurrently with this select scan?  If so you should strongly look into
> separating the transaction log from the data files.
Greg Stark <gsstark@mit.edu> writes:
> Actually forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible.  I would think about whether you can structure your
> data such that you can use sequential scans.

Agreed.

> In your application that might be hard.  It sounds like you would need more
> or less one table per stock ticker which would really be hard to manage.

Actually, in a previous lifetime I used to do pretty much the same stuff Matt
is working on.  The reason I suggested parallelizing is that what you want is
usually not so much the 200-day moving average of FOO, as the 200-day moving
averages of a whole bunch of things.  If your input table contains
time-ordered data for all those things, then a seqscan works out pretty well.

> One thing you might look into is using the CLUSTER command.  But postgres
> doesn't maintain the cluster ordering so it would require periodically
> rerunning it.

If the desired sort order is time-based, it falls out pretty much for free in
this application, because historical data doesn't change -- you are only
interested in appending at the right.

In said previous lifetime, we used Postgres for tracking our actual
transactions, but we built a custom file format for storing the individual
tick data.  That's not stuff you need transactional semantics for; the
historical data is what it is.  Besides, you need to compress it as much as
you can because there's always too much of it.  Machines are faster and disk
space cheaper than they were at the time, but I'd still question the wisdom
of using a Postgres row for each daily bar, let alone finer-grain data.

			regards, tom lane
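(To make that suggestion concrete: instead of one query per ticker as in the
first sketch, a single GROUP BY computes the figure for every ticker in one
pass.  Same illustrative daily_prices/sma_summary names as before; the date
window again stands in for "200 trading days".)

    -- One scan over the trailing window computes every ticker's average at
    -- once, so the planner can use a seqscan instead of ~30,000 index probes.
    INSERT INTO sma_summary (ticker, as_of_date, sma_200)
    SELECT ticker, DATE '2005-05-10', avg(close_price)
      FROM daily_prices
     WHERE trade_date >  DATE '2005-05-10' - 200
       AND trade_date <= DATE '2005-05-10'
     GROUP BY ticker;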
Matt Olson wrote:
> Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
> there been serious consideration of implementing something like a prefetch
> subsystem?  Does anyone have any opinions as to why this would be a bad idea
> for postgres?
>
> Postgres is great for a multiuser environment and OLTP applications.
> However, in this setup, a data warehouse, the observed performance is not
> what I would hope for.

Oracle doesn't pre-fetch data to get its fast results in this case.
Pre-fetching doesn't give you the 100 times speed increases.

Bitmap indexes are very important for data mining.  You might want to see

http://www.it.iitb.ac.in/~rvijay/dbms/proj/

I have no idea how well developed this is, but this is often the biggest win
with Data Warehousing.  If it works, you'll get results back in seconds; if
it doesn't, you'll have plenty of time while your queries run to reflect on
the possibility that commercial databases might actually have important
features that haven't even penetrated the awareness of most free database
developers.

Another trick you can use with large data sets like this when you want
results back in seconds is to have regularly updated tables that aggregate
the data along each column normally aggregated against the main data set.

Of course, Pg doesn't have the nice features that make this just work and
make queries against your data source faster (called "OLAP Query rewrite" in
Oracle), so you'll have to put a lot of work into changing your application
to figure out when to use the summary tables.  As far as I know it doesn't
have materialized views, either, so updating these summary tables is also a
more complex task than just a single REFRESH command.

Maybe some bright person will prove me wrong by posting some working
information about how to get these apparently absent features working.

You might also want to consider ditching RAID 5 and switching to plain
mirroring.  RAID 5 is a helluva performance penalty (by design).  This is why
they said RAID - fast, cheap, reliable - pick any two.  RAID 5 ain't fast.
But that's probably not your main problem.

Sam.

> Regards,
>
> Matt Olson
> Ocean Consulting
> http://www.oceanconsulting.com/
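(In the absence of materialized views, the "regularly updated aggregate
table" amounts to an ordinary table plus a hand-rolled refresh.  Sketch only,
reusing the illustrative daily_prices table; monthly_summary is a made-up
name.)

    -- A hand-maintained "materialized view": per-ticker monthly aggregates
    -- kept in an ordinary table.
    CREATE TABLE monthly_summary (
        ticker    text    NOT NULL,
        month     date    NOT NULL,
        avg_close numeric NOT NULL,
        n_days    integer NOT NULL,
        PRIMARY KEY (ticker, month)
    );

    -- The poor man's REFRESH: wipe and repopulate (or restrict the DELETE
    -- and SELECT to just the months touched by the latest load).
    BEGIN;
    DELETE FROM monthly_summary;
    INSERT INTO monthly_summary (ticker, month, avg_close, n_days)
    SELECT ticker,
           date_trunc('month', trade_date)::date,
           avg(close_price),
           count(*)
      FROM daily_prices
     GROUP BY ticker, date_trunc('month', trade_date);
    COMMIT;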
> Another trick you can use with large data sets like this when you want
> results back in seconds is to have regularly updated tables that aggregate
> the data along each column normally aggregated against the main data set.

> Maybe some bright person will prove me wrong by posting some working
> information about how to get these apparently absent features working.

Most people just use simple triggers to maintain aggregate summary tables...

Chris
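(A minimal sketch of that approach, assuming the illustrative daily_prices
and monthly_summary tables from the earlier examples, with daily_prices as a
plain table and insert-only data; updates and deletes would need their own
handling, and the read-then-insert is not safe under concurrent inserts into
the same month, which is fine for a single-writer nightly load.)

    -- Keep monthly_summary in step with daily_prices as rows are inserted.
    CREATE OR REPLACE FUNCTION maintain_monthly_summary() RETURNS trigger AS '
    BEGIN
        UPDATE monthly_summary
           SET avg_close = (avg_close * n_days + NEW.close_price) / (n_days + 1),
               n_days    = n_days + 1
         WHERE ticker = NEW.ticker
           AND month  = date_trunc(''month'', NEW.trade_date)::date;

        IF NOT FOUND THEN
            INSERT INTO monthly_summary (ticker, month, avg_close, n_days)
            VALUES (NEW.ticker,
                    date_trunc(''month'', NEW.trade_date)::date,
                    NEW.close_price, 1);
        END IF;

        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER daily_prices_summary
        AFTER INSERT ON daily_prices
        FOR EACH ROW EXECUTE PROCEDURE maintain_monthly_summary();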
On Wed, 2005-05-11 at 12:53 +0800, Christopher Kings-Lynne wrote:
> > Another trick you can use with large data sets like this when you want
> > results back in seconds is to have regularly updated tables that aggregate
> > the data along each column normally aggregated against the main data set.
> >
> > Maybe some bright person will prove me wrong by posting some working
> > information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary tables...

Agreed.  I've also got a view which calls a function that will 1) use the
summary table where data exists, or 2) calculate the summary information,
load it into the summary table, and send a copy to the client (a partial
query-results cache).

It's not all nicely abstracted behind user-friendly syntax, but most of those
features can be cobbled together (with effort) in PostgreSQL.

--
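(Roughly that compute-on-miss pattern, as a sketch.  All names are the same
hypothetical ones used earlier; this version caches the 200-day figure per
ticker and date in sma_summary.)

    -- Return the cached 200-day average if we have it; otherwise compute it,
    -- store it, and return it.
    -- e.g.  SELECT get_sma_200('FOO', DATE '2005-05-10');
    CREATE OR REPLACE FUNCTION get_sma_200(text, date) RETURNS numeric AS '
    DECLARE
        v_ticker ALIAS FOR $1;
        v_date   ALIAS FOR $2;
        v_result numeric;
    BEGIN
        SELECT sma_200 INTO v_result
          FROM sma_summary
         WHERE ticker = v_ticker AND as_of_date = v_date;

        IF NOT FOUND THEN
            SELECT avg(close_price) INTO v_result
              FROM daily_prices
             WHERE ticker = v_ticker
               AND trade_date >  v_date - 200
               AND trade_date <= v_date;

            INSERT INTO sma_summary (ticker, as_of_date, sma_200)
            VALUES (v_ticker, v_date, v_result);
        END IF;

        RETURN v_result;
    END;
    ' LANGUAGE plpgsql;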
Christopher Kings-Lynne wrote:
>> Another trick you can use with large data sets like this when you want
>> results back in seconds is to have regularly updated tables that aggregate
>> the data along each column normally aggregated against the main data set.
>
>> Maybe some bright person will prove me wrong by posting some working
>> information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary
> tables...
>
> Chris

However, if (insert) triggers prove to be too much of a performance hit, try
cron'd functions that perform the aggregation for you.  This system works
well for us, using the pk's (sequence) for start and stop points.
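(That batch variant might look roughly like the following.  Everything here
is hypothetical: daily_prices is assumed to have a serial id column, which
the earlier sketches did not show, daily_totals is a made-up per-day rollup,
and agg_progress holds the bookmark.  It also assumes each cron run only ever
sees whole, newly loaded trading days.)

    -- Hypothetical support tables: a per-day rollup and a bookmark row.
    CREATE TABLE daily_totals (
        trade_date date    PRIMARY KEY,
        n_tickers  integer NOT NULL,
        sum_close  numeric NOT NULL
    );
    CREATE TABLE agg_progress (last_id bigint NOT NULL);
    INSERT INTO agg_progress VALUES (0);

    -- Run from cron: fold rows added since the last run into daily_totals,
    -- using the serial primary key as the start/stop bookmark.
    CREATE OR REPLACE FUNCTION roll_up_new_rows() RETURNS bigint AS '
    DECLARE
        v_start bigint;
        v_stop  bigint;
    BEGIN
        SELECT last_id INTO v_start FROM agg_progress;
        SELECT max(id) INTO v_stop  FROM daily_prices;

        IF v_stop IS NULL OR v_stop <= v_start THEN
            RETURN 0;    -- nothing new since the last run
        END IF;

        INSERT INTO daily_totals (trade_date, n_tickers, sum_close)
        SELECT trade_date, count(*), sum(close_price)
          FROM daily_prices
         WHERE id > v_start AND id <= v_stop
         GROUP BY trade_date;

        UPDATE agg_progress SET last_id = v_stop;
        RETURN v_stop - v_start;
    END;
    ' LANGUAGE plpgsql;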
Quoting Christopher Kings-Lynne <chriskl@familyhealth.com.au>:
> > Another trick you can use with large data sets like this when you want
> > results back in seconds is to have regularly updated tables that aggregate
> > the data along each column normally aggregated against the main data set.
> >
> > Maybe some bright person will prove me wrong by posting some working
> > information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary
> tables...

Don't know if this is more appropriate to bizgres, but:

What the first poster is talking about is what OLAP cubes do.

For big aggregating systems (OLAP), triggers perform poorly, compared to
messy hand-rolled code.  You may have dozens of aggregates at various levels.
Consider the effect of having each detail row cascade into twenty updates.

It's particularly silly-looking when data is coming in as batches of
thousands of rows in a single insert, e.g.

    COPY temp_table FROM STDIN;
    UPDATE fact_table ... FROM ... temp_table
    INSERT INTO fact_table ... FROM ... temp_table

(the above pair of operations is so common, Oracle added its "MERGE" operator
for it).

Hence my recent post (request) for using RULES to aggregate -- given no luck
with triggers "FOR EACH STATEMENT".
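(Spelled out against made-up names -- stage and fact_daily here are just
illustrative stand-ins for the temp_table/fact_table above, with hypothetical
columns -- the batch "merge" pattern looks something like this.)

    -- Load the day's batch into a staging table in one shot.
    CREATE TEMP TABLE stage (ticker text, trade_date date, close_price numeric);
    COPY stage FROM STDIN;   -- rows streamed in by the loader

    -- Update rows that already exist in the fact table...
    UPDATE fact_daily
       SET close_price = s.close_price
      FROM stage s
     WHERE fact_daily.ticker = s.ticker
       AND fact_daily.trade_date = s.trade_date;

    -- ...then insert the ones that don't (Oracle's MERGE does both at once).
    INSERT INTO fact_daily (ticker, trade_date, close_price)
    SELECT s.ticker, s.trade_date, s.close_price
      FROM stage s
     WHERE NOT EXISTS (
           SELECT 1 FROM fact_daily f
            WHERE f.ticker = s.ticker
              AND f.trade_date = s.trade_date);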