Thread: Mostly read performance
I have a largely table-append-only application where most transactions are read-intensive and many are read-only. The transactions may span many tables, and in some cases might need to pull 70 MB of data out of a couple of the larger tables.

In 7.3, I don't seem to see any file system or other caching that helps with repeated reads of the 70 MB of data. Secondary fetches are pretty much as slow as the first fetch. (The 70 MB in this example might take place via 2000 calls to a parameterized statement via JDBC.)

Were there changes after 7.3 w.r.t. caching of data? I read this list and see people saying that 8.0 will use the native file system cache to good effect. Is this true? Is it supposed to work with 7.3? Is there something I need to do to get PostgreSQL to take advantage of large-RAM systems?

Thanks for any advice.
Jeffrey Tenny wrote:
> I have a largely table-append-only application where most transactions
> are read-intensive and many are read-only. The transactions may span
> many tables, and in some cases might need to pull 70 MB of data out of a
> couple of the larger tables.
>
> In 7.3, I don't seem to see any file system or other caching that helps
> with repeated reads of the 70MB of data. Secondary fetches are pretty
> much as slow as the first fetch. (The 70MB in this example might take
> place via 2000 calls to a parameterized statement via JDBC).
>
> Were there changes after 7.3 w.r.t. caching of data? I read this list
> and see people saying that 8.0 will use the native file system cache to
> good effect. Is this true? Is it supposed to work with 7.3? Is there
> something I need to do to get postgresql to take advantage of large ram
> systems?
>
> Thanks for any advice.

Well, first off, the general recommendation is probably that 7.3 is really old, and you should try to upgrade to at least 7.4, though 8.0 is recommended.

The bigger questions: How much RAM do you have? How busy is your system?

8.0 doesn't really do anything to make the system cache the data. What kernel are you using?

Also, if your tables are small enough, and your RAM is big enough, you might already have everything cached.

One way to flush the caches is to allocate a bunch of memory and then scan through it. Or maybe mmap a really big file and access every byte. But if your kernel is smart enough, it could certainly deallocate pages after you stop accessing them, so I can't say for sure that you can flush the memory cache. Usually, I believe these methods are sufficient.

John =:->
John A Meinel wrote:
> Well, first off, the general recommendation is probably that 7.3 is
> really old, and you should try to upgrade to at least 7.4, though
> recommended to 8.0.

There have been issues with each release that led me to wait. Even now I'm waiting for some things to settle in the 8.0 JDBC driver (timezones), and 7.3 has behaved well for me. But yes, I'd like to upgrade.

> The bigger questions: How much RAM do you have? How busy is your system?

The system for testing was 512MB. I'm in the process of buying some additional memory. However, there was no swap activity on that system, so I doubt memory was the limiting factor.

> 8.0 doesn't really do anything to make the system cache the data.
> What kernel are you using?

2.4.x for various large x (multiple systems). Gonna try 2.6.x soon.

> Also, if your tables are small enough, and your RAM is big enough, you
> might already have everything cached.

Well, that's what you'd expect. But a first-time 70MB fetch on a freshly rebooted system took just as long as all secondary times. (Took over a minute to fetch, which is too long for my needs, at least on secondary attempts.)

> One way to flush the caches is to allocate a bunch of memory and then
> scan through it. Or maybe mmap a really big file and access every byte.
> But if your kernel is smart enough, it could certainly deallocate pages
> after you stop accessing them, so I can't say for sure that you can
> flush the memory cache. Usually, I believe these methods are sufficient.

Not sure how that would really help. It doesn't seem like the database or file system is caching the table content either way, which is what led me to this inquiry.
On Thu, Aug 11, 2005 at 07:13:27PM -0400, Jeffrey Tenny wrote:
> The system for testing was 512MB

That's definitely *not* a "large RAM" system. If you're reading a subset of data that totals 70MB, I'm going to guess that your data set is larger than, or at least a large fraction of, 512MB.

> additional memory. However there was no swap activity on that system,
> so I doubt memory was the limiting factor.

The system won't swap if your data set is larger than your memory; it just won't cache the data.

> Well, that's what you'd expect. But a first time 70MB fetch on a
> freshly rebooted system took just as long as all secondary times. (Took
> over a minute to fetch, which is too long for my needs, at least on
> secondary attempts).

If the query involves a table scan and the data set is larger than your available memory, you'll need a full scan every time. If you do a table scan and the table fits in RAM, subsequent runs should be faster. If you have an index and only need to look at a subset of the table, subsequent runs should be faster. Without knowing more about your queries it's not clear what your situation is.

Mike Stone
Michael Stone <mstone+postgres@mathom.us> writes:
>> Well, that's what you'd expect. But a first time 70MB fetch on a freshly
>> rebooted system took just as long as all secondary times. (Took over a
>> minute to fetch, which is too long for my needs, at least on secondary
>> attempts.)

That's not impressively fast even for the disk. You should get up to about 40Mbit/s or 5MByte/s from the disk. Add some overhead for postgres, and I would expect a full table scan of 70MB to take more like 15-30s, not over a minute.

What is your shared_buffers setting? Perhaps you have it set way too high or way too low?

Also, you probably should post the "explain analyze" output of the actual query you're trying to optimize. Even if you're not looking for a better plan, having hard numbers is better than guessing.

And the best way to tell if the data is cached is having a "vmstat 1" running in another window. Start the query and look at the bi/bo columns. If you see bi spike upwards then it's reading from disk.

--
greg
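Greg's "watch the bi column" check can also be scripted against captured output. A minimal sketch, assuming the standard Linux procps `vmstat` layout (a column-name header row followed by data rows); the sample data in the usage below is made up:

```python
def bi_column(vmstat_output):
    """Extract the 'bi' (blocks read in from disk per interval) values
    from captured `vmstat 1` output. Locates the header row that names
    the columns and reads the same position from every following row."""
    lines = [ln for ln in vmstat_output.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        cols = line.split()
        if "bi" in cols:  # the column-name header row
            idx = cols.index("bi")
            return [int(row.split()[idx]) for row in lines[i + 1:]]
    raise ValueError("no 'bi' column found in vmstat output")
```

A run with nonzero bi values during the query means the data is coming off disk rather than from cache, which is exactly the distinction the original poster is trying to pin down.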