Re: Caching of Queries - Mailing list pgsql-performance
From | Iain |
---|---|
Subject | Re: Caching of Queries |
Date | |
Msg-id | 00b201c4a4ff$bd73b8b0$7201a8c0@mst1x5r347kymb Whole thread Raw |
In response to | Caching of Queries (Scott Kirkwood <scottakirkwood@gmail.com>) |
Responses |
Re: Caching of Queries
|
List | pgsql-performance |
Jim, I can only tell you (roughly) how it works wth Oracle, and it's a very well documented and laboured point over there - it's the cornerstone of Oracle's scalability architecture, so if you don't believe me, or my explanation is just plain lacking, then it wouldn't be a bad idea to check it out. The "other Tom" aka Tomas Kyte runs the Ask Tom site which is a great source of info on this. It's also very well explained in his book "Expert one on one Oracle" I think it was called. I havn't seen any reason yet as to why the same issues shouldn't, don't or wouldn't apply to pg. Your comment is both right and wrong. Yes, metadata lookups are essentially the same as as access methods for normal queries. Any time you read data in the DB you have to place a shared lock, often called a latch - it's a lightweight type of lock. The trouble is that while a data page can have multiple latches set at any time, only 1 process can be placing a a latch on a page at a time. This doesn't sound so serious so far, latches are "lightweight" afterall, however... even in a database of a billion rows and 100+ tables, the database metadata is a very _small_ area. You must put latches on the metadata tables to do optimization, so for example, if you are optimizing a 10 table join, you must queue up 10 times to place your latchs. You then do your optimization and queue up 10 more times to remove your latches. In fact it is worse than this, because you won't queue up 10 times it's more likely to be a hundred times since it is far more complex than 1 latch per table being optimized (you will be looking up statistics and other things). As I already said, even in a huge DB of a billion rows, these latches are happening on a realatively small and concentrated data set - the metadata. Even if there is no contention for the application data, the contention for the metadata may be furious. Consider this scenario, you have a 1000 users constantly submitting queries that must not only be soft parsed (SQL statement syntax) but hard parsed (optimized) because you have no query cache. Even if they are looking at completely different data, they'll all be queuing up for latches on the same little patch of metadata. Doubling your CPU speed or throwing in a fibre channel disk array will not help here, the system smply won't scale. Tom Lane noted that since the query cache would be in shared memory the contention issue does not go away. This is true, but I don't think that it's hard to see that the amount of contention is consderably less in any system that is taking advantage of the caching facility - ie applications using bind variables to reduce hard parsing. However, badly written applications (from the point of view of query cache utilization) could very well experience a degradation in performance. This could be handled with an option to disable caching - or even better to disable caching of any sql not using binds. I don't think even the mighty Oracle has that option. As you may have guessed, my vote is for implementing a query cache that includes plans. I have no specific preference as to data caching. It doesn't seem to be so important to me. Regards Iain > On Thu, Sep 23, 2004 at 08:29:25AM -0700, Mr Pink wrote: > > Not knowing anything about the internals of pg, I don't know how this relates, but in theory, > > query plan caching is not just about saving time re-planning queries, it's about scalability. > > Optimizing queries requires shared locks on the database metadata, which, as I understand it > > causes contention and serialization, which kills scalability. > > One of the guru's can correct me if I'm wrong here, but AFAIK metadata > lookups use essentially the same access methods as normal queries. This > means MVCC is used and no locking is required. Even if locks were > required, they would be shared read locks which wouldn't block each > other. > -- > Jim C. Nasby, Database Consultant decibel@decibel.org > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
pgsql-performance by date: