Thread: Having query cache in core
Does anybody think having an in-memory query result cache in core is a good idea? From the experience of implementing the feature in Pgpool-II, I would think this is not a terribly hard job. But first of all I'm wondering if there's a demand for the feature.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> Does anybody think having in-memory query result cache in core is a
> good idea?

No.

regards, tom lane
Tom Lane wrote:
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> > Does anybody think having in-memory query result cache in core is a
> > good idea?
>
> No.

I agree. Having been bitten by the feature on MySQL, I think it's not a good thing.

Essentially, it's a band-aid for badly written applications, but it will only help in certain cases and hurt in many others.

Yours,
Laurenz Albe
On 07/05/18 05:47, Tom Lane wrote:
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Does anybody think having in-memory query result cache in core is a
>> good idea?
>
> No.

Agreed.

You could probably write an extension for that, though. I think the planner hook and custom scans give you enough flexibility to do that without modifying the server code.

- Heikki
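For illustration, a minimal sketch of the extension plumbing Heikki refers to, assuming the PG 11-era planner_hook signature; the result-cache lookup and dependency-recording steps are only named in comments and are hypothetical, not an existing API:

/* pg_result_cache.c: hypothetical extension skeleton */
#include "postgres.h"
#include "fmgr.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
result_cache_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *stmt;

    /*
     * Hypothetical: probe a shared result cache keyed on the query here;
     * on a hit, a CustomScan plan that simply replays the stored tuples
     * could be returned instead of planning the query at all.
     */

    if (prev_planner_hook)
        stmt = prev_planner_hook(parse, cursorOptions, boundParams);
    else
        stmt = standard_planner(parse, cursorOptions, boundParams);

    /* Hypothetical: remember which relations the plan depends on. */

    return stmt;
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = result_cache_planner;
}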
> Having been bitten by the feature on MySQL, I think it's not a good thing.

Even in MySQL itself this feature has already been removed.
On 07.05.2018 08:23, Laurenz Albe wrote:
> Having been bitten by the feature on MySQL, I think it's not a good thing.
>
> Essentially, it's a band-aid for badly written applications, but it will
> only help in certain cases and hurts in many others.

The MySQL query cache helped quite a bit in the early web days, before session handling became a thing (e.g. before PHP 4.0), before massive caching on other layers, and when most machines still had only a single CPU core.

With multiple cores, the cost of query cache maintenance, and especially result purging, quickly outweighed the gain though. I'd assume this would be even more of a burden in a multi-process model than it already is in MySQL's one-process-multiple-threads model.

And so for about a decade now there has been a simple rule:

Q: What is the optimal size for my query cache?
A: zero

(There's even a tuning wizard web site for this, created by a former team member of mine: <https://dom.as/tech/query-cache-tuner/>)

All of the above is probably true for all query result caches that try to invalidate cache entries as soon as the underlying tables change. Caches with a simple "time-to-live" logic will not suffer from cache maintenance contention, but at the same time they don't really need any information that is available only on the server side. So having such a cache in a separate proxy process, like pqc, or on the MySQL/MariaDB side e.g. in the cache modules for MaxScale and ProxySQL, looks like the better approach to me.

PS: I tried PQC for one of my PostGIS-based OpenStreetMap projects, where I suffer from a "badly written application", or actually from too many layers of abstraction. For rendering maps in different file formats (SVG, PDF, PNG) I need to run the full Mapnik rendering queue multiple times, even though the map bounding box, and so the underlying queries from the Mapnik style sheet, actually stay the same. Even with that setup, a set of moderately complex queries on a rather large database run three times in a row, adding PQC didn't provide that much of an improvement, and in the end I didn't want to have to take care of yet another service component in the setup.

--
Hartmut Holzgraefe, Principal Support Engineer (EMEA)
MariaDB Corporation | http://www.mariadb.com/
On 07.05.2018 05:32, Tatsuo Ishii wrote:
> Does anybody think having in-memory query result cache in core is a
> good idea? From the experience of implementing the feature in
> Pgpool-II, I would think this is not terribly hard job. But first of
> all I'm wondering if there's a demand for the feature.

Do you want to implement this cache locally in the backend? If so, then I do not think this cache will be able to demonstrate any noticeable performance improvement. Yes, one can expect that there is some subset of queries that an application executes multiple times. But most likely such queries will be issued by different clients; it is unlikely that the same client will execute the same query (with the same parameter values) more than once. Usually clients use an ORM, which will in any case cache fetched data somehow.

A global result cache (like the one used in MySQL) may be more efficient. But I think it is better to start first with:
1. Global prepared statements cache
2. Global catalog cache
3. Global relation cache

Switching to global caches seems to be a challenging task, requiring a lot of changes in the Postgres core. But I think that at some point we will have to implement them in any case (as has been done in most other DBMSes).

Concerning the result cache, I think it would be better to ask MySQL users how useful it is.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
> But I think it is better to start first with
> 1. Global prepared statements cache
> 2. Global catalog cache
> 3. Global relation cache

May I ask why prepared statements need to precede catalog and relation caches? We're suffering from the bloat of catalog and relation caches, and are thinking of proposing placing those caches in shared memory.

> Switch to global caches seems to be a challenged task, requiring a lot
> of changes in Postgres core.
> But I think that sometime we will have to implement them in any case (as
> it was done in most of other DBMSes).

Agreed. I respect your attitude toward revolutionizing PostgreSQL.

> Concerning result cache, I think it will be better to ask opinion of
> mysql users: how useful it is.

And possibly Oracle Database users, as Oracle implemented it relatively recently, IIRC.

Regards
Takayuki Tsunakawa
On 07.05.2018 10:12, Konstantin Knizhnik wrote:
> Concerning result cache, I think it will be better to ask opinion of
> mysql users: how useful it is.

It isn't useful. I haven't seen a customer case in years where the query cache would have done any good. It has been off by default ever since MySQL 5.5 (which became "GA" in 2010), and has now finally been removed from MySQL 8.0.

It is still there in MariaDB, and as far as I can tell there are no plans to remove it yet, at least not in the next two major releases 10.3 and 10.4. At the same time we (wearing my MariaDB Corp. hat right now) have added a "time-to-live" cache module to our MaxScale proxy solution, moving caching out of the server for use cases where a time-to-live cache is "good enough": immediate cache entry invalidation on changes of the underlying data is not a requirement, and getting back results that are outdated but not too old is acceptable.

--
Hartmut Holzgraefe, Principal Support Engineer (EMEA)
MariaDB Corporation | http://www.mariadb.com/
On 07.05.2018 11:24, Tsunakawa, Takayuki wrote:
> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
>> But I think it is better to start first with
>> 1. Global prepared statements cache
>> 2. Global catalog cache
>> 3. Global relation cache
> May I ask why prepared statements need to precede catalog and relation caches? We're suffering from the bloat of catalogand relation caches, and are thinking of proposing placing those caches in shared memory.

Sorry, I didn't intend any particular order here. Yes, a shared catalog and relation cache seems more important to implement first.

>> Switch to global caches seems to be a challenged task, requiring a lot
>> of changes in Postgres core.
>> But I think that sometime we will have to implement them in any case (as
>> it was done in most of other DBMSes).
> Agreed. I respect your attitude toward revolutionizing PostgreSQL.

>> Concerning result cache, I think it will be better to ask opinion of
>> mysql users: how useful it is.
> And possibly Oracle Database users, as Oracle implemented it relatively recently, IIRC.
>
> Regards
> Takayuki Tsunakawa

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
> On 07.05.2018 05:32, Tatsuo Ishii wrote:
>> Does anybody think having in-memory query result cache in core is a
>> good idea? From the experience of implementing the feature in
>> Pgpool-II, I would think this is not terribly hard job. But first of
>> all I'm wondering if there's a demand for the feature.
>
> Do you want to implement this cache locally in backend?

No, as a global cache, if I employ the design of Pgpool-II. Pgpool-II offers two types of query result cache: in shared memory, or using an external cache server (i.e. memcached).

> Global result cache (like one used in mySQL) may be more efficient.
> But I think it is better to start first with
> 1. Global prepared statements cache
> 2. Global catalog cache
> 3. Global relation cache

Aren't they a totally different story from a query result cache?

> Concerning result cache, I think it will be better to ask opinion of
> mysql users: how useful it is.

One RedShift user used to use the query result cache in Pgpool-II (I think they have since moved to a more application-layer cache). So I just wonder if there is any demand for an in-core cache.

I think one merit of having it in core is that it's easy to get catalog info: Pgpool-II works hard to figure out whether a query result is safe to cache or not, e.g. whether the SELECT uses stable functions or temp tables. An in-core cache eliminates that hard work (not to mention cache invalidation).

On the other hand, a query cache definitely avoids running a heavy SELECT twice, but users still need to connect to PostgreSQL if it's in core. Having the query cache in middleware like Pgpool-II can avoid accessing PostgreSQL entirely.

Anyway, I already have a query cache in Pgpool-II. So unless users are eager to have it in core, I will not be so interested in implementing it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
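As an illustration of why catalog access helps here, a rough sketch (not Pgpool-II's actual logic, and deliberately incomplete) of what an in-core cacheability test might look like; contain_volatile_functions() and get_rel_persistence() are existing backend helpers, but the policy itself is an assumption and, unlike Pgpool-II, it does not even try to handle stable functions:

#include "postgres.h"
#include "catalog/pg_class.h"      /* RELPERSISTENCE_TEMP */
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"     /* contain_volatile_functions() */
#include "utils/lsyscache.h"       /* get_rel_persistence() */

/* Very rough test: plain SELECTs over non-temp tables, no volatile functions. */
static bool
query_is_cacheable(Query *query)
{
    ListCell *lc;

    if (query->commandType != CMD_SELECT)
        return false;

    /* random(), nextval(), etc. make the result non-repeatable. */
    if (contain_volatile_functions((Node *) query))
        return false;

    foreach(lc, query->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        /* Temp tables are private to one session; their results must not be shared. */
        if (rte->rtekind == RTE_RELATION &&
            get_rel_persistence(rte->relid) == RELPERSISTENCE_TEMP)
            return false;
    }

    return true;
}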
On 07.05.2018 11:58, Tatsuo Ishii wrote:
>> On 07.05.2018 05:32, Tatsuo Ishii wrote:
>>> Does anybody think having in-memory query result cache in core is a
>>> good idea? From the experience of implementing the feature in
>>> Pgpool-II, I would think this is not terribly hard job. But first of
>>> all I'm wondering if there's a demand for the feature.
>> Do you want to implement this cache locally in backend?
> No, as a global cache, if I employ the design of Pgpool-II. Pgpool-II
> offsers two types of query result cache: in the shared memory or using
> external cache server (i.e. memcached).
>
>> Global result cache (like one used in mySQL) may be more efficient.
>> But I think it is better to start first with
>> 1. Global prepared statements cache
>> 2. Global catalog cache
>> 3. Global relation cache
> Are they totally different story from query result cache, no?

Well, catalog/relation/plan caches are not really related to a query result cache. But from my point of view the first three are much more important and critical, simply because the catalog is used by all backends.

One can discuss and investigate how frequently applications issue queries returning the same result (for example, search engines have the classical 90/10 relation: most users run the same queries, so caching is vital for Google & Co), but I am not sure this holds for database applications...

There is no doubt, though, at least for OLAP, that applications tend to run queries with the same execution plan (but with different parameters). So having a global prepared statement cache seems to be much more useful than caching results.

Also, all shared cache implementations have to solve similar problems: access synchronization, invalidation, ... In this sense a query result cache implementation will be similar to the other shared cache implementations. If we have infrastructure for building efficient shared caches (lockless algorithms, smart invalidation, ...), then it will not be so difficult to implement a result cache on top of it.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sun, May 6, 2018 at 10:32 PM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Does anybody think having in-memory query result cache in core is a
> good idea? From the experience of implementing the feature in
> Pgpool-II, I would think this is not terribly hard job. But first of
> all I'm wondering if there's a demand for the feature.

Caching results doesn't seem very useful to me, but caching plans does. I think the challenges are formidable, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018-05-07 13:52:26 -0400, Robert Haas wrote:
> On Sun, May 6, 2018 at 10:32 PM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> > Does anybody think having in-memory query result cache in core is a
> > good idea? From the experience of implementing the feature in
> > Pgpool-II, I would think this is not terribly hard job. But first of
> > all I'm wondering if there's a demand for the feature.
>
> Caching results doesn't seem very useful to me, but caching plans
> does. I think the challenges are formidable, though.

+1

Greetings,

Andres Freund
2018-05-07 19:52 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:
> On Sun, May 6, 2018 at 10:32 PM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Does anybody think having in-memory query result cache in core is a
>> good idea? From the experience of implementing the feature in
>> Pgpool-II, I would think this is not terribly hard job. But first of
>> all I'm wondering if there's a demand for the feature.
>
> Caching results doesn't seem very useful to me, but caching plans
> does. I think the challenges are formidable, though.
My last customer has queries where the planning time is 10 seconds; unfortunately, the execution time, even in the worst (badly optimized) case, is only 200 ms.

Could we introduce some planning-cost threshold so that only expensive-to-plan queries are cached, or some hints for query plan control?

For interactive applications, the plan cache is interesting only for one subset of queries:

@1 There are long queries - the planning time is not significant (although it can be high), so a plan cache is not important.
@2 There are fast queries with fast planning time - usually we don't need a plan cache there either.
@3 There are fast queries with long planning time - there a plan cache is very interesting, though it can be difficult to separate this case from @1.
Regards
Pavel
On Mon, May 7, 2018 at 2:32 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> For interactive application only for one subset of queries the plan cache is
> interesting.
>
> @1 There are long queries - the planning time is not significant (although
> can be high), and then plan cache is not important
> @2 there are fast queries with fast planning time - usually we don't need
> plan cache too
> @3 there are fast queries with long planning time - and then plan cache is
> very interesting - can be difficult to separate this case from @1.

That's not my experience. I agree that plan caching isn't important for long-running queries, but I think it *is* potentially important for fast queries with fast planning time. Even when the planning time is fast, it can be a significant percentage of the execution time. Not long ago, we had a case at EDB where a customer was getting custom plans instead of generic plans and that resulted in a significant reduction in TPS.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 07/05/18 05:47, Tom Lane wrote:
>> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>>> Does anybody think having in-memory query result cache in core is a
>>> good idea?
>> No.
>
> Agreed.
>
> You could probably write an extension for that, though. I think the
> planner hook and custom scans give you enough flexibility to do that
> without modifying the server code.

I have simulated the idea, and I wonder how to implement the query result cache on streaming standby servers: because no DML/DDL is executed on a standby, it is impossible to invalidate the query cache there. Probably the only way to allow the query cache to be used is:

1) Do the query cache invalidation on the primary server.

2) Keep the cache storage on an external cache server like memcached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Wed, May 09, 2018 at 09:04:04AM +0900, Tatsuo Ishii wrote:
> I have simulated the idea and I wonder how to implement the query
> result cache on the streaming standby servers because no DML/DDL are
> executed on standby servers, that makes it impossible to invalidate
> the query cache. Probably the only way to allow to use the query cache
> is,
>
> 1) Let invalidate the query cache on the primary server.
>
> 2) The cache storage needed to be on the external cache server like
> memcached.

Or a hook within the REDO loop which can be processed for each record? You could take any actions needed with that by tracking mostly heap records.

--
Michael
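Sketched out, that idea might look roughly like this; note that the per-record redo hook shown here does not exist in PostgreSQL and is purely hypothetical, only XLogRecGetRmid() and XLogRecGetBlockTag() are real WAL-reader APIs:

#include "postgres.h"
#include "access/rmgr.h"
#include "access/xlogreader.h"
#include "storage/relfilenode.h"

/* Hypothetical hook type: called for every WAL record replayed on a standby. */
typedef void (*redo_record_hook_type) (XLogReaderState *record);

static void
result_cache_redo_hook(XLogReaderState *record)
{
    int block_id;

    /* Only heap changes can make cached SELECT results stale. */
    if (XLogRecGetRmid(record) != RM_HEAP_ID &&
        XLogRecGetRmid(record) != RM_HEAP2_ID)
        return;

    for (block_id = 0; block_id <= record->max_block_id; block_id++)
    {
        RelFileNode rnode;

        if (!XLogRecGetBlockTag(record, block_id, &rnode, NULL, NULL))
            continue;

        /*
         * Hypothetical: map the relfilenode back to a relation and drop
         * any cached results that depend on it.
         */
    }
}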
From: Robert Haas [mailto:robertmhaas@gmail.com]
> That's not my experience. I agree that plan caching isn't important
> for long-running queries, but I think it *is* potentially important
> for fast queries with fast planning time. Even when the planning time
> is fast, it can be a significant percentage of the execution time.
> Not long ago, we had a case at EDB where a customer was getting custom
> plans instead of generic plans and that resulted in a significant
> reduction in TPS.

I have the same experience with our customers. Their batch applications suffered performance-wise compared to Oracle and SQL Server. Those apps repeatedly issued simple SELECT statements that retrieve a single row.

Regards
Takayuki Tsunakawa
> You could probably write an extension for that, though. I think the
> planner hook and custom scans give you enough flexibility to do that
> without modifying the server code.

Thanks for the advice. But I was rather thinking about bypassing the raw parser and the planner, i.e. using the query string (or its hash) as the index of the query cache.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
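For what it's worth, the lookup described here could be as simple as the following sketch; hash_any() and hash_search() are the standard backend utilities, while the entry layout (and the idea of keying on the hash alone) is just an assumption for illustration:

#include "postgres.h"
#include "access/hash.h"        /* hash_any() */
#include "utils/hsearch.h"      /* HTAB, hash_search() */

typedef struct CachedResultEntry
{
    uint32      key;            /* hash of the raw query text */
    /* ... stored result tuples, dependency info, etc. ... */
} CachedResultEntry;

/*
 * Probe the cache before parsing; a real cache would also compare the
 * full query text to guard against hash collisions.
 */
static CachedResultEntry *
lookup_cached_result(HTAB *cache, const char *query_string)
{
    uint32  key;
    bool    found;

    key = DatumGetUInt32(hash_any((const unsigned char *) query_string,
                                  strlen(query_string)));
    return (CachedResultEntry *) hash_search(cache, &key, HASH_FIND, &found);
}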
On Fri, May 11, 2018, 7:13 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> You could probably write an extension for that, though. I think the
>> planner hook and custom scans give you enough flexibility to do that
>> without modifying the server code.
>
> Thanks for the advice. But I rather thought about bypassing the raw
> parser and the planner. i.e. use the query string (or its hash) as the
> index of the query cache.
I think you need to know which tables are involved and if they were modified.
Regards,
-cktan
On 11.05.2018 11:12, Tatsuo Ishii wrote:
> Thanks for the advice. But I rather thought about bypassing the raw
> parser and the planner. i.e. use the query string (or its hash) as the
> index of the query cache.

That's actually almost how the MySQL query cache works: the query cache lookup kicks in even before the query is parsed, to get maximum gain from cache hits.

It does not take the query text into account alone, though, but also the current default database, protocol version, and character set/collation settings of the client session, as all of these may have an influence on the actual result values, too ...

--
Hartmut Holzgraefe, Principal Support Engineer (EMEA)
MariaDB Corporation | http://www.mariadb.com/
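Translated to PostgreSQL terms, that suggests a lookup key along these lines; the struct is an invention for illustration (it is not how MySQL lays out its key), but MyDatabaseId and pg_get_client_encoding() are real backend state, and search_path is an extra PostgreSQL-specific ingredient that would matter for the same reason:

#include "postgres.h"
#include "mb/pg_wchar.h"        /* pg_get_client_encoding() */
#include "miscadmin.h"          /* MyDatabaseId */

/* Everything that can change the bytes a client gets back for the same SQL text. */
typedef struct ResultCacheKey
{
    uint32      query_hash;         /* hash of the raw, unparsed query text */
    Oid         database_id;        /* MyDatabaseId: the "current default database" */
    int         client_encoding;    /* character set the client receives */
    uint32      search_path_hash;   /* unqualified names may resolve differently */
    /* (protocol version would be another ingredient, per the list above) */
} ResultCacheKey;

static void
fill_cache_key(ResultCacheKey *key, uint32 query_hash, uint32 search_path_hash)
{
    key->query_hash = query_hash;
    key->database_id = MyDatabaseId;
    key->client_encoding = pg_get_client_encoding();
    key->search_path_hash = search_path_hash;
}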
>> Thanks for the advice. But I rather thought about bypassing the raw
>> parser and the planner. i.e. use the query string (or its hash) as the
>> index of the query cache.
>
> I think you need to know which tables are involved and if they were
> modified.

Of course. While creating a cache entry for a SELECT, we need to analyze it and extract the tables involved in the SELECT. That information should be stored along with the cache entry. If any of the tables are modified, cache entries using the table must be removed. (This is already implemented in Pgpool-II's in-memory query cache.)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
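A sketch of that bookkeeping (again an illustration, not Pgpool-II's code): collect the relation OIDs from the analyzed query's range table, store them with the entry, and test them when a relation is known to have changed. A real implementation would also have to recurse into subquery range-table entries, which is where views end up after rewriting:

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

typedef struct CachedResult
{
    List   *dependent_relids;   /* OIDs of the relations the SELECT read */
    /* ... key, stored tuples, timestamps ... */
} CachedResult;

/* Collect the plain relations referenced at the top level of the query. */
static List *
collect_dependencies(Query *query)
{
    List     *relids = NIL;
    ListCell *lc;

    foreach(lc, query->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        if (rte->rtekind == RTE_RELATION)
            relids = list_append_unique_oid(relids, rte->relid);
    }
    return relids;
}

/* True if a modification of 'relid' makes this entry stale. */
static bool
entry_depends_on(const CachedResult *entry, Oid relid)
{
    return list_member_oid(entry->dependent_relids, relid);
}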
> If any of the tables were modified, cache entries using the table must be removed.
> (these are already implemented in Pgpool-II's in memory query cache)
How do you identify updates made from a pl/pgsql procedure?
Vladimir
On 11.05.2018 14:26, Tatsuo Ishii wrote:
> If any of the tables were
> modified, cache entries using the table must be removed.
> (these are already implemented in Pgpool-II's in memory query cache)

... and this is the expensive part in the MySQL implementation that has rendered the query cache mostly useless for the last decade or so: unless you come up with a clever lockless implementation, concurrent writes on the same table effectively become serialized by this, creating serious contention problems.

Peter Zaitsev once listed several points that could be improved to make the query cache somewhat useful again, but in the end no one really seemed to be interested in doing so, including Percona themselves, as apparently even without the contention issues there are only a few workloads these days that would significantly profit from cached result sets.

https://www.percona.com/blog/2011/04/10/should-we-give-a-mysqlquery-cache-a-second-chance/

Maybe this list can be taken as a "what to avoid" hint sheet ...

--
hartmut
>> If any of the tables were modified, cache entries using the table must be
>> removed.
>> (these are already implemented in Pgpool-II's in memory query cache)
>
> How do you identify updates made from a pl/pgsql procedure?

Pgpool-II does not invalidate the query cache in that case. I think an in-core query cache could deal with that case (probably as Michael suggested).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Fri, May 11, 2018, 10:26 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> I think you need to know which tables are involved and if they were
>> modified.
>
> Of course. While creating a cache entry for a SELECT, we need to
> analyze it and extract tables involved in the SELECT. The information
> should be stored along with the cache entry. If any of the tables were
> modified, cache entries using the table must be removed.
> (these are already implemented in Pgpool-II's in memory query cache)
How do you handle tables hiding behind views? Also, how do cached entries in pgpool know if some tables are modified without going through pgpool, e.g. from pl/pgsql, a trigger, or via psql directly?
=cktan
> that's almost actually how the MySQL query cache works: the query cache
> lookup kicks in even before the query is parsed, to get maximum gain
> from cache hits.
>
> It does not just take the query text into account alone though,
> but also the current default database, protocol version, and character
> set/collation settings of the client session, asl all these may have
> an influence on the actual result values, too ...

Plus, checking the username is necessary (otherwise any user could retrieve a cached result for a table lookup that they themselves are not permitted to perform).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On 11.05.2018 18:01, Tatsuo Ishii wrote:
> Plus checking username is neccessary (otherwise any user could
> retrieve a cache for a table lookup which is not permitted by other
> users).

As the tables a cached query operated on are known anyway -- that's needed to purge cache entries when table contents change -- schema- and table-level SELECT privileges can be checked ... I'm not fully sure how MySQL handles its column-level privileges in that respect, something I'd need to try out ...

--
hartmut
> On 11.05.2018 18:01, Tatsuo Ishii wrote:
>> Plus checking username is neccessary (otherwise any user could
>> retrieve a cache for a table lookup which is not permitted by other
>> users).
>
> as the tables a cached query operated on is known anyway -- it's needed
> to purge cache entries when table content changes -- schema and table
> level SELECT privileges can be checked ... I'm not fully sure about
> how MySQL handles its column level privileges in that respect, something
> I'd need to try out ...

I am not talking about cache invalidation. If a cache entry is created for a table which is only accessible by user A, the cache entry should only be hit for A, not for anyone else. Otherwise it would be a serious security problem.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
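The simplest safe policy is sketched below; GetUserId() is the real backend call, everything else (the creator_role bookkeeping) is assumed. Sharing entries across roles, as suggested above, would instead require explicit privilege checks (e.g. pg_class_aclcheck() on every table the entry depends on) at hit time:

#include "postgres.h"
#include "miscadmin.h"      /* GetUserId() */

/*
 * Only serve a cached entry back to the role that created it; 'creator_role'
 * would be recorded in the cache entry when the result is stored.
 */
static bool
cache_hit_allowed(Oid creator_role)
{
    return creator_role == GetUserId();
}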
>>> I think you need to know which tables are involved and if they were
>>> modified.
>>
>> Of course. While creating a cache entry for a SELECT, we need to
>> analyze it and extract tables involved in the SELECT. The information
>> should be stored along with the cache entry. If any of the tables were
>> modified, cache entries using the table must be removed.
>> (these are already implemented in Pgpool-II's in memory query cache)
>
> How do you handle tables hiding behind views? Also how does cached entries
> in pgpools know if some tables are modified without going thru pgpool, eg
> pgplsql or trigger or via psql directly?

Pgpool-II does not invalidate cache entries for views, triggers, and the like. That's a limitation of the implementation in Pgpool-II.

I think an in-core query cache would not have that limitation, because it would have full access to the system catalogs and WAL.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Sat, May 12, 2018 at 8:18 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> How do you handle tables hiding behind views? Also how does cached entries
>> in pgpools know if some tables are modified without going thru pgpool, eg
>> pgplsql or trigger or via psql directly?
>
> Pgpool-II do not invalidate cache entries for views, triggers and
> others. That's an limitation of the implementation in Pgpool-II.
>
> I think in-core query cache would not have the limitation because it
> would have a full access to system catalogs and wal.
Yes. That's my point. You can't do it outside the core.
-cktan
On 2018-05-12 08:20:13 +1000, CK Tan wrote:
> On Sat, May 12, 2018 at 8:18 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> > > How do you handle tables hiding behind views? Also how does cached entries
> > > in pgpools know if some tables are modified without going thru pgpool, eg
> > > pgplsql or trigger or via psql directly?
> >
> > Pgpool-II do not invalidate cache entries for views, triggers and
> > others. That's an limitation of the implementation in Pgpool-II.
> >
> > I think in-core query cache would not have the limitation because it
> > would have a full access to system catalogs and wal.
>
> Yes. That's my point. You can't do it outside the core.

There are a lot of possibilities between "external daemon" and "in core". ISTM it's likely that the current extension APIs already make this possible. But if not, it seems like adding the few missing hooks wouldn't be that much work.

Greetings,

Andres Freund