Query caching - Mailing list pgsql-general

From Steve Wolfe
Subject Query caching
Date
Msg-id 002001c04383$6ea110c0$50824e40@iboats.com
In response to how good is PostgreSQL  ("Arnold Gamboa" <arnold@php4us.com>)
Responses Re: Query caching
List pgsql-general
> >    (Incidentally, we've toyed around with developing a query-caching
> > system that would sit between PostgreSQL and our DB libraries.
>
>   Sounds amazing, but requires some research, I guess. However, in many
> cases one would be more than happy with cached connections. Of course,
> cached query results can be naturally added to that, but just connections
> are OK to start with. Security....

    To me, it doesn't sound like it would be that difficult a project, at
least not for the likes of the PostgreSQL developers.  It also doesn't seem
like it would really introduce any security problems, not if it were done
inside of PostgreSQL.  Long ago, I got sidetracked from my endeavors in C,
and so I don't feel that I'm qualified to do it.  (Otherwise, I would have
done it already. : ) )   If you wanted it done in Perl or Object Pascal, I
could help. : )

    Here's a simple design that I was tossing back and forth.  Please
understand that I'm not saying this is the best way to do it, or even a good
way to do it.  Just a possible way to do it.  I haven't been able to give it
as much thought as I would like to.  Here goes.

------------
Implementation

    Upon starting, the PostgreSQL engine could allocate a chunk of memory,
sized according to the administrator's desire.  That chunk would be used
solely for query caching.
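
    Something along these lines, in very rough C (none of these names are
real PostgreSQL internals, and "query_cache_size_mb" is a made-up knob):

    #include <stdlib.h>
    #include <stddef.h>

    static void  *cache_arena    = NULL; /* backing store for cached entries */
    static size_t cache_capacity = 0;    /* bytes available, set by the admin */
    static size_t cache_used     = 0;    /* bytes currently consumed */

    /* Call once at startup; returns 0 on success, -1 if allocation fails. */
    int
    query_cache_init(size_t query_cache_size_mb)
    {
        cache_capacity = query_cache_size_mb * 1024 * 1024;
        cache_arena = malloc(cache_capacity);
        return (cache_arena != NULL) ? 0 : -1;
    }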

    When a query came in that was not cached (say, the first query), the
database engine would process it as normal.  It would then return the
results to the user and add them to the cache.  "Adding it to the cache"
would mean storing the query itself, the result set, and a list of the
tables the query relied upon, keyed on the query text in a hash table.  The
stored query could be either the query as it comes from the user, or the
query after it goes through the optimizer.  Each has pros and cons; I would
probably favor using the query that comes from the user.
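
    A cache entry might hold something like this (again, rough C, and every
field name is purely illustrative):

    #include <stddef.h>
    #include <time.h>

    typedef struct CacheEntry
    {
        char              *query_text;   /* key: the query as the user sent it */
        char              *result_set;   /* serialized rows to replay to clients */
        size_t             result_bytes; /* RAM consumed; used by cache clearing */
        char             **tables;       /* names of the tables the query reads */
        int                ntables;
        double             exec_cost;    /* planner cost, for re-execution ranking */
        long               hit_count;    /* how often this entry has been reused */
        time_t             last_used;    /* for the LRU algorithm */
        int                dirty;        /* set when an underlying table changes */
        struct CacheEntry *next;         /* hash-bucket chain */
    } CacheEntry;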

     When another query comes along, the caching engine would quickly look
in the hash table and see if it already had the cached results of the
query.  If so, it returns them, and wham.  You've just avoided all of the
work of parsing, optimizing, and executing, not to mention the disk I/O.  A
hash lookup seems extremely cheap compared to the work of actually
processing a query.
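
    Continuing the sketch, the lookup could be as simple as hashing the
query text and walking a bucket chain (this uses the CacheEntry above, and
the hash is an ordinary djb2-style string hash):

    #include <string.h>
    #include <time.h>

    #define NBUCKETS 4096
    static CacheEntry *buckets[NBUCKETS];

    static unsigned
    hash_query(const char *q)
    {
        unsigned h = 5381;
        while (*q)
            h = h * 33 + (unsigned char) *q++;
        return h % NBUCKETS;
    }

    /* Return the cached entry for this exact query text, or NULL, in which
     * case the engine falls through to normal parse/plan/execute. */
    CacheEntry *
    cache_lookup(const char *query)
    {
        CacheEntry *e;

        for (e = buckets[hash_query(query)]; e != NULL; e = e->next)
        {
            if (!e->dirty && strcmp(e->query_text, query) == 0)
            {
                e->hit_count++;
                e->last_used = time(NULL);
                return e;   /* no parsing, optimizing, executing, or disk I/O */
            }
        }
        return NULL;
    }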

     When an UPDATE, DELETE, or INSERT comes along, the engine would
determine which tables were affected, and clear the cache entries that
relied upon those tables.
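
    In the sketch, that is just a walk over the cache marking (or freeing)
anything that mentions the table:

    #include <string.h>

    /* Called whenever an INSERT/UPDATE/DELETE commits against "table". */
    void
    cache_invalidate_table(const char *table)
    {
        for (int b = 0; b < NBUCKETS; b++)
            for (CacheEntry *e = buckets[b]; e != NULL; e = e->next)
                for (int i = 0; i < e->ntables; i++)
                    if (strcmp(e->tables[i], table) == 0)
                    {
                        e->dirty = 1;  /* or unlink and free the entry outright */
                        break;
                    }
    }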

-----------------
Cache Clearing

     Cache clearing would be achieved via an LRU-based algorithm, which
would also take into account the amount of RAM used by each query in the
cache.
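
    One way to weight size into the LRU decision, continuing the same
sketch (the scoring function is arbitrary, and cache_remove() is a made-up
helper that unlinks an entry from its bucket and frees it):

    #include <time.h>

    void cache_remove(CacheEntry *e);   /* hypothetical: unlink and free */

    /* Older AND bigger entries score higher, so they are evicted first. */
    static double
    eviction_score(const CacheEntry *e, time_t now)
    {
        double idle = (double) (now - e->last_used) + 1.0;
        return idle * (double) e->result_bytes;
    }

    /* Evict until "needed" more bytes fit in the arena. */
    void
    cache_make_room(size_t needed)
    {
        time_t now = time(NULL);

        while (cache_used + needed > cache_capacity)
        {
            CacheEntry *victim = NULL;

            for (int b = 0; b < NBUCKETS; b++)
                for (CacheEntry *e = buckets[b]; e != NULL; e = e->next)
                    if (victim == NULL ||
                        eviction_score(e, now) > eviction_score(victim, now))
                        victim = e;
            if (victim == NULL)
                break;          /* cache is empty; nothing left to evict */
            cache_used -= victim->result_bytes;
            cache_remove(victim);
        }
    }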
-----------------
Performance Impact

   The potential performance differences range from a minuscule decrease to
a tremendous increase.  And it's a lot cheaper to throw an extra half gig of
RAM in a machine than to upgrade processors and disk subsystems!

------------------
Possible Changes

     One potential drawback is that when a table is modified, the queries
that rely upon it would be discarded.  Where a table is updated frequently,
that could greatly reduce the performance benefit.  One possible alternative
is to store the query cost with each query in the cache.  When a table is
updated, the queries that rely upon it are merely marked as "dirty".  If the
system load is below a certain amount, or the system has been idle, it could
then re-execute those queries and update the cache.  Which queries it
re-executes would be determined by a combination of query cost and how
frequently those cache entries were used.
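
    The ranking could be as crude as cost times hit count (refresh_entry()
is another made-up hook that re-runs the stored query and swaps in the
fresh result set):

    void refresh_entry(CacheEntry *e); /* hypothetical: re-execute, un-dirty */

    static double
    refresh_priority(const CacheEntry *e)
    {
        return e->exec_cost * (double) e->hit_count;
    }

    /* Run from an idle loop: refresh the most valuable dirty entry. */
    void
    cache_refresh_when_idle(double current_load, double load_threshold)
    {
        if (current_load >= load_threshold)
            return;             /* only burn cycles when the system is idle */

        CacheEntry *best = NULL;

        for (int b = 0; b < NBUCKETS; b++)
            for (CacheEntry *e = buckets[b]; e != NULL; e = e->next)
                if (e->dirty &&
                    (best == NULL ||
                     refresh_priority(e) > refresh_priority(best)))
                    best = e;
        if (best != NULL)
            refresh_entry(best);
    }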
-------------------

   The reason I would prefer it done in the PostgreSQL engine (as opposed to
in a middleware application) is that the caching engine needs to know (a)
which tables a query relies upon, and (b) which tables get changed.  It
seems that it would significantly reduce overhead to track those inside of
PostgreSQL (which is already doing the query parsing and analysis).

    This could certainly give PostgreSQL a huge advantage over other
database systems, too.   It could save administrators a very large chunk of
cash that they would otherwise have to spend on large systems.  And it would
just be cool.   ; )

steve


