Database Caching - Mailing list pgsql-hackers

From Greg Sabino Mullane
Subject Database Caching
Date
Msg-id E16gYpD-0007KY-00@mclean.mail.mindspring.net
Responses Re: Database Caching  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Database Caching  (Karel Zak <zakkr@zf.jcu.cz>)
Re: Database Caching  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


I had to make a relatively long drive yesterday, so I had lots of free time 
to do some thinking...and my thoughts were turning to caching and databases. 
The following is what I came up with: forgive me if it seems to be just an 
obvious ramble...

Why does a database need caching?

Normally, when one thinks of a database (or, to be precise, an RDBMS), the 
ACID acronym comes up. This is concerned with having a stable database that 
can reliably be used by many users at the same time. Caching a query is 
unintuitive because it involves sharing information from transactions that 
may be separated by a great amount of time and/or by different users. 
However, from the standpoint of the database server, caching increases 
efficiency enormously. If 800 users all make the same query, then caching 
can help the database server backend (hereafter simply "database") to 
save part or all of the work it performs so it doesn't have to repeat the 
entire sequence of steps 800 times.

What is caching?

Caching basically means that we want to save frequently-used information 
into an easy to get to area. Usually, this means storing it into memory. 
Caching has three main goals: reducing disk access, reducing computation 
(i.e. CPU utilization), and speeding up response time as measured by how long 
it takes a user to see a result. It does all this at the expense of RAM, 
and the tradeoff is almost always worth it.

In a database, there are three basic types of caching: query results, 
query plans, and relations.

The first, query result caching, simply means that we store into memory 
the exact output of a SELECT query for the next time that somebody performs 
that exact same SELECT query. Thus, if 800 people do a "SELECT * FROM foo", 
the database runs it for the first person, saves the results, and simply 
reads the cache for the next 799 requests. This saves the database from doing 
any disk access, practically removes CPU usage, and speeds up the query.

The second, query plan caching, involves saving the results of the optimizer, 
which is responsible for figuring out exactly "how" the database is going to 
fetch the requested data. This type of caching usually involves a "prepared" 
query, which has almost all of the information needed to run the query with 
the exception of one or more "placeholders" (spots that are populated with 
variables at a later time). Non-prepared statements could be cached here as 
well. Thus, if someone prepares the query "SELECT flavor FROM foo WHERE 
size=?", and then executes it by sending in 300 different values for "size", 
the prepared statement is run through the optimizer, the resulting path is 
stored into the query plan cache, and the stored path is used for the 300 
execute requests. Because the path is already known, the optimizer does not 
need to be called, which saves the database CPU and time.

The third, relation caching, simply involves putting the entire relation 
(usually a table or index) into memory so that it can be read quickly. 
This saves disk access, which basically means that it saves time. (This type 
of caching also can occur at the OS level, which caches files, but that will 
not be discussed here).

Those are the three basic types of caching; ways of implementing each are 
discussed below. Each one should complement the other, and a query may be 
able to use one, two, or all three of the caches.
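
Here is a rough C sketch of how that fall-through might look. Every name in 
it is made up for illustration (the cache helpers are stubbed out, and 
run_optimizer/execute_plan just stand in for the existing planner and 
executor); none of this is actual PostgreSQL code:

#include <stddef.h>

typedef struct ResultSet ResultSet;   /* rows already formatted for the client */
typedef struct QueryPlan QueryPlan;   /* whatever the optimizer produces       */

/* hypothetical cache helpers, stubbed so the sketch stands alone */
static ResultSet *result_cache_lookup(const char *q) { (void) q; return NULL; }
static void       result_cache_store(const char *q, ResultSet *r) { (void) q; (void) r; }
static QueryPlan *plan_cache_lookup(const char *q)   { (void) q; return NULL; }
static void       plan_cache_store(const char *q, QueryPlan *p) { (void) q; (void) p; }

/* the existing machinery, assumed to live elsewhere */
extern QueryPlan *run_optimizer(const char *query);
extern ResultSet *execute_plan(QueryPlan *plan);

ResultSet *
handle_select(const char *normalized_query)
{
    /* 1. exact hit in the result cache: no planning, no execution */
    ResultSet *rs = result_cache_lookup(normalized_query);
    if (rs != NULL)
        return rs;

    /* 2. no cached result, but maybe a cached plan: skip the optimizer */
    QueryPlan *plan = plan_cache_lookup(normalized_query);
    if (plan == NULL)
    {
        plan = run_optimizer(normalized_query);
        plan_cache_store(normalized_query, plan);
    }

    /* 3. run the plan (relation caching helps here) and remember the rows */
    rs = execute_plan(plan);
    result_cache_store(normalized_query, rs);
    return rs;
}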

I. Query result caching:

A query result cache is only used for SELECT queries that involve a 
relation (i.e. not for "SELECT version()"). Each cache entry has the following 
fields: the query itself, the actual results, a status, an access time, an 
access number, and a list of all included columns. (The column list actually 
tells as much information as needed to uniquely identify it, i.e. schema, 
database, table, and column). The status is merely an indicator of whether or 
not this cached query is valid. It may not be, because it may be invalidated 
for a user within a transaction but still be of use to others. 
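
One possible in-memory layout for such an entry, with field names that are 
purely illustrative:

#include <stddef.h>
#include <time.h>

typedef enum
{
    ENTRY_VALID,                /* safe for anyone to use                      */
    ENTRY_INVALID               /* invalidated inside the current transaction  */
} CacheEntryStatus;

typedef struct CachedColumn
{
    char    database[64];       /* enough information to uniquely identify the */
    char    schema[64];         /* column: database, schema, table, column     */
    char    table[64];
    char    column[64];
} CachedColumn;

typedef struct ResultCacheEntry
{
    char            *query;        /* normalized query text (the cache key)  */
    char            *output;       /* the stored rows, ready to send back    */
    CacheEntryStatus status;
    time_t           last_access;  /* for age-based eviction                 */
    long             access_count; /* for popularity-based eviction          */
    CachedColumn    *columns;      /* every column the query touches         */
    int              ncolumns;
    size_t           total_size;   /* bytes used by query + output           */
} ResultCacheEntry;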

When a select query is processed, it is first parsed apart into a basic common 
form, stripping whitespace, standardizing case, etc., in order to facilitate 
an accurate match. Note that no other pre-processing is really required, 
since we are only interested in exact matches that produce the exact same 
output. An advanced version of this would ideally be able to use the cached 
output of "SELECT bar,baz FROM foo" when it receives the query "SELECT 
baz,bar FROM foo", but that will require some advanced parsing. Possible, 
but probably not something to attempt in the first iteration of a query 
caching function. :) If there *is* a match (via a simple strcmp at first), 
and the status is marked as "valid", then the database simply uses the 
stored output, updates the access time and count, and exits. This should be 
extremely fast, as no disk access is needed, and almost no CPU. The 
complexity of the query will not matter either: a simple query will run just 
as fast as something with 12 sorts and 28 joins.
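
A minimal sketch of that normalization step, assuming we only collapse 
whitespace and fold case outside of quoted literals (real SQL lexing, 
comments, identifiers and so on are glossed over):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

static void
normalize_query(const char *in, char *out, size_t outlen)
{
    size_t  o = 0;
    int     in_literal = 0;     /* inside '...'? keep it verbatim */
    int     pending_space = 0;

    for (; *in != '\0' && o + 1 < outlen; in++)
    {
        unsigned char c = (unsigned char) *in;

        if (!in_literal && isspace(c))
        {
            pending_space = (o > 0);    /* collapse runs, drop leading blanks */
            continue;
        }
        if (pending_space)
        {
            out[o++] = ' ';
            pending_space = 0;
            if (o + 1 >= outlen)
                break;
        }
        if (c == '\'')
            in_literal = !in_literal;
        out[o++] = in_literal ? (char) c : (char) tolower(c);
    }
    out[o] = '\0';
}

int
main(void)
{
    char    a[256], b[256];

    normalize_query("SELECT  bar, baz\nFROM   foo WHERE x = 'ABC'", a, sizeof(a));
    normalize_query("select bar, baz from foo where x = 'ABC'", b, sizeof(b));
    printf("%s\n%s\nmatch: %d\n", a, b, strcmp(a, b) == 0);
    return 0;
}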

If a query is *not* already in the cache, then after the results are found 
and delivered to the user, the database will try to store them for the 
next appearance of that query. First, the size of the cache will be compared 
to the size of the query+output, to see if there is room for it. If there 
is, the query will be saved, with a status of valid, a time of 'now', a count 
of 1, a list of all affected columns found by parsing the query, and the total 
size of the query+output. If there is no room, then it will try to delete one 
or more entries to make room. Deleting can be done based on the oldest access time, 
smallest access count, or size of the query+output. Some balance of the first 
two would probably work best, with the access time being the most important. 
Everything will be configurable, of course.
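
One way the deletion rules might be scored, with made-up weights standing in 
for the configurable parameters:

#include <stddef.h>
#include <time.h>

typedef struct Victim
{
    time_t  last_access;
    long    access_count;
    size_t  total_size;
    int     in_use;             /* slot occupied? */
} Victim;

/* Higher score = better candidate for eviction. */
static double
eviction_score(const Victim *v, time_t now)
{
    double idle = difftime(now, v->last_access);         /* seconds unused */
    double rarity = 1.0 / (1.0 + (double) v->access_count);

    return idle * 0.75 + rarity * 1000.0 * 0.25;          /* age dominates  */
}

/* Evict entries until `needed` bytes are available; returns bytes freed. */
size_t
make_room(Victim *entries, int n, size_t needed, time_t now)
{
    size_t  freed = 0;

    while (freed < needed)
    {
        int     best = -1;
        double  best_score = -1.0;

        for (int i = 0; i < n; i++)
        {
            if (!entries[i].in_use)
                continue;
            if (eviction_score(&entries[i], now) > best_score)
            {
                best_score = eviction_score(&entries[i], now);
                best = i;
            }
        }
        if (best < 0)
            break;              /* nothing left to evict */
        freed += entries[best].total_size;
        entries[best].in_use = 0;
    }
    return freed;
}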

Whenever a table is changed, the cache must be checked as well. A list of 
all columns that were actually changed is computed and compared against 
the list of columns for each query. At the first sign of a match, the 
query is marked as "invalid." This should happen before the changes are made 
to the table itself. We do not delete the query immediately since this may 
be inside of a transaction, and subject to rollback. However, we do need 
to mark it as invalid for the current user inside the current transaction: 
thus, the status flag. When the transaction is committed, all queries that have 
an "invalid" flag are deleted, then the tables are changed. Since the only 
time a query can be flagged as "invalid" is inside your own transaction, 
the deletion can be done very quickly.
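
In code, the write-time check and the commit-time cleanup might look roughly 
like this; a single global array is used here and the per-backend visibility 
detail is ignored, so treat it as illustrative only:

#include <string.h>

#define MAX_ENTRIES 128
#define MAX_COLS    16

typedef struct Entry
{
    int     in_use;
    int     valid;                          /* 0 = invalidated in this xact */
    int     ncolumns;
    char    columns[MAX_COLS][128];         /* "db.schema.table.column"     */
} Entry;

static Entry cache[MAX_ENTRIES];

/* Called before an INSERT/UPDATE/DELETE actually modifies the table. */
void
cache_invalidate_columns(const char *const *changed, int nchanged)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        if (!cache[i].in_use || !cache[i].valid)
            continue;
        for (int c = 0; c < cache[i].ncolumns && cache[i].valid; c++)
            for (int j = 0; j < nchanged; j++)
                if (strcmp(cache[i].columns[c], changed[j]) == 0)
                {
                    cache[i].valid = 0;     /* first match is enough */
                    break;
                }
    }
}

/* Called at COMMIT: throw away everything this transaction invalidated.
 * (A rollback would simply reset the flags instead.) */
void
cache_commit_cleanup(void)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (cache[i].in_use && !cache[i].valid)
            cache[i].in_use = 0;
}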


II. Query plan caching

If a query is not cached, then it "falls through" to the next level of 
caching, the query plan. This can either be automatic or strictly on a 
user-requested basis (i.e. through the prepare-execute paradigm). The latter 
is probably better, but it also would not hurt much to store non-explicitly 
prepared queries in this cache as long as there is room. This cache has a 
field for the query itself, the plan to be followed (i.e. scan this table, 
that index, sort the results, then group them), the columns used, the access 
time, the access count, and the total size. It may also want a simple flag 
of "prepared or non-prepared", where prepared indicates an explicitly 
prepared statement that has placeholders for future values. A good optimizer 
will actually change the plan based on the values plugged in to the prepared 
queries, so that information should become a part of the query itself as 
needed, and multiple queries may exist to handle different inputs. In 
general, most of the inputs will be similar enough to use the same path (e.g. 
"SELECT flavor FROM foo WHERE size=?" will most usually result in a simple 
numeric value for the executes). If a match *is* found, then the database 
can use the stored path, and not have to bother calling up the optimizer 
to figure it out. It then updates the access time, the access count, and
continues as normal. If a match was *not* found, then it might possibly 
want to be cached. Certainly, explicit prepares should always be cached. 
Non-explicitly prepared queries (those without placeholders) can also be 
cached. In theory, some of this will also be in the result cache, so that 
should be checked as well: if it is there, there is no reason to put it here. Prepared 
queries should always have priority over non-prepared, and the rest of the 
rules above for the result query should also apply, with a caveat that things 
that would affect the output of the optimizer (e.g. vacuuming) should also 
be taken into account when deleting entries.
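
A sketch of what a plan cache entry and its lookup could look like; the plan 
itself is left opaque and the names are mine:

#include <stddef.h>
#include <string.h>
#include <time.h>

typedef struct PlanCacheEntry
{
    const char  *query;         /* e.g. "select flavor from foo where size=?" */
    void        *plan;          /* whatever the optimizer produced            */
    int          prepared;      /* 1 = explicit prepare, kept preferentially  */
    time_t       last_access;
    long         access_count;
    size_t       total_size;
} PlanCacheEntry;

/* Return the cached plan for `query`, or NULL so the caller runs the optimizer. */
void *
plan_cache_lookup(PlanCacheEntry *entries, int n, const char *query)
{
    for (int i = 0; i < n; i++)
    {
        if (entries[i].query != NULL && strcmp(entries[i].query, query) == 0)
        {
            entries[i].last_access = time(NULL);
            entries[i].access_count++;
            return entries[i].plan;
        }
    }
    return NULL;
}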


III. Relation caching

The final cache is the relation itself, and simply involves putting the entire 
relation into memory. This cache has a field for the name of the relation, 
the table info itself, the type (indexes should ideally be cached more than 
tables, for example), the access time, and the access number. Loading could 
be done automatically, but most likely should be done according to a flag 
on the table itself or as an explicit command by the user.
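
A possible entry layout, again with illustrative names only:

#include <stddef.h>
#include <time.h>

typedef enum { REL_TABLE, REL_INDEX } RelKind;

typedef struct RelCacheEntry
{
    char     relname[128];      /* relation name                           */
    void    *pages;             /* the relation's data, pinned in memory   */
    size_t   nbytes;
    RelKind  kind;              /* indexes cached in preference to tables  */
    time_t   last_access;
    long     access_count;
} RelCacheEntry;

/* Explicit load, as triggered by a per-table flag or a user command. */
extern int relation_cache_load(const char *relname, RelKind kind);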


Notes:

The "strcmp" used may seem rather crude, as it misses all but the exact 
same query, but it does cover most of the everyday cases. Queries are 
usually called through an application that keeps them in the same format, 
time after time, so the queries are very often exactly identical. A better 
parser would help, of course, but it would get rather complicated quickly.
Two quick examples: a web page that is read from a database is a query that 
is called many times with exactly the same syntax; a user doing a "refresh" 
to constantly check if something has changed since they last looked.

Sometimes a query may jump back to a previous type of cache, especially 
for things like subselects. The entire subselect query may not match, 
but the inner query should also be checked against the query result cache.

Each cache should have some configurable parameters, including the size in 
RAM, the maximum number of entries, and rules for adding and deleting.
They should also be directly viewable through a system table, so a DBA 
can quickly see exactly which queries are being cached and how often 
they are being used. There should be a command to quickly flush the cache, 
to remove "old" entries, and to populate the query plan cache via a prepare 
statement. It should also be possible to do table changes without stopping 
to check the cache: perhaps flushing the cache and setting a global 
"cache is empty" flag would suffice.

Another problem is the prepare and execute: you are not always guaranteed 
to get a cached prepare if you do an execute, as it may have expired or 
there may simply be no more room. Those prepared statements inside a 
transaction should probably be flagged as "non-deletable" until the 
transaction is ended.

Storing the results of an execute in the query result cache is another 
problem. When a prepare is made, the database returns a link to that 
exact prepared statement in cache, so that all the client has to say is 
"run the query at 0x4AB3 with the value "12". Ideally, the database 
should be able to check these against the query result cache as well. It 
can do this by reconstructing the resulting query (by plugging the value into 
the prepared statement) or it can store the execute request as a type of 
query itself; instead of "SELECT baz FROM bar WHERE size=12" it would 
store "p0x4aB3:12".


The result cache would probably be the easiest to implement, and also gives 
the most "bang for the buck." The path cache may be a bit harder, but is 
a very necessary feature. I don't know about the relation caching: it looks 
to be fairly easy, and I don't trust that, so I am guessing it is actually 
quite difficult.

Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200202281132

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE8fqybvJuQZxSWSsgRAps9AKDwCkIH7GKSBjflyYSA0F7mQqD1MwCeJLCw
hqE1SxJ2Z7RxFGCu3UwIBrI=
=jlBy
-----END PGP SIGNATURE-----



