Thread: Re: eWeek Poll: Which database is most critical to your

Re: eWeek Poll: Which database is most critical to your

From
"Dann Corbit"
Date:
[snip]
> Very likely, it is only my limited understanding not really grasping
> what it is that you are trying to do.  Even so,  I don't think it
really
> helps even for read only queries, unless it is exactly the same query
> with the same parameter markers and everything that was issued before.
> That is very unusual.  Normally, you won't have the entire query hard-
> wired, but with allow the customer to do some sort of filtering of the
> data.

Hmmm... the more I think about it, the more unusual it would be for
_exactly_ the same query to be repeated a lot. However, the article
reported a significant performance gain when this feature was enabled.
That could mean that:
   (a) the performance measurements/benchmarks used by the article were
synthetic and don't reflect real database applications
   (b) the feature MySQL implements is different than the one I am
describing

When I get a chance I'll investigate further the technique used by MySQL
to see if (b) is the case. However, it is beginning to look like this
isn't a good idea, overall.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
I did not read the article at all, but I am familiar with query cache
and in fact, I do it a lot (I work for a database company).  Here is
how the algorithm works:

You intercept every incoming query and parse it.  Any physical data
gets replaced with parameter markers.  A 64 bit hash is formed from the
parsed query with the parameter markers removed.  The hash is used as
an index into a skiplist which also stores the original query.  After
all, if a client has a million dollar request, he won't be happy that
the unbelievably rare thing happened and the checksums agreed.

You can add a counter to the data in the skiplist so that you know how
often the query happens.  The parsed query will only be useful to a
system that can save time from having a query prepared (most systems
call it preparing the query).  I was kind of surprised to see that
PostgreSQL does not have a prepare stage in libpq.  This can be a
very large speedup in query execution (for obvious reasons).
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


Re: eWeek Poll: Which database is most critical to your

From
"Ken Hirsch"
Date:
Here is the documentation about MySQL's new Query Cache.  I think that it
would be helpful, as they indicate, for dynamic web sites, such as Slashdot.
There are hundreds or maybe thousands of queries in between added comments
and there are probably only a few common combinations of
threshold/nesting/sort.

MySQL Query Cache
=================

From version 4.0.1, `MySQL server' features a `Query Cache'.  When in
use, the query cache stores the text of a `SELECT' query together with
the corresponding result that is sent to a client.  If another
identical query is received, the server can then retrieve the results
from the query cache rather than parsing and executing the same query
again.

The query cache is extremely useful in an environment where (some)
tables don't change very often and you have a lot of identical queries.
This is a typical situation for many web servers that use a lot of
dynamic content.

Following are some performance data for the query cache (We got these
by running the MySQL benchmark suite on a Linux Alpha 2x500 MHz with
2GB RAM and a 64MB query cache):
  * If you want to disable the query cache code set    `query_cache_size=0'.  By disabling the query cache code there
is   no noticeable overhead.
 
  * If all of the queries you're preforming are simple (such as    selecting a row from a table with one row); but
stilldiffer so    that the queries can not be cached, the overhead for having the    query cache active is 13%.  This
couldbe regarded as the worst    case scenario. However, in real life, queries are much more    complicated than our
simpleexample so the overhead is normally    significantly lower.
 
  * Searches after one row in a one row table is 238% faster.  This    can be regarded as close to the minimum speedup
tobe expected for    a query that is cached.
 

How The Query Cache Operates
----------------------------

Queries are compared before parsing, thus
    SELECT * FROM TABLE

and
    Select * from table

are regarded as different queries for query cache, so queries need to
be exactly the same (byte for byte) to be seen as identical.  In
addition, a query may be seen as different if for instance one client
is using a new communication protocol format or another character set
than another client.

Queries that uses different databases, uses different protocol versions
or the uses different default character sets are considered different
queries and cached separately.

The cache does work for `SELECT CALC_ROWS ...' and `SELECT FOUND_ROWS()
...' type queries because the number of found rows is also stored in
the cache.

If a table changes (`INSERT', `UPDATE', `DELETE', `TRUNCATE', `ALTER'
or `DROP TABLE|DATABASE'), then all cached queries that used this table
(possibly through a `MRG_MyISAM' table!) become invalid and are removed
from the cache.

Currently all `InnoDB' tables are invalidated on `COMMIT', in the
future this will be changed so only tables changed in the transaction
cause the corresponding cache entries to be invalidated.

A query cannot be cached if it contains one of the functions:
*Function*         *Function*         *Function*         *Function*
`User Defined      `CONNECTION_ID'    `FOUND_ROWS'       `GET_LOCK'
Functions'
`RELEASE_LOCK'     `LOAD_FILE'        `MASTER_POS_WAIT'  `NOW'
`SYSDATE'          `CURRENT_TIMESTAMP'`CURDATE'          `CURRENT_DATE'
`CURTIME'          `CURRENT_TIME'     `DATABASE'         `ENCRYPT' (with
       one parameter)
 
`LAST_INSERT_ID'   `RAND'             `UNIX_TIMESTAMP'   `USER'                                     (without
                        parameters)
 
`BENCHMARK'

Nor can a query be cached if it contains user variables, if it is of
the form `SELECT ... IN SHARE MODE' or of the form `SELECT * FROM
AUTOINCREMENT_FIELD IS NULL' (to retrieve last insert id - ODBC work
around).

However, `FOUND ROWS()' will return the correct value, even if the
preceding query was fetched from the cache.

Queries that don't use any tables or if the user has a column privilege
for any of the involved tables are not cached.

Before a query is fetched from the query cache, MySQL will check that
the user has SELECT privilege to all the involved databases and tables.
If this is not the case, the cached result will not be used.

Query Cache Configuration
-------------------------

The query cache adds a few `MySQL' system variables for `mysqld' which
may be set in a configuration file, on the command line when starting
`mysqld'.
  * `query_cache_limit' Don't cache results that are bigger than this.    (Default 1M).
  * `query_cache_size' The memory allocated to store results from old    queries.  If this is 0, the query cache is
disabled(default).
 
  * `query_cache_startup_type' This may be set (only numeric) to    *Option*             *Description*    0
      (OFF, don't cache or retrieve results)    1                    (ON, cache all results except `SELECT
          SQL_NO_CACHE ...' queries)    2                    (DEMAND, cache only `SELECT SQL_CACHE ...'
       queries)
 

Inside a thread (connection), the behaviour of the query cache can be
changed from the default. The syntax is as follows:

`SQL_QUERY_CACHE_TYPE = OFF | ON | DEMAND' `SQL_QUERY_CACHE_TYPE = 0
| 1  | 2'

*Option*               *Description*
0 or OFF               Don't cache or retrieve results.
1 or ON                Cache all results except `SELECT SQL_NO_CACHE                      ...' queries.
2 or DEMAND            Cache only `SELECT SQL_CACHE ...' queries.

By default `SQL_QUERY_CACHE_TYPE' depends on the value of
`query_cache_startup_type' when the thread was created.

Query Cache Options in `SELECT'
-------------------------------

There are two possible query cache related parameters that may be
specified in a `SELECT' query:

*Option*               *Description*
`SQL_CACHE'            If `SQL_QUERY_CACHE_TYPE' is `DEMAND', allow the                      query to be cached.   If
`SQL_QUERY_CACHE_TYPE'                     is `ON', this is the default.   If
`SQL_QUERY_CACHE_TYPE'is `OFF', do nothing.
 
`SQL_NO_CACHE'         Make this query non-cachable, don't allow this                      query to be stored in the
cache.

Query Cache Status and Maintenance
----------------------------------

With the `FLUSH QUERY CACHE' command you can defragment the query cache
to better utilise its memory. This command will not remove any queries
from the cache.  `FLUSH TABLES' also flushes the query cache.

The `RESET QUERY CACHE' command removes all query results from the
query cache.

You can monitor query cache performance in `SHOW STATUS':

*Variable*             *Description*
`Qcache_queries_in_cache'Number of queries registered in the cache.
`Qcache_inserts'       Number of queries added to the cache.
`Qcache_hits'          Number of cache hits.
`Qcache_not_cached'    Number of non-cached queries  (not cachable, or                      due to
`SQL_QUERY_CACHE_TYPE').
`Qcache_free_memory'   Amount of free memory for query cache.
`Qcache_total_blocks'  Total number of blocks in query cache.
`Qcache_free_blocks'   Number of free memory blocks in query cache.

Total number of queries = `Qcache_inserts' + `Qcache_hits' +
`Qcache_not_cached'.

The query cache uses variable length blocks, so `Qcache_total_blocks'
and `Qcache_free_blocks' may indicate query cache memory fragmentation.
After `FLUSH QUERY CACHE' only a single (big) free block remains.

Note: Every query needs a minimum of two blocks (one for the query text
and one or more for the query results). Also, every table that is used
by a query needs one block, but if two or more queries use same table
only one block needs to be allocated.