Thread: Postgresql Caching

Postgresql Caching

From

Anon Mous

Date:

15 October 2006, 07:48:11

Hi

I may have a workable idea on a way to add caching to Postgres without
disturbing the MVCC functionality.

Caching, as I've been reading can provide an amazing and sometimes
almost unbelievable performance boost to a database based application,
especially for data that is rarely modified.

http://www.danga.com/memcached/users.bml

The problem, as Tom Lane suggested, and the reason caching hasn't become
more popular is simply because it's very complex to keep up to date
especially in an MVCC environment. But, most every database has some
tables (and usually these are some of the most popular) that would
benefit greatly from caching. For example security permissions or product
descriptions and prices in an e-commerce application. Other tables,
however, like orders and accounting, are very dynamic and cannot afford
to ever be stale.

However, the problem is surmountable and has been figured out by Oracle,
although I don't know how they did it:

http://www.oracle.com/technology/products/ias/joc/index.html

What if there was a way to automatically setup caching so data would
only ever be cached when it's not being changed?

I think that may be possible with an extendion the Postgresql api that
tracks which tables are involved in a transaction. Queries could be
cached until a transaction starts to modify a table. At that point, the
api passes all queries that reference that table to the database for
processing.

In reality, even these tables that are very active may see great speed
improvements from caching because most of the time transactions don't
make any modifications until they are near the end of the transaction.

So

Would it be possible to combine a special memcache implementation of
memcache with a Postgresql interface wrapper?

Additional Possibilities

If the caching was implemented in an api, perhaps one could also add
connection caching.

When Stale Data is OK

During a query request, let the application specifically say if they
would be OK with stale data no older than ___ seconds...

Yes, this would be a change to the api, but for those applications that
can withstand receiving stale data, it could provide a significant
performance boost on very active tables. I ran across a report recently that suggested for some applications/tables, a 10 second delay can reduce database hits by over 50 percent.

Automatic Tuning

Let the caching system automatically track how often a particular table
benefits from caching. If it doesn't, or its a critical table as defined by the database administrator always bypass caching on that table.

Write Caching

On some tables, like web sessions, it may be worthwhile to implement a
batching function where updates are written to disk (to be acid
compliant) and can later be more efficiently processed by the database
in a batch. Of course, the api would have to detect any queries touching
that table, and run the batch first, but it seems that some performance
gains are available since most of the requests will be for a single row,
and that would be available in the cache.

Thanks

Daniel