Thread: PostgreSQL Caching
Hi
I may have a workable idea for adding caching to Postgres without
disturbing the MVCC functionality.
Caching, as I've been reading, can provide an amazing and sometimes
almost unbelievable performance boost to a database-backed application,
especially for data that is rarely modified:
http://www.danga.com/memcached/users.bml
The problem, as Tom Lane suggested, and the reason caching hasn't become
more popular, is simply that a cache is very complex to keep up to date,
especially in an MVCC environment. But almost every database has some
tables (and usually these are among the most popular) that would benefit
greatly from caching: for example, security permissions, or product
descriptions and prices in an e-commerce application. Other tables,
however, like orders and accounting, are very dynamic and cannot afford
to ever be stale.
However, the problem is surmountable and has been figured out by Oracle,
although I don't know how they did it:
http://www.oracle.com/technology/products/ias/joc/index.html
What if there were a way to automatically set up caching so data would
only ever be cached while it's not being changed?
I think that may be possible with an extension to the PostgreSQL API
that tracks which tables are involved in a transaction. Queries could be
answered from the cache until a transaction starts to modify a table. At
that point, the API passes all queries that reference that table through
to the database for processing.
In reality, even tables that are very active may see great speed
improvements from caching, because most of the time transactions don't
make any modifications until they are near the end of the transaction.
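Here is a rough sketch of the kind of wrapper I have in mind, in Python
and purely illustrative: the caller names the tables each statement
touches (a real implementation would get that from the parser), and the
class and connection details are all made up.

# Hypothetical sketch of the proposed caching wrapper, not a real
# PostgreSQL API. SELECT results are cached by their SQL text; once a
# transaction starts modifying a table, cached queries touching that
# table are dropped and bypassed until the transaction ends.
import psycopg2

class CachingWrapper:
    def __init__(self, dsn):
        self.conn = psycopg2.connect(dsn)
        self.cache = {}          # sql text -> result rows
        self.tables_for = {}     # sql text -> set of referenced tables
        self.dirty = set()       # tables being modified by the open txn

    def select(self, sql, tables):
        if set(tables) & self.dirty:
            return self._run(sql)        # a writer is active: bypass
        if sql not in self.cache:
            self.cache[sql] = self._run(sql)
            self.tables_for[sql] = set(tables)
        return self.cache[sql]

    def modify(self, sql, tables):
        # Mark tables dirty and drop every cached query touching them.
        self.dirty |= set(tables)
        for cached_sql, refs in list(self.tables_for.items()):
            if refs & self.dirty:
                del self.cache[cached_sql]
                del self.tables_for[cached_sql]
        with self.conn.cursor() as cur:
            cur.execute(sql)

    def commit(self):
        self.conn.commit()
        self.dirty.clear()       # caching may resume for those tables

    def _run(self, sql):
        with self.conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()

On a cache hit, the query never reaches the server at all.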
So: would it be possible to combine a special implementation of
memcached with a PostgreSQL interface wrapper?
Additional Possibilities
If the caching were implemented in an API, perhaps one could also add
connection caching.
When Stale Data is OK
During a query request, let the application specifically say whether it
would be OK with stale data no older than ___ seconds...
Yes, this would be a change to the API, but for those applications that
can withstand receiving stale data, it could provide a significant
performance boost on very active tables. I ran across a report recently
suggesting that for some applications/tables, a 10-second delay can
reduce database hits by over 50 percent.
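Continuing the sketch above, the staleness knob might look something
like this (again hypothetical; the function name and the one-dictionary
cache are made up):

# Hypothetical "stale data is OK" API: the caller states the maximum
# staleness it will tolerate, and anything younger is served from cache.
import time

_stale_cache = {}   # sql text -> (fetched_at, rows)

def select_with_staleness(wrapper, sql, tables, max_age_seconds):
    entry = _stale_cache.get(sql)
    if entry is not None:
        fetched_at, rows = entry
        if time.time() - fetched_at <= max_age_seconds:
            return rows      # stale, but within the caller's tolerance
    rows = wrapper.select(sql, tables)   # CachingWrapper sketched above
    _stale_cache[sql] = (time.time(), rows)
    return rows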
Automatic Tuning
Let the caching system automatically track how often a particular table
benefits from caching. If it doesn't benefit, or if it's a critical
table as defined by the database administrator, always bypass caching on
that table.
Write Caching
On some tables, like web sessions, it may be worthwhile to implement a
batching function where updates are written to disk (to stay ACID
compliant) and can later be more efficiently processed by the database
in a batch. Of course, the API would have to detect any queries touching
that table and run the batch first, but it seems that some performance
gains are available, since most of the requests will be for a single
row, and that row would be available in the cache.
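A rough sketch of that batching idea (illustrative only; the log file
and the flush-before-read rule are my assumptions about how the
durability could work):

# Hypothetical write-batching for a sessions-like table: updates are
# appended to a durable log before being acknowledged, then applied to
# the database in one transaction; any read of the table flushes the
# batch first.
import json
import os

class BatchedWrites:
    def __init__(self, conn, log_path='session_batch.log'):
        self.conn = conn          # an open psycopg2 connection
        self.log_path = log_path
        self.pending = []         # list of (sql, params)

    def queue_update(self, sql, params):
        with open(self.log_path, 'a') as log:
            log.write(json.dumps([sql, list(params)]) + '\n')
            log.flush()
            os.fsync(log.fileno())   # durable before we acknowledge
        self.pending.append((sql, params))

    def flush(self):
        # Apply all queued updates in a single transaction.
        with self.conn.cursor() as cur:
            for sql, params in self.pending:
                cur.execute(sql, params)
        self.conn.commit()
        self.pending = []
        open(self.log_path, 'w').close()   # truncate the applied log

    def query(self, sql, params=None):
        self.flush()              # the batch must land before any read
        with self.conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()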
Thanks
Daniel
On Sun, Oct 15, 2006 at 03:41:25AM -0700, Anon Mous wrote:
> However, the problem is surmountable and has been figured out by Oracle,
> although I don't know how they did it:
>
> http://www.oracle.com/technology/products/ias/joc/index.html

I'm pretty sure this is application-side caching. The application is
able to cache intelligently and efficiently, as it is able to make
assumptions. It's only seeing one view of the data. The view is
internally consistent, and only the application is making changes to the
view that it sees.

On the rest of your thinking: are people comfortable in believing that
tables that do not change are a valuable point to consider for caching?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
> Would it be possible to combine a special memcache implementation of
> memcache with a Postgresql interface wrapper?

have you seen http://people.freebsd.org/~seanc/pgmemcache/

merlin
On Sun, Oct 15, 2006 at 08:31:36PM +0530, Merlin Moncure wrote:
> On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
> > Would it be possible to combine a special memcache implementation of
> > memcache with a Postgresql interface wrapper?
> have you seen
> http://people.freebsd.org/~seanc/pgmemcache/

Interesting. I note that they don't address the view consistency problem
any better than an application using memcached directly. And that's the
real problem with memcached, and why people are tempted to 'indulge' by
relying on PostgreSQL. Some people value the consistency. Others don't.
memcached, whether application-side, or whether automatically invoked by
triggers (pgmemcache), is a decision to ignore the consistency.

Using memcache, I've had problems with consistency brought right to the
front. Both of these have failed me:

1) When updating a PostgreSQL record, I invalidate the memcache record.
   If another process comes along in parallel before I commit, notices
   that the memcache record is invalidated, it queries the data from
   SQL, and updates the memcache record back to the old value. :-(

2) When updating a PostgreSQL record, I updated the memcache record to
   the new value. If another process comes along in parallel before I
   commit, that is still looking at an older view, cross-referencing
   may not work as expected.

I'm currently settled on 2), but setting a short timeout (5 seconds) on
the data. Still an imperfect compromise between speed and accuracy, but
it isn't causing me problems... yet.

I don't see memcache as a general solution to query plan or query result
caching. Along these lines, I would look more towards having the query
plans or query results stored in cache along with the transaction
numbers that would let us know whether either is valid.

Consistency is very valuable to me. If it wasn't for memcache being
hundreds or more times faster, I wouldn't use it in the cases I do. It
can be dangerous.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
mark@mark.mielke.cc wrote:
> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
>    If another process comes along in parallel before I commit, notices
>    that the memcache record is invalidated, it queries the data from
>    SQL, and updates the memcache record back to the old value. :-(
>
> 2) When updating a PostgreSQL record, I updated the memcache record
>    to the new value. If another process comes along in parallel before
>    I commit, that is still looking at an older view, cross-referencing
>    may not work as expected.

Hmm, have you tried having a NOTIFY when the Postgres record is updated,
and having a listener daemon that updates the value when it gets the
notification?

One issue is that if the listener is not running you don't get to update
nor invalidate the record, which would cause consistency problems.

--
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
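A minimal sketch of the listener daemon Alvaro describes, assuming
psycopg2 (2.3 or later, for notify.channel) and the python-memcached
client; the channel name, key mapping, and hosts are made up, and the
cached table's triggers would have to NOTIFY on the matching channel.
Because NOTIFY is only delivered at commit, the cache entry is dropped
only after the new row version is visible, which sidesteps failure mode
1) above, at the cost Alvaro notes if the daemon is down:

# Hypothetical invalidation daemon: LISTEN for per-table notifications
# and delete the corresponding memcached keys after each commit.
import select

import memcache                # python-memcached client
import psycopg2
import psycopg2.extensions

CHANNEL_KEYS = {'prices_changed': ['prices']}   # channel -> cache keys

mc = memcache.Client(['127.0.0.1:11211'])
conn = psycopg2.connect('dbname=mydb')
conn.set_isolation_level(
    psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
for channel in CHANNEL_KEYS:
    cur.execute('LISTEN %s;' % channel)   # names come from a fixed dict

while True:
    select.select([conn], [], [])    # block until the server signals us
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        for key in CHANNEL_KEYS.get(notify.channel, []):
            mc.delete(key)           # invalidate after the commit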
Alvaro Herrera wrote:
> mark@mark.mielke.cc wrote:
>> Using memcache, I've had problems with consistency brought right to
>> the front. Both of these have failed me:
>>
>> 1) When updating a PostgreSQL record, I invalidate the memcache record.
>>    If another process comes along in parallel before I commit, notices
>>    that the memcache record is invalidated, it queries the data from
>>    SQL, and updates the memcache record back to the old value. :-(
>>
>> 2) When updating a PostgreSQL record, I updated the memcache record
>>    to the new value. If another process comes along in parallel before
>>    I commit, that is still looking at an older view, cross-referencing
>>    may not work as expected.
>
> Hmm, have you tried having a NOTIFY when the Postgres record is
> updated, and having a listener daemon that updates the value when it
> gets the notification?
>
> One issue is that if the listener is not running you don't get to
> update nor invalidate the record, which would cause consistency
> problems.

The original poster seemed to be talking about seldom/never changed
data. But the interesting and often-discussed case is really rapidly
changing data such as shopping carts, web session objects etc.

On a somewhat unrelated note, I recently discussed a high performance
postgres installation with the CIO, who told me they were using a large
array of high end solid state disks (there is some serious money here)
for the backing store. That's probably something worth thinking about
for high traffic sites, which might help to relieve the problems for
which caching or in-memory tables are thought to be possible answers.

cheers

andrew
Merlin Moncure wrote:
> On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
>> Would it be possible to combine a special memcache implementation of
>> memcache with a Postgresql interface wrapper?
>
> have you seen
> http://people.freebsd.org/~seanc/pgmemcache/
>
> merlin

Now you got me thinkin ;-P

Just throwing some ideas around - What if we could do something like

CREATE TABLESPACE myramcache LOCATION MEMORY(2GB);

CREATE TABLE mycache (
  id as integer,
  data as varchar(50))
USING TABLESPACE myramcache;

INSERT INTO mycache SELECT id,data FROM myrealtable;

You could setup a table in memory to contain known popular data, you
could also use this to create a temporary table in memory to speed up
multiple intermediate calculations without touching disks.

Or maybe just a view for caching -

CREATE MEMORY VIEW mycacheview USING MAX(2GB) FOR LIFE(10)
AS SELECT * FROM myrealtable;

which would cache SELECTed rows in ram for LIFE seconds before purging,
and inserts/updates to myrealtable would trigger or can contain a
trigger that would purge all or some of the view cache.

Or better yet maybe the memory tablespace idea could also be extended
to allow

CREATE VIEW ... USING TABLESPACE myramcache LIFE(10);

TABLESPACE LOCATION MEMORY would seem to give an opportunity for later
expansion.

The memory tablespace idea could be expanded to work with something
like memcached (pg_ramcache_slave ?) - allowing multiple machines to
work as a ram cache for the server. Something like -

CREATE MEMCACHE group1 SLAVE 192.168.0.5;
CREATE MEMCACHE group1 SLAVE 192.168.0.6 PORT 5436;
CREATE MEMCACHE group2 SLAVE 192.168.0.7;

CREATE TABLESPACE myramcache LOCATION MEMORY WITH group1 SLAVES;
CREATE TABLESPACE myramcache2 LOCATION MEMORY WITH group2 SLAVES;

Probably want to put in some limits, such as only temporary tables and
'caching' views are allowed in a memory tablespace.

Apart from temp tables these could all be saved into system tables so
they are re-created upon server startup.
mark@mark.mielke.cc wrote:
> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
>    If another process comes along in parallel before I commit, notices
>    that the memcache record is invalidated, it queries the data from
>    SQL, and updates the memcache record back to the old value. :-(

How can this fail? The PostgreSQL MVCC will hold the second transaction
back until the effect on the tuple is known (i.e. after the first
transaction is over). Have you not been using SERIALIZABLE transactions?

With a bit of careful planning (and a few SELECT FOR UPDATE queries to
prevent deadlock), having perfect consistency and correct caching is
possible.

> 2) When updating a PostgreSQL record, I updated the memcache record
>    to the new value. If another process comes along in parallel before
>    I commit, that is still looking at an older view, cross-referencing
>    may not work as expected.

This breaks integrity, and all bets are off.

> I'm currently settled on 2), but setting a short timeout (5 seconds) on
> the data. Still an imperfect compromise between speed and accuracy, but
> it isn't causing me problems... yet.

What exactly does your application do about the possibility of incorrect
data?

> Consistency is very valuable to me. If it wasn't for memcache being
> hundreds or more times faster, I wouldn't use it in the cases I do.
> It can be dangerous.

Consistency and caching are not mutually exclusive, and there are many
frameworks that handle the burden of maintaining both for you.

Regards,
LL
On Sun, Oct 15, 2006 at 06:43:43PM +0100, Lexington Luthor wrote:
> mark@mark.mielke.cc wrote:
> > Using memcache, I've had problems with consistency brought right to
> > the front. Both of these have failed me:
> > 1) When updating a PostgreSQL record, I invalidate the memcache record.
> >    If another process comes along in parallel before I commit, notices
> >    that the memcache record is invalidated, it queries the data from
> >    SQL, and updates the memcache record back to the old value. :-(
> How can this fail? The PostgreSQL MVCC will hold the second transaction
> back until the effect on the tuple is known (i.e. after the first
> transaction is over). Have you not been using SERIALIZABLE transactions?

I don't *want* to use SERIALIZABLE transactions. That would grind my
application to a halt. Consistency isn't the same as serializable.
Memcache offers *NEITHER*.

> With a bit of careful planning (and a few SELECT FOR UPDATE queries to
> prevent deadlock), having perfect consistency and correct caching is
> possible.

Your conclusion is false for all cases except data that will never
change. You can never have perfect consistency across different systems
(memcache / postgresql), and especially not when their visibility rules
differ. What is visible to something via memcache is always latest
uncommitted. What is visible in PostgreSQL is something less than that.
Consistency is not possible. Correct caching is therefore also not
possible unless you define correct as 'latest', and even then, you have
problems if memcache expires the record before the real record has been
committed into PostgreSQL. Under a significant enough load, it becomes
visible.

> > 2) When updating a PostgreSQL record, I updated the memcache record
> >    to the new value. If another process comes along in parallel before
> >    I commit, that is still looking at an older view, cross-referencing
> >    may not work as expected.
> This breaks integrity, and all bets are off.

Both break integrity.

> > I'm currently settled on 2), but setting a short timeout (5 seconds)
> > on the data. Still an imperfect compromise between speed and
> > accuracy, but it isn't causing me problems... yet.
> What exactly does your application do about the possibility of
> incorrect data?

Right now? I've limited it to display problems. Wrong counts. I think I
tracked down all the significant problems. For me, "latest" is often
equally good to "consistent", where memcache is giving "latest" and
PostgreSQL is giving "consistent".

> > Consistency is very valuable to me. If it wasn't for memcache being
> > hundreds or more times faster, I wouldn't use it in the cases I do.
> > It can be dangerous.
> Consistency and caching are not mutually exclusive, and there are many
> frameworks that handle the burden of maintaining both for you.

Consistency and memcached *are* mutually exclusive. memcached provides
no mechanisms for consistency. Close may be good enough for many. Close
is the best that it can do.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Mon, Oct 16, 2006 at 03:08:39AM +0930, Shane Ambler wrote:
> Just throwing some ideas around - What if we could do something like
> CREATE TABLESPACE myramcache LOCATION MEMORY(2GB);
> CREATE TABLE mycache (
>   id as integer, data as varchar(50))
> USING TABLESPACE myramcache;
> INSERT INTO mycache SELECT id,data FROM myrealtable;
> ...
> You could setup a table in memory to contain known popular data, you
> could also use this to create a temporary table in memory to speed up
> multiple intermediate calculations without touching disks.

I'm not sure what this would save. If the table is read-only, there
shouldn't be writes happening. If it's small, and frequently accessed,
it should fit in the buffer cache.

None of this avoids the cost of query planning, or query execution.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Sun, Oct 15, 2006 at 06:43:43PM +0100, Lexington Luthor wrote:
> With a bit of careful planning (and a few SELECT FOR UPDATE queries to
> prevent deadlock), having perfect consistency and correct caching is
> possible.

I didn't respond directly to this claim of yours. SELECT FOR UPDATE is
only useful if I'm going to do SELECT. If I am using memcache, one would
presume that I am using it in place of SELECT, to improve performance.
If I'm going to SELECT and then memcache, I haven't gained anything.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
mark@mark.mielke.cc wrote:
> On Mon, Oct 16, 2006 at 03:08:39AM +0930, Shane Ambler wrote:
>> You could setup a table in memory to contain known popular data, you
>> could also use this to create a temporary table in memory to speed up
>> multiple intermediate calculations without touching disks.
>
> I'm not sure what this would save. If the table is read-only, there
> shouldn't be writes happening. If it's small, and frequently accessed,
> it should fit in the buffer cache.

Because it is frequently accessed doesn't mean that it is small - the
main point is control over what is cached, and a starting point for the
other options mentioned later.

> None of this avoids the cost of query planning, or query execution.

No, but you can avoid costly disk access and still have the postgres
level of integrity and integration that memcached doesn't offer.
Shane Ambler <pgsql@007Marketing.com> writes:
> mark@mark.mielke.cc wrote:
>> None of this avoids the cost of query planning, or query execution.
> No but you can avoid costly disk access and still have the postgres
> level of integrity and integration that memcached doesn't offer.

If you're just trying to cache data, it's not clear what you are doing
that the shared buffer cache and/or kernel-level disk cache doesn't do
already.

			regards, tom lane
On Mon, Oct 16, 2006 at 05:14:59AM +0930, Shane Ambler wrote:
> mark@mark.mielke.cc wrote:
> > On Mon, Oct 16, 2006 at 03:08:39AM +0930, Shane Ambler wrote:
> > > You could setup a table in memory to contain known popular data,
> > > you could also use this to create a temporary table in memory to
> > > speed up multiple intermediate calculations without touching disks.
> > I'm not sure what this would save. If the table is read-only, there
> > shouldn't be writes happening. If it's small, and frequently
> > accessed, it should fit in the buffer cache.
> Because it is frequently accessed doesn't mean that it is small - the
> main point is control over what is cached and a starting point for
> other options mentioned later.

This presumes that your instructions will do a better job than it is
already doing. You are telling it "use this much cache memory". Only, if
you specify more than the amount of RAM you have, this will be swapped
to disk, and you won't have avoided a disk access. If you specify less
than the amount of RAM you have, you are preventing PostgreSQL or the
kernel from deciding that another page is more valuable than your "large
static table".

> > None of this avoids the cost of query planning, or query execution.
> No but you can avoid costly disk access and still have the postgres
> level of integrity and integration that memcached doesn't offer.

If PostgreSQL or the kernel is caching it, this is zero.

As a thought experiment, I'm not seeing the benefit. I think if you
could prove a benefit, then any proof you provided could be used to
improve the already existing caching layers, and would apply equally to
read-only or read-write pages. For example, why not be able to hint to
PostgreSQL that a disk-based table should be considered a priority to
keep in RAM. That way, PostgreSQL would avoid pushing pages from this
table out.

I'm not convinced that this would be a gain, though. I highly expect
that an LRU rule is better than a statically defined "keep this in RAM"
rule.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
mark@mark.mielke.cc wrote:
> As a thought experiment, I'm not seeing the benefit. I think if you
> could prove a benefit, then any proof you provided could be used to
> improve the already existing caching layers, and would apply equally
> to read-only or read-write pages. For example, why not be able to
> hint to PostgreSQL that a disk-based table should be considered a
> priority to keep in RAM. That way, PostgreSQL would avoid pushing
> pages from this table out.

If memcached (or pgmemcached implemented in triggers) can show a speed
improvement using ram based caching (even with network overhead) of
specific data, then it stands to reason that this ram based cache can be
integrated into postgres with better integration that will overcome the
issues that pgmemcached has. So I threw some ideas out there to get
others thinking on these lines, to see if we can come up with a way to
improve or integrate this principle.

My original thoughts were integrating it into the sql level to allow the
database structure to define what we would want to cache in ram, which
is similar to what is happening with using pgmemcached. Expanding create
table to specify that a table gets priority in cache, or to allocate x
amount of cache to be used by table y, could be a better way than saying
"all of this table in ram".

I think the main benefit of my first ideas would come from the later
examples I gave, where create memory tablespace with slaves would allow
the use of extra machines, effectively increasing the ram available
outside the current postgres setup.

Maybe implementing this idea as a way of increasing the current postgres
caching would be a better implementation than the memory tablespaces
idea. As in integrating a version of pgmemcached as an option into the
current caching layers, thus implementing it at the config level instead
of the structure design. Although defining tables to get priority or
allocated space in the ram cache would fit well with that.
Mark,

> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
>    If another process comes along in parallel before I commit, notices
>    that the memcache record is invalidated, it queries the data from
>    SQL, and updates the memcache record back to the old value. :-(
>
> 2) When updating a PostgreSQL record, I updated the memcache record
>    to the new value. If another process comes along in parallel before
>    I commit, that is still looking at an older view, cross-referencing
>    may not work as expected.

The answer is that cached values are not expected to be consistent. If
they were, then they'd have to have all of the transaction overhead
which PostgreSQL has, and lose any gain in efficiency.

Generally, memcached is best used for values that don't get kept in the
database at all. Example (how I use it):

Using pgmemcache for user session information:

1) User logs in. Their session information is stored in the permanent
   user_session table and the pgmemcache pseudotable, user_session_cache.

2) User browses the website. This requires updating their session every
   time a page is loaded with their last activity timestamp, their page
   path (for backtracking) and their ip information (for hijack
   prevention). This information is recorded in user_session_cache
   *only*, with the presumption that it will be lost if the server goes
   down.

3) User logs out (or is auto-logged-out). Keys are deleted from
   user_session_cache and their exit information is written to the
   permanent table user_session.

The advantage of this is that it allows lowering the amount of write
activity to the user_session table by 95% with no loss of information we
care about. Of course, if you are using a Java or .NET application
server, it probably provides the above functionality itself.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
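From the application side, the same pattern could look roughly like this
(a hypothetical Python sketch with psycopg2 and python-memcached; Josh
does it inside the database with pgmemcache, and all table, column, and
key names here are made up):

# Steps 1-3 of the session pattern: durable rows only at login/logout,
# per-page-hit state kept in memcached alone.
import json
import time

import memcache
import psycopg2

mc = memcache.Client(['127.0.0.1:11211'])
conn = psycopg2.connect('dbname=mydb')

def login(user_id):
    # 1) Permanent row in user_session, plus the hot copy in the cache.
    with conn.cursor() as cur:
        cur.execute('INSERT INTO user_session (user_id, login_at) '
                    'VALUES (%s, now())', (user_id,))
    conn.commit()
    mc.set('session:%d' % user_id,
           json.dumps({'last_seen': time.time(), 'path': [], 'ip': None}))

def page_hit(user_id, path, ip):
    # 2) Per-page state goes to the cache *only*; losing it on a cache
    #    server crash is acceptable.
    sess = json.loads(mc.get('session:%d' % user_id))
    sess['last_seen'] = time.time()
    sess['ip'] = ip
    sess['path'].append(path)
    mc.set('session:%d' % user_id, json.dumps(sess))

def logout(user_id):
    # 3) Exit information is written back to the permanent table and
    #    the cache key is deleted.
    sess = json.loads(mc.get('session:%d' % user_id))
    with conn.cursor() as cur:
        cur.execute('UPDATE user_session SET logout_at = now(), '
                    'last_ip = %s WHERE user_id = %s',
                    (sess['ip'], user_id))
    conn.commit()
    mc.delete('session:%d' % user_id)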
On Sun, Oct 15, 2006 at 02:39:36PM -0700, Josh Berkus wrote:
> > Using memcache, I've had problems with consistency brought right to
> > the front. Both of these have failed me:
> > 1) When updating a PostgreSQL record, I invalidate the memcache record.
> >    If another process comes along in parallel before I commit, notices
> >    that the memcache record is invalidated, it queries the data from
> >    SQL, and updates the memcache record back to the old value. :-(
> >
> > 2) When updating a PostgreSQL record, I updated the memcache record
> >    to the new value. If another process comes along in parallel before
> >    I commit, that is still looking at an older view, cross-referencing
> >    may not work as expected.
> The answer is that cached values are not expected to be consistent. If
> they were, then they'd have to have all of the transaction overhead
> which PostgreSQL has, and lose any gain in efficiency.

Agree.

> Generally, memcached is best used for values that don't get kept in the
> database at all. Example (how I use it)
>
> Using pgmemcache for user session information:
>
> 1) User logs in. Their session information is stored in the permanent
>    user_session table and the pgmemcache pseudotable, user_session_cache.
> 2) User browses the website. This requires updating their session every
>    time a page is loaded with their last activity timestamp, their page
>    path (for backtracking) and their ip information (for hijack
>    prevention). This information is recorded in user_session_cache
>    *only*, with the presumption that it will be lost if the server
>    goes down.
> 3) User logs out (or is auto-logged-out). Keys are deleted from
>    user_session_cache and their exit information is written to the
>    permanent table user_session.

Is there a benefit here to doing this with pgmemcache over the
application calling in to memcache directly? Are you able to effectively
and efficiently include memcache derived information within select
queries that include information you want to pull out of the database?

I like the logout-commits-information part, and it's something I've been
meaning to do, but haven't gotten around to.

> The advantage of this is that it allows lowering the amount of write
> activity to the user_session table by 95% with no loss of information
> we care about. Of course, if you are using a Java or .NET application
> server, it probably provides the above functionality itself.

Agree.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
> mark@mark.mielke.cc wrote:
> > As a thought experiment, I'm not seeing the benefit. I think if you
> > could prove a benefit, then any proof you provided could be used to
> > improve the already existing caching layers, and would apply equally
> > to read-only or read-write pages. For example, why not be able to
> > hint to PostgreSQL that a disk-based table should be considered a
> > priority to keep in RAM. That way, PostgreSQL would avoid pushing
> > pages from this table out.
> If memcached (or pgmemcached implemented in triggers) can show a speed
> improvement using ram based caching (even with network overhead) of
> specific data then it stands to reason that this ram based cache can
> be integrated into postgres with better integration that will overcome
> the issues that pgmemcache has.

I think you might misunderstand how memcache gets its speed.

It gets away from reading or writing to disk because it makes no promise
that the data is safe, even in RAM. memcache may choose to overwrite the
data at any time.

It makes no consistency guarantees. There are no read/write boundaries.
You and I can both read, and both write, and the result is
indeterminate.

It limits itself to only two fields per row: a key and a value. Both
fields must be string types.

Based on all of the assumptions above, all read and write operations are
fast enough that they can be serialized. This allows for the process to
be single-process, single-thread, with no requirement for co-ordination
between these processes or threads. There is no need for locking any
data structures, or waiting on any resources. Requests can be dispatched
immediately.

What of the above fits into PostgreSQL? Can PostgreSQL choose to remove
records on a whim? Can PostgreSQL get away with removing transaction
boundaries, and making specific tables always read and write to latest?
Can PostgreSQL tables be limited to two fields? Can PostgreSQL get away
with zero synchronization between processes or threads?

The architectures are almost opposite of each other. I don't see how you
could combine the architectures. Effectively, you would need to have two
engines inside PostgreSQL, with the engine type selected from the table
type. MySQL sort of does this. In MySQL, some tables support
transactions while others do not. Is that what you are asking for?

> My original thoughts were integrating it into the sql level to allow
> the database structure to define what we would want to cache in ram,
> which is similar to what is happening with using pgmemcached.

In my experience, the most costly part of SQL, for very simple queries,
is the query planning. As soon as you have more than one key and one
value, you require query planning of some sort. I believe this is the
SQL overhead: parsing the SQL, and determining how to best execute it.
Lighter weight databases, such as BerkeleyDB, already exist to do what
you are asking for. memcache makes few guarantees. BerkeleyDB and
similar give you transactions. PostgreSQL and similar give you SQL. Each
level of abstraction costs.

> Expanding create table to specify that a table gets priority in cache
> or allocate x amount of cache to be used by table y could be a better
> way than saying all of this table in ram.

Or, it could be a worse way. Where is the evidence that it would be
better?

> I think the main benefit of my first ideas would come from the later
> examples I gave where create memory tablespace with slaves would allow
> the use of extra machines, effectively increasing the ram available
> outside the current postgres setup.

MySQL has some sort of distributed scheme like this, based on a
partitioning of the keys. I don't know how great it is. Other than the
sales pitch we were given when MySQL gave us a presentation, I haven't
heard of it in use by others.

> Maybe implementing this idea as a way of increasing the current
> postgres caching would be a better implementation than the memory
> tablespaces idea. As in integrating a version of pgmemcached as an
> option into the current caching layers. Thus implementing it at the
> config level instead of the structure design. Although defining tables
> to get priority or allocated space in the ram cache would fit well
> with that.

If there is a problem with the current cache algorithm, it should be
fixed. What is the problem with it?

I think the memcache people are thinking that the cost of PostgreSQL is
about the disk. Although the disk plays a part, I'm pretty sure it's
only a fraction. Not providing transaction guarantees, not providing an
SQL level abstraction, and not having multiple processes or threads
plays a much bigger part.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Sun, 15 Oct 2006, mark@mark.mielke.cc wrote:
> On Sun, Oct 15, 2006 at 08:31:36PM +0530, Merlin Moncure wrote:
> > On 10/15/06, Anon Mous <soundami@yahoo.com> wrote:
> > > Would it be possible to combine a special memcache implementation
> > > of memcache with a Postgresql interface wrapper?
> > have you seen
> > http://people.freebsd.org/~seanc/pgmemcache/
>
> Interesting. I note that they don't address the view consistency
> problem any better than an application using memcached directly.
> And that's the real problem with memcached, and why people are
> tempted to 'indulge' by relying on PostgreSQL. Some people value
> the consistency. Others don't. memcached, whether application-side,
> or whether automatically invoked by triggers (pgmemcache) is a
> decision to ignore the consistency.
>
> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
>    If another process comes along in parallel before I commit, notices
>    that the memcache record is invalidated, it queries the data from
>    SQL, and updates the memcache record back to the old value. :-(
>
> 2) When updating a PostgreSQL record, I updated the memcache record
>    to the new value. If another process comes along in parallel before
>    I commit, that is still looking at an older view, cross-referencing
>    may not work as expected.

Shouldn't you be able to use 2-stage commit for this? Prepare to commit,
update the memcache record, then commit? Or am I thinking of something
else?

> I'm currently settled on 2), but setting a short timeout (5 seconds)
> on the data. Still an imperfect compromise between speed and accuracy,
> but it isn't causing me problems... yet.
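For what it's worth, a sketch of that two-phase variant using psycopg2's
TPC API (a hypothetical illustration: it assumes psycopg2 2.3+, a server
with max_prepared_transactions > 0, and made-up table and key names). As
the follow-up below explains, this narrows the race window but cannot
close it:

# Hypothetical 2PC flow: prepare the database transaction, update the
# cache, then commit the prepared transaction.
import memcache
import psycopg2

mc = memcache.Client(['127.0.0.1:11211'])
conn = psycopg2.connect('dbname=mydb')

def update_price_with_2pc(item_id, new_price):
    xid = conn.xid(0, 'price-update-%d' % item_id, 'memcache')
    conn.tpc_begin(xid)
    with conn.cursor() as cur:
        cur.execute('UPDATE prices SET price = %s WHERE item_id = %s',
                    (new_price, item_id))
    conn.tpc_prepare()            # the database will now commit-or-hold
    try:
        mc.set('price:%d' % item_id, str(new_price))
    except Exception:
        conn.tpc_rollback()       # cache update failed: back out cleanly
        raise
    conn.tpc_commit()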
On 15 Oct 2006, at 19:55, mark@mark.mielke.cc wrote:
> On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
>> If memcached (or pgmemcached implemented in triggers) can show a
>> speed improvement using ram based caching (even with network
>> overhead) of specific data then it stands to reason that this ram
>> based cache can be integrated into postgres with better integration
>> that will overcome the issues that pgmemcached has.
>
> ...
> I think the memcache people are thinking that the cost of PostgreSQL
> is about the disk. Although the disk plays a part, I'm pretty sure
> it's only a fraction. Not providing transaction guarantees, not
> providing an SQL level abstraction, and not having multiple processes
> or threads plays a much bigger part.

Forgive my intrusion and perhaps simplistic viewpoint; however, improved
caching would be of great benefit for me as a web developer. I
wholeheartedly agree that disk IO is often a small part of the expense
of obtaining data from the database, especially for the nominal web
based application. Query parsing, joining, sorting, etc. are all likely
to be the real culprits. The existing caching mechanisms (as I
understand them), and especially the kernel disk caches, do nothing to
eliminate these overhead costs.

I would venture that the 80/20 rule applies here as in many, many other
instances. A full 80+% of the queries performed against the database are
performed over and over and over again with the same criteria for a
period of time, and then the criteria changes for the next period of
time. This would be particularly true for seldom changed tables that,
for example, contain a list of the day's advertisements. The data is
changed slowly, once a day or once a week, but a query is made for every
page hit. Usually the exact same query.

I know, for you purists out there, that this is an obvious call for an
application level cache. Perhaps so; however, it complicates the
end-programmer environment _and_ it has consistency disadvantages. Many
of the programming languages being used provide direct interfaces to
PostgreSQL (not surprising given that the programmers are using
PostgreSQL) and some may even provide a caching mechanism. Best case,
integrating the two remains a task for the end-programmer; worst case,
the end-programmer has to implement the cache as well. Rolling a cache
into the database removes that complexity by incorporating it into the
existing PostgreSQL API. (BTW, I'm aware that the consistency
disadvantages of the application level cache could probably be overcome
by implementing notify in the cache but, again, at added end-programmer
expense.)

Getting back to the original posting, as I remember it, the question was
about seldom changed information. In that case, and assuming a
repetitive query as above, a simple query results cache that is keyed on
the passed SQL statement string and that simply returns the previously
cooked result set would be a really big performance win.

Registering each cache entry by the tables included in the query, and
invalidating the cache on a committed update or insert transaction to
any of those tables, would transparently solve the consistency problem.

Does this improve the "more interesting" case of heavily updated tables?
Not likely; however, for many web applications, it will likely improve
80% of the queries, leaving more cycles (and bandwidth) available for
the non-cacheable queries.

There would be other issues as well, for example: non-invalidated cache
entries will accumulate rapidly if the criteria changes often, large
result sets will cause cache contention, cursors will (likely) be unable
to use the cache, syntax/commands for manipulating cacheability will be
needed, etc. THIS DOES NOT ELIMINATE THE BASIC VALUE of a results cache
for the conditions specified above. Conditions that I would venture to
say make up a large part of the queries that are (or could be) made by a
web application.

Thanks,
F

--
F Harvell
407 467-1919
Harvell F wrote:
> Getting back to the original posting, as I remember it, the question
> was about seldom changed information. In that case, and assuming a
> repetitive query as above, a simple query results cache that is keyed
> on the passed SQL statement string and that simply returns the
> previously cooked result set would be a really big performance win.

I believe the main point that Mark made was that the extra overhead is
in the sql parsing and query planning - this is the part that postgres
won't get around. Even if you set up simple tables for caching, it still
goes through the parser and planner and loses the benefits that
memcached has. Or you fork those requests before the planner and lose
the benefits of postgres. The main benefit of using memcached is to
bypass the parsing and query planning.

You will find there is more to sql parsing than you first think. It
needs to find the components that make up the sql statement (tables,
column names, functions) and check that they exist and can be used in
the context of the given sql, and that the given data matches the
context it is to be used in. It needs to check that the current user has
enough privileges to perform the requested task. Then it locates the
data, whether it be in the memory cache, on disk, or in an integrated
version of memcached; this would also include checks to make sure
another user hasn't locked the data to change it, and whether there
exists more than one version of the data, committed and uncommitted.
Then it sends the results back to the client requesting it.

> Registering each cache entry by the tables included in the query, and
> invalidating the cache on a committed update or insert transaction to
> any of those tables, would transparently solve the consistency
> problem.

That was part of my thinking when I made the suggestion of adding
something like memcached into postgres.
On Sun, Oct 15, 2006 at 06:33:36PM -0700, Jeremy Drake wrote:
> > 2) When updating a PostgreSQL record, I updated the memcache record
> >    to the new value. If another process comes along in parallel
> >    before I commit, that is still looking at an older view,
> >    cross-referencing may not work as expected.
> Shouldn't you be able to use 2-stage commit for this? Prepare to
> commit, update the memcache record, then commit? Or am I thinking of
> something else?

Two stage commit makes the window of error smaller; it can't eliminate
it. I believe the window would be reduced to:

1) the transactions that are currently looking at older data, and

2) any memcache query that happens between the time of the PostgreSQL
   commit and the memcache commit.

It's a pretty small window. The question for my use case, with hundreds
of people clicking on web links per second, is whether anyone might
happen to hit the window. By setting the memcache store to 5 seconds
instead of the regular 60+, I seem to have eliminated all reported cases
of the problems. If the stored data is invalid, it only stays invalid
for a short time. I'm compromising accuracy for efficiency.

The thing about all of this is, if what memcache is doing could be done
with consistency, PostgreSQL would probably already be doing it right
now...

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Mon, Oct 16, 2006 at 05:59:05PM +0930, Shane Ambler wrote:
> > Registering each cache entry by the tables included in the query,
> > and invalidating the cache on a committed update or insert
> > transaction to any of those tables, would transparently solve the
> > consistency problem.
> That was part of my thinking when I made the suggestion of adding
> something like memcached into postgres.

There is a valid suggestion in here, but I think it's the caching of
query plans, and caching of query plan results, that the PostgreSQL gain
would be at. The query to query plan cache could map SQL statements
(with parameters specified) to a query plan, and be invalidated upon
changes to the statistical composition of any of the involved tables.
The query plan to query results cache would keep the results, and the
first and last transaction ids that the results are valid for.

Although it sounds simple, I believe the above to be very complicated to
pursue. The real PostgreSQL hackers (not me) have talked at length about
it over the last while that I've read their mailing lists. They've come
up with good ideas, that have not all been shot down. Nobody is willing
to tackle it, because it seems like a lot of effort, for a problem that
can be mostly solved by application-side caching.

It's a subject that interests me - but it would take a lot of time, and
that's the thing that few of us have. Time sucks. :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
Hi, Shane,

Shane Ambler wrote:
> CREATE TABLESPACE myramcache LOCATION MEMORY(2GB);

It's already possible to do this, just create the TABLESPACE in a
ramdisk / tmpfs or whatever is available for your OS.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe!
www.ffii.org www.nosoftwarepatents.org
On 10/15/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
> Using memcache, I've had problems with consistency brought right to
> the front. Both of these have failed me:
>
> 1) When updating a PostgreSQL record, I invalidate the memcache record.
>    If another process comes along in parallel before I commit, notices
>    that the memcache record is invalidated, it queries the data from
>    SQL, and updates the memcache record back to the old value. :-(
>
> 2) When updating a PostgreSQL record, I updated the memcache record
>    to the new value. If another process comes along in parallel before
>    I commit, that is still looking at an older view, cross-referencing
>    may not work as expected.
>
> I'm currently settled on 2), but setting a short timeout (5 seconds)
> on the data. Still an imperfect compromise between speed and accuracy,
> but it isn't causing me problems... yet.

use advisory locks for 'race sensitive' data (or user locks in < 8.2).
or, just use tables, because you need mvcc, not performance :)

merlin
On 16 Oct 2006, at 4:29, Shane Ambler wrote:
> Harvell F wrote:
>> Getting back to the original posting, as I remember it, the question
>> was about seldom changed information. In that case, and assuming a
>> repetitive query as above, a simple query results cache that is keyed
>> on the passed SQL statement string and that simply returns the
>> previously cooked result set would be a really big performance win.
>
> I believe the main point that Mark made was the extra overhead is in
> the sql parsing and query planning - this is the part that postgres
> won't get around. Even if you setup simple tables for caching it still
> goes through the parser and planner and loses the benefits that
> memcached has. Or you fork those requests before the planner and lose
> the benefits of postgres.
> The main benefit of using memcached is to bypass the parsing and query
> planning.

That was the basis of my suggestion to just use the passed query string
as the key. No parsing or processing of the query, just a simple string
match.

> You will find there is more to sql parsing than you first think. It
> needs to find the components that make up the sql statement (tables,
> column names, functions) and check that they exist and can be used in
> the context of the given sql, and that the given data matches the
> context it is to be used in. It needs to check that the current user
> has enough privileges to perform the requested task. Then it locates
> the data, whether it be in the memory cache, on disk, or in an
> integrated version of memcached; this would also include checks to
> make sure another user hasn't locked the data to change it, and
> whether there exists more than one version of the data, committed and
> uncommitted. Then it sends the results back to the client requesting
> it.

The user permissions checking is a potential issue but again, for the
special case of repeated queries by the same user (the webserver
process) for the same data, a simple match of the original query string
_and_ the original query user would still be very simple.

The big savings from having the simple results cache would be the
elimination of the parsing, planning, locating, combining, and sorting
of the results set. I don't believe normal locking plays a part in the
cache (there are basic cache integrity locking issues though), nor does
the versioning or commit states, beyond the invalidation of the cache
upon a commit to a referenced table. It may be that the invalidation
needs to happen whenever a table is locked as well. (The hooks for the
invalidation would be done during the original caching of the results
set.)

I know that the suggestion is a very simple minded suggestion and is
limited to a very small subset of the potential query types and
interactions; however, at least for web applications, it would be a very
big win. Many websites want to display today's data on their webpage and
have it change as dates change (or as users change). The data in the
source table doesn't change very often (especially compared to a popular
website) and the number of times that the exact same query could be
issued between changes can measure into the hundreds of thousands or
more. Putting even this simple results cache into the database would
really simplify the programmer's life and improve reliability (and the
use of PostgreSQL).
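A rough sketch of the results cache being described (purely
illustrative; the class and the idea of the executor reporting which
tables a query touched are assumptions, not an existing PostgreSQL
facility):

# Hypothetical results cache: entries keyed on (user, exact SQL text),
# registered against the tables they touch, and invalidated when a
# transaction that wrote to any of those tables commits.
class ResultsCache:
    def __init__(self):
        self.results = {}    # (user, sql) -> rows
        self.by_table = {}   # table name -> set of (user, sql) keys

    def get(self, user, sql):
        return self.results.get((user, sql))    # None means a miss

    def put(self, user, sql, tables, rows):
        key = (user, sql)
        self.results[key] = rows
        for table in tables:
            self.by_table.setdefault(table, set()).add(key)

    def on_commit(self, written_tables):
        # Hook to call once per committed transaction that wrote data.
        for table in written_tables:
            for key in self.by_table.pop(table, set()):
                self.results.pop(key, None)

A hit costs one dictionary lookup on the raw query string, which is
exactly the parse and plan work being bypassed.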
On Mon, 2006-10-16 at 13:59 +0200, Markus Schaber wrote:
> It's already possible to do this, just create the TABLESPACE in a
> ramdisk / tmpfs or whatever is available for your OS.

This is not an ideal solution: if the machine reboots, the content of
the tablespace will disappear, requiring manual administrator
intervention to get Postgres running again.

-Neil
On Mon, Oct 16, 2006 at 12:40:44PM -0400, Neil Conway wrote:
> On Mon, 2006-10-16 at 13:59 +0200, Markus Schaber wrote:
> > It's already possible to do this, just create the TABLESPACE in a
> > ramdisk / tmpfs or whatever is available for your OS.
> This is not an ideal solution: if the machine reboots, the content of
> the tablespace will disappear, requiring manual administrator
> intervention to get Postgres running again.

It's enough to show whether disk read/write is the crux of this issue or
not. I suspect not. Anybody have numbers?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Mon, 16 Oct 2006, mark@mark.mielke.cc wrote:
> On Sun, Oct 15, 2006 at 06:33:36PM -0700, Jeremy Drake wrote:
> > > 2) When updating a PostgreSQL record, I updated the memcache record
> > >    to the new value. If another process comes along in parallel
> > >    before I commit, that is still looking at an older view,
> > >    cross-referencing may not work as expected.
> > Shouldn't you be able to use 2-stage commit for this? Prepare to
> > commit, update the memcache record, then commit? Or am I thinking of
> > something else?
>
> Two stage commit makes the window of error smaller; it can't eliminate
> it.

Right, I was thinking there was still some raciness there. I think what
I remembered is that if you updated the cache and then the transaction
failed (or rolled back for whatever reason) later on, the cache would
have data that was never committed. The two-phase commit thing is
intended to deal with that eventuality. Which is also a possibility for
a consistency issue.
Hi

I've loved reading all of your thoughts and comments.

Yet, I'm left with the question:

Can we brainstorm a caching solution that is workable...

I've seen some posts talking about some of the challenges.

1.) Only good for static data

As it was proposed that is largely true. This doesn't mean, however,
that the idea as stated isn't worth pursuing, because I think you'll
find most applications have a great deal of "static enough" data to
benefit greatly from this type of caching.

However, I think some simple changes to the idea may make it useful for
busy tables...

These changes would probably require direct communication between the
caching controller and the postmaster.

a.) Rather than table locking, track changes at the row level.
b.) Rather than requiring a complete reseeding of a table after an
    update, just invalidate, or repopulate, the affected rows.
c.) Rather than destroying popular query results, try to update them if
    possible. For example, it's easy to remove one entry from the cache
    if that row was just deleted. (It's probably cheaper to rerun the
    query on just the few changed rows than to rerun the whole query.)
d.) Any other ideas?

2.) If any caching were possible, we'd already be doing it.

I don't think this statement will stand the test of time! Agreed,
caching quickly becomes a head hurting complex topic, but the benefits
are well worth the effort!

3.) Consistency and memcached *are* mutually exclusive. memcached
provides no mechanisms for consistency.

"You can never have perfect consistency across different systems
(memcache / postgresql) and especially not when their visibility rules
differ. What is visible to something via memcache is always latest
uncommitted. What is visible in PostgreSQL is something less than that.
Consistency is not possible. Correct caching is therefore also not
possible unless you define correct as 'latest', and even then, you have
problems if memcache expires the record before the real record has been
committed into PostgreSQL."

I completely agree. I'm not talking about synchronizing memcached data
to be consistent; I'm saying let's create a caching layer that works
something like memcache and preserves consistency!

and, very related to this,

4.) Memcached caching is exactly opposite to Postgres consistency.

Specifically:
- Memcache is serialized
- Memcache can lose data at any time
- Memcache has only 2 fields
- Memcache has no synchronization

Postgres needs consistency.

Memcache doesn't do any synchronization, and that means consistency is
impossible. However, a special version of memcache that is embedded into
the PostgreSQL system, or an API that does talk with the postmaster,
could provide guaranteed consistency?

5.) This idea won't save any time with SQL parsing.

I believe it can... Because, as memcache has illustrated, you can avoid
any SQL parsing by using the SQL and the user's table permissions (if
different users are enabled) as the key to the cached data.

6.) PostgreSQL is consistency. If an application needs speed, let the
application figure out how to cache the data.

I appreciate that Postgres is all about data consistency. Actually,
that's why I'm here and not on the Mysql board...

However, I believe that we can provide caching without losing
consistency, and developers will love the extra speed.

If we do implement the caching once, everyone will be able to use it
without complicating their end application!!! (Read: It will help the
world and make PostgreSQL very popular!)

---

So, would it work to create a caching pre-processor for PostgreSQL that
would work serially on every request, and pass all uncached queries to
the database?

- If it's a cache hit, and the data is currently available and active,
  pass the data back.

- If it's a miss, pass the query along to the database, and populate the
  cache with the results.

- If the query changes data, invalidate the cached queries that touch
  any table rows that could be affected. After the update, re-enable the
  unaffected table rows, and repopulate the cache with the updates.

- Cached queries using an affected table would normally be deleted,
  except in special simple cases that could be updated.

A related interesting thought... It might help if the cache stored the
data in separate table rows, rather than in whatever format the query
requested, the way memcached does.

- Each cached table row could be stored as a separate entity.

- Complicated joins, rather than caching all of the data in whatever
  organization the user specified, would instead store a matrix of
  pointers to the exact table fields in most cases.

Will it work? Am I missing anything?

Thanks

Daniel
On Tue, 2006-10-17 at 16:51 -0700, Anon Mous wrote:
> Hi
>
> I've loved reading all of your thoughts and comments.
>
> Yet, I'm left with the question:
>
> Can we brainstorm a caching solution that is workable...

I think you're making this a little complicated. A lot of these problems
can be solved with something like materialized views (triggers updating
another smaller relation), or they are already solved by the lower
layers of caching (like PostgreSQL's shared buffers, or the OS buffer
cache).

If the application is executing many queries that are exactly the same,
it would be a good idea to look at something like pgpool-II's query
cache. The only way to gain a benefit on querying results that are
already in memory is to avoid the query processing.

Regards,
Jeff Davis