Thread: memcached and PostgreSQL
http://pugs.postgresql.org/sfpug/archives/000021.html

I noticed that some of you left-coasters were talking about memcached and
pgsql. I'm curious to know what was discussed.

In reading about memcached, it seems that many people are using it to
circumvent the scalability problems of MySQL (lack of MVCC).

From their site:

<snip>
Shouldn't the database do this?

Regardless of what database you use (MS-SQL, Oracle, Postgres, MySQL-InnoDB,
etc.), there's a lot of overhead in implementing ACID properties in an RDBMS,
especially when disks are involved, which means queries are going to block.
For databases that aren't ACID-compliant (like MySQL-MyISAM), that overhead
doesn't exist, but reading threads block on the writing threads. memcached
never blocks.
</snip>

So what does memcached offer pgsql users? It would still seem to offer the
benefit of a multi-machine cache.

-Mike
Michael,

> So what does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machine cache.

Yes, and a very, very fast one too ... like 120,000 operations per second.
PostgreSQL can't match that because of the overhead of authentication,
security, transaction visibility checking, etc.

So memcached becomes a very good place to stick data that's read often but
not updated often, or alternately data that changes often but is disposable.
An example of the former is a user+ACL list; an example of the latter is web
session information ... or simple materialized views.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
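As a loose illustration of the "read often, updated rarely" case mentioned
above (a user+ACL list), a minimal sketch using the Cache::Memcached client;
the key scheme, the ten-minute expiry, and the load_acl_from_db() helper are
assumptions for the sketch, not anything from the thread:

use strict;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub get_user_acl {
    my ($userid) = @_;
    my $key = "acl:$userid";

    # Serve from the cache when possible ...
    my $acl = $memd->get($key);
    return $acl if defined $acl;

    # ... and fall back to the database on a miss, repopulating the cache.
    $acl = load_acl_from_db($userid);     # hypothetical database lookup
    $memd->set($key, $acl, 600);          # expire after ten minutes
    return $acl;
}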
On Tue, 16 Nov 2004 21:47:54 -0800, Josh Berkus wrote:

> So memcached becomes a very good place to stick data that's read often but
> not updated often, or alternately data that changes often but is disposable.
> An example of the former is a user+ACL list; an example of the latter is web
> session information ... or simple materialized views.

Has anyone tried at least two of

1. memcached
2. Tugela Cache (pretty much the same as memcached, I think)
3. Sharedance

In that case: do you have any comparative remarks?

Links:
1: http://www.danga.com/memcached/
2: http://meta.wikimedia.org/wiki/Tugela_Cache
   http://cvs.sourceforge.net/viewcvs.py/wikipedia/tugelacache/
3: http://sharedance.pureftpd.org/

-- 
Greetings from Troels Arvin, Copenhagen, Denmark
Josh Berkus <josh@agliodbs.com> writes:

> So memcached becomes a very good place to stick data that's read often but
> not updated often, or alternately data that changes often but is disposable.
> An example of the former is a user+ACL list; an example of the latter is web
> session information ... or simple materialized views.

I would like very much to use something like memcached for a materialized
view I have. The problem is that I have to join it against other tables.

I've thought about providing an SRF in postgres to read records out of
memcached, but I'm unclear whether it would really help at all.

Has anyone tried anything like this?

-- 
greg
On 17 Nov 2004 03:08:20 -0500, Greg Stark <gsstark@mit.edu> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>
> > So memcached becomes a very good place to stick data that's read often but
> > not updated often, or alternately data that changes often but is disposable.
> > An example of the former is a user+ACL list; an example of the latter is
> > web session information ... or simple materialized views.
>
> I would like very much to use something like memcached for a materialized
> view I have. The problem is that I have to join it against other tables.
>
> I've thought about providing an SRF in postgres to read records out of
> memcached, but I'm unclear whether it would really help at all.
>
> Has anyone tried anything like this?

I haven't tried it yet, but I plan to. An interesting case might be to use
plperlu to interface with memcached and store hashes in the cache via some
external process, like a CGI script. Then just define a TYPE for the perl
SRF to return, and store the data as an array of hashes with keys matching
the TYPE.

A (perhaps useless) example could then be something like:

CREATE TYPE user_info AS (
    sessionid   TEXT,
    userid      INT,
    lastaccess  TIMESTAMP,
    lastrequest TEXT
);

CREATE FUNCTION get_user_info_by_session (TEXT)
RETURNS SETOF user_info AS $$
    use Cache::Memcached;

    my $session = shift;

    # Reuse one memcached connection across calls by stashing it in %_SHARED.
    my $c = $_SHARED{memcached} ||=
        Cache::Memcached->new({ servers => ['127.0.0.1:1111'] });

    # $user_info looks like
    #   [ { userid => 5, lastrequest => 'http://...', lastaccess => localtime(),
    #       sessionid => '123456789' }, { ... } ]
    # and is stored by a CGI.
    my $user_info = $c->get('web_access_list') || [];

    # Keep only the rows for the requested session.
    my @info = grep { $_->{sessionid} eq $session } @$user_info;
    return \@info;
$$ LANGUAGE 'plperlu';

SELECT u.username, f.lastrequest
  FROM users u,
       get_user_info_by_session('123456789') AS f
 WHERE f.userid = u.userid;

Any thoughts?

-- 
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
On November 16, 2004 08:00 pm, Michael Adler wrote:
> http://pugs.postgresql.org/sfpug/archives/000021.html
>
> I noticed that some of you left-coasters were talking about memcached
> and pgsql. I'm curious to know what was discussed.
>
> In reading about memcached, it seems that many people are using it to
> circumvent the scalability problems of MySQL (lack of MVCC).
>
> From their site:
>
> <snip>
> Shouldn't the database do this?
>
> Regardless of what database you use (MS-SQL, Oracle, Postgres,
> MySQL-InnoDB, etc.), there's a lot of overhead in implementing ACID
> properties in an RDBMS, especially when disks are involved, which means
> queries are going to block. For databases that aren't ACID-compliant
> (like MySQL-MyISAM), that overhead doesn't exist, but reading threads
> block on the writing threads. memcached never blocks.
> </snip>
>
> So what does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machine cache.

Have a look at the pdf presentation found on the following site:

http://people.freebsd.org/~seanc/pgmemcache/

> -Mike

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote:
> On November 16, 2004 08:00 pm, Michael Adler wrote:
> > http://pugs.postgresql.org/sfpug/archives/000021.html
> >
> > I noticed that some of you left-coasters were talking about memcached
> > and pgsql. I'm curious to know what was discussed.
>
> Have a look at the pdf presentation found on the following site:
>
> http://people.freebsd.org/~seanc/pgmemcache/

Thanks for that. That presentation was rather broad and the API seems rather
general-purpose, but I wonder why you would really want to access the cache
by way of the DB.

If one major point of memcache is to allocate RAM to a low-overhead server
instead of to the RDBMS's disk cache, why would you add the overhead of the
RDBMS to the process? (This is a bit of a straw man, but I'm just trying to
flesh out the pros and cons.)

Still, it seems like a convenient way to maintain cache coherency, assuming
that your application doesn't already have a clean way to do that.

(Just my uninformed opinion, though...)

-Mike
Michael,

> Still, it seems like a convenient way to maintain cache coherency,
> assuming that your application doesn't already have a clean way to do
> that.

Precisely. The big problem with memory caching is the cache getting out of
sync with the database. Updating the cache through database triggers helps
ameliorate that.

However, our inability to pass messages with NOTIFY somewhat limits the
utility of this solution. Sean wants "ON COMMIT triggers", but there are
some major issues to work out with that. Passing messages with NOTIFY would
be easier and almost as good.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
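A minimal sketch of the trigger-plus-NOTIFY idea described above, written
around the limitation mentioned (NOTIFY carries no message payload), so the
listener only learns that something changed and must invalidate or reload on
its own. The table, channel, and trigger names are illustrative assumptions:

CREATE OR REPLACE FUNCTION users_cache_notify() RETURNS trigger AS $$
BEGIN
    -- No payload is possible, so the channel name itself is the message:
    -- any listening cache process invalidates its cached "users" data.
    NOTIFY users_changed;
    RETURN NULL;   -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_cache_notify
    AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH STATEMENT EXECUTE PROCEDURE users_cache_notify();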
> So what does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machine cache.

Ack, I totally missed this thread. Sorry for jumping in late.

Basically, memcached and pgmemcache offer a more technically correct way of
implementing query caching. MySQL's query caching is a disaster, IMHO.
memcached takes this load off the database and puts it elsewhere in a more
optimized form.

The problem with memcached by itself is that you're relying on the
application to invalidate the cache. How many different places have to be
kept in sync? Using memcached in its current form means relying on the
application being developed correctly, with centralized libraries and
database access routines. Bah, that's a cluster f#$@ waiting to happen.

pgmemcache fixes that, though, so that you don't have to worry about
invalidating the cache in every application/routine. Instead you just
centralize that logic in the database and automatically invalidate via
triggers. It's working out very well for me.

I'd be interested in success stories, fwiw. In the next week or so I'll
probably stick this on pgfoundry and build a proper make/release structure.

-sc

-- 
Sean Chittenden
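A rough sketch of the trigger-driven invalidation described above, assuming
pgmemcache exposes a memcache_delete(key) SQL function as its documentation
describes, and assuming a users table cached under hypothetical
'user:<userid>' keys:

CREATE OR REPLACE FUNCTION users_memcache_invalidate() RETURNS trigger AS $$
BEGIN
    -- Drop the stale cached copy; the next reader repopulates it from the
    -- table, so the database stays the single point of invalidation.
    PERFORM memcache_delete('user:' || OLD.userid);
    RETURN NULL;   -- ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_memcache_invalidate
    AFTER UPDATE OR DELETE ON users
    FOR EACH ROW EXECUTE PROCEDURE users_memcache_invalidate();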
Josh Berkus wrote:
> Michael,
>
> > Still, it seems like a convenient way to maintain cache coherency,
> > assuming that your application doesn't already have a clean way to do
> > that.
>
> Precisely. The big problem with memory caching is the cache getting out of
> sync with the database. Updating the cache through database triggers helps
> ameliorate that.
>
> However, our inability to pass messages with NOTIFY somewhat limits the
> utility of this solution. Sean wants "ON COMMIT triggers", but there are
> some major issues to work out with that. Passing messages with NOTIFY
> would be easier and almost as good.

The big concern I have about memcache is that because it controls storage
external to the database, there is no way to guarantee the cache is
consistent with the database. This is similar to sending email in a trigger
or on commit, where you can't be certain you send email always and only on a
commit.

In the database, we mark everything we do with a transaction id and mark the
transaction id as committed in one operation. I see no way to do that with
memcache.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce,

> The big concern I have about memcache is that because it controls
> storage external to the database, there is no way to guarantee the cache
> is consistent with the database. This is similar to sending email in a
> trigger or on commit, where you can't be certain you send email always
> and only on a commit.

Well, some things would help ... ON COMMIT triggers, or messages with
NOTIFY, would improve the accuracy by cutting down on caching of aborted
transactions.

However, caching is of necessity imperfect. Caching is a trade-off: greater
access speed vs. perfect consistency (and durability). There are cases where
the access speed is more important than the consistency (or the durability).
The answer is to use memcached judiciously and be prepared to account for
minor inconsistencies.

For that matter, as with other forms of cumulative asynchronous materialized
views, it would be advisable to re-build copies of data stored in memcached
from scratch nightly, during system slow time, assuming that the data in
memcached corresponds to a real table. Where memcached does not correspond
to a real table (session keys, for example), it is not a concern at all.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
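A hedged sketch of the nightly rebuild suggested above, assuming the cached
data mirrors a real table (a hypothetical user_acl table here) and a simple
'acl:<userid>' key scheme; connection details and the one-day expiry are
also assumptions:

#!/usr/bin/perl
# Repopulate memcached from the authoritative table during a quiet period.
use strict;
use DBI;
use Cache::Memcached;

my $dbh  = DBI->connect('dbi:Pg:dbname=app', 'app', '', { RaiseError => 1 });
my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

my $rows = $dbh->selectall_arrayref(
    'SELECT userid, acl FROM user_acl', { Slice => {} });

for my $row (@$rows) {
    # Overwrite whatever the cache held; the table is the source of truth.
    $memd->set("acl:$row->{userid}", $row->{acl}, 86_400);
}

$dbh->disconnect;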
> The big concern I have about memcache is that because it controls
> storage external to the database, there is no way to guarantee the cache
> is consistent with the database.

I've found that letting applications add data to memcache and then letting
the database replace or delete keys seems to be the best approach to
minimize exactly this issue. Having two clients update the cache is risky.
Using triggers, or using NOTIFY plus tailing logs, makes this much more
bulletproof.

> This is similar to sending email in a trigger or on commit, where you
> can't be certain you send email always and only on a commit.

While this is certainly a possibility, it's definitely closer to the
exception than the normal case.

> In the database, we mark everything we do with a transaction id and mark
> the transaction id as committed in one operation. I see no way to do
> that with memcache.

Correct. With an ON COMMIT trigger, it'll be easier to have a 100% accurate
cache. That said, memcache does exist outside of the database, so it's
theoretically impossible to guarantee that the two are 100% in sync.
pgmemcache goes a long way toward keeping the cache in sync with the
database, but it certainly doesn't guarantee it. That being said, I haven't
had any instances of it being out of sync since using pgmemcache (I'm quite
proud of this, to be honest *grin*).

For critical operations such as financial transactions, however, I advise
going to the database unless you're willing to swallow the financial cost of
cache discrepancies.

-sc

-- 
Sean Chittenden
On Nov 21, 2004, at 11:55 PM, Sean Chittenden wrote:

>> This is similar to sending email in a trigger or on commit, where you
>> can't be certain you send email always and only on a commit.
>
> While this is certainly a possibility, it's definitely closer to the
> exception than the normal case.

While an exception, this is a very real possibility in day-to-day
operations. The absence of any feedback or balancing mechanism between the
database and cache makes it impossible to know that they are in sync, and
even a small error percentage multiplied over time will lead to an
ever-increasing likelihood of error.

More dangerous is that this discrepancy will NOT always be apparent.
Without active verification of the correctness of the cache, we will not
know about any errors unless the error grows to an obvious point. The
errors may cause material damage long before they become obvious. This is a
common failure pattern with caches.

Patrick B. Kelly
------------------------------------------------------
                      http://patrickbkelly.org
> While an exception, this is a very real possibility in day-to-day
> operations. The absence of any feedback or balancing mechanism between
> the database and cache makes it impossible to know that they are in
> sync, and even a small error percentage multiplied over time will lead
> to an ever-increasing likelihood of error.

Sure, but there are applications where it does not matter, and these
applications are often the ones loading the database... think about
displaying forum posts, product lists in a web store, and especially
category trees and top-N queries... for all of these, it does not matter if
the data is a bit stale. For instance, a very popular forum will be cached,
which is very important; in that case I think it is acceptable if a new post
does not appear instantly.

Of course, when inserting or updating data in the database, the primary keys
and other important data should be fetched from the database and not the
cache, which supposes a bit of application logic (for instance, in a forum,
the display page should query the cache, but the "post message" page should
query the database directly).

Memcache can also save the database from update-heavy tasks like user
session management. In that case sessions can be stored entirely in memory.

ON COMMIT triggers would be very useful.

> More dangerous is that this discrepancy will NOT always be apparent,
> because without active verification of the correctness of the cache, we
> will not know about any errors unless the error grows to an obvious
> point. The errors may cause material damage long before they become
> obvious. This is a common failure pattern with caches.

This is why it would be dangerous to fetch referential integrity data from
the cache... this fits your "banking" example, for instance.
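A brief sketch of the session-management idea above, keeping sessions
entirely in memcached so page hits avoid database writes; the key scheme and
the 30-minute expiry are assumptions for the sketch:

use strict;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub touch_session {
    my ($sessionid, $userid) = @_;
    # Each request rewrites the session with a fresh 30-minute expiry;
    # memcached discards it automatically once the user goes idle.
    $memd->set("session:$sessionid",
               { userid => $userid, lastaccess => time() },
               30 * 60);
}

sub get_session {
    my ($sessionid) = @_;
    return $memd->get("session:$sessionid");   # undef if expired or unknown
}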
Pierre-Frédéric Caillaud wrote:
> > While an exception, this is a very real possibility in day-to-day
> > operations. The absence of any feedback or balancing mechanism between
> > the database and cache makes it impossible to know that they are in
> > sync, and even a small error percentage multiplied over time will lead
> > to an ever-increasing likelihood of error.
>
> Sure, but there are applications where it does not matter, and these
> applications are often the ones loading the database... think about
> displaying forum posts, product lists in a web store, and especially
> category trees and top-N queries... for all of these, it does not matter
> if the data is a bit stale. For instance, a very popular forum will be
> cached, which is very important; in that case I think it is acceptable if
> a new post does not appear instantly.

My point was that there are two failure cases --- one where the cache is
slightly out of date compared to the db server --- these are cases where the
cache update is slightly before/after the commit.

The second is where the cache update happens and the commit later fails, or
the commit happens and the cache update never happens. In these cases the
cache is out of date for however long you cache the data without expiring it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> My point was that there are two failure cases --- one where the cache is
> slightly out of date compared to the db server --- these are cases where
> the cache update is slightly before/after the commit.

I was thinking about this and ways to minimize it even further. Have
memcache clients add data, and have a policy of having the database only
delete data. This sets the database up as the bottleneck again, but then you
have a degree of transactionality that couldn't previously be achieved with
the database issuing replace commands. For example:

1) client checks the cache for data and gets a cache lookup failure
2) client begins transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction

This assumes that the client won't roll back or have a transaction failure.
Again, in 50M transactions, I doubt one of them would fail (sure, it's
possible, but that's a symptom of bigger problems: memcached isn't an RDBMS).

The update case being:

1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache

> The second is where the cache update happens and the commit later fails,
> or the commit happens and the cache update never happens.

Having pgmemcache delete, not replace, data addresses this second issue.

-sc

-- 
Sean Chittenden
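Here is one way the two sequences above might look from a Perl client using
DBI and Cache::Memcached; the users table, key names, and column layout are
illustrative assumptions, and the comment in the update path stands in for a
pgmemcache trigger deleting the key inside the database:

use strict;
use DBI;
use Cache::Memcached;

my $dbh  = DBI->connect('dbi:Pg:dbname=app', 'app', '',
                        { RaiseError => 1, AutoCommit => 1 });
my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

# Read path: cache miss -> read inside a transaction -> add -> commit.
sub get_user {
    my ($userid) = @_;
    my $key  = "user:$userid";
    my $user = $memd->get($key);
    return $user if $user;                       # 1) cache hit

    $dbh->begin_work;                            # 2) begin transaction
    $user = $dbh->selectrow_hashref(
        'SELECT userid, username FROM users WHERE userid = ?',
        undef, $userid);                         # 3) SELECT from the database
    $memd->add($key, $user) if $user;            # 4) add (never replace) the key
    $dbh->commit;                                # 5) commit
    return $user;
}

# Update path: the database (via a pgmemcache trigger) deletes the key
# mid-transaction; the client repopulates only after the commit succeeds.
sub rename_user {
    my ($userid, $newname) = @_;
    my $key = "user:$userid";

    $dbh->begin_work;                            # 1) begin transaction
    $dbh->do('UPDATE users SET username = ? WHERE userid = ?',
             undef, $newname, $userid);          # 2) update the row
    # 3) a pgmemcache trigger would call memcache_delete('user:...') here
    $dbh->commit;                                # 4) commit
    $memd->set($key, { userid   => $userid,
                       username => $newname });  # 5) re-add after commit
}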
If instead of a SELECT you do a SELECT ... FOR UPDATE, I think this would be
transaction-safe. Nothing would be able to modify the data in the database
between when you do the SELECT and when you commit. If the transaction
fails, the value in memcached will be correct.

Also, it's not clear if you're doing an update here or not... If you're
doing an update, then this wouldn't work. You'd want to do your update, then
re-insert the value into memcached outside of the update transaction.

On Tue, Nov 23, 2004 at 02:20:34PM -0800, Sean Chittenden wrote:
> > My point was that there are two failure cases --- one where the cache is
> > slightly out of date compared to the db server --- these are cases where
> > the cache update is slightly before/after the commit.
>
> I was thinking about this and ways to minimize it even further. Have
> memcache clients add data, and have a policy of having the database only
> delete data. This sets the database up as the bottleneck again, but then
> you have a degree of transactionality that couldn't previously be achieved
> with the database issuing replace commands. For example:
>
> 1) client checks the cache for data and gets a cache lookup failure
> 2) client begins transaction
> 3) client SELECTs data from the database
> 4) client adds the key to the cache
> 5) client commits transaction
>
> This assumes that the client won't roll back or have a transaction
> failure. Again, in 50M transactions, I doubt one of them would fail (sure,
> it's possible, but that's a symptom of bigger problems: memcached isn't an
> RDBMS).
>
> The update case being:
>
> 1) client begins transaction
> 2) client updates data
> 3) database deletes record from memcache
> 4) client commits transaction
> 5) client adds data to memcache
>
> > The second is where the cache update happens and the commit later fails,
> > or the commit happens and the cache update never happens.
>
> Having pgmemcache delete, not replace, data addresses this second issue.
>
> -sc
>
> --
> Sean Chittenden

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
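A short sketch of the FOR UPDATE variant of the read path, using the same
hypothetical users table and key scheme as the earlier sketch; the row lock
held until commit keeps concurrent writers from changing the row between the
SELECT and the cache add:

use strict;
use DBI;
use Cache::Memcached;

my $dbh  = DBI->connect('dbi:Pg:dbname=app', 'app', '',
                        { RaiseError => 1, AutoCommit => 1 });
my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub get_user_locked {
    my ($userid) = @_;
    my $key  = "user:$userid";
    my $user = $memd->get($key);
    return $user if $user;

    $dbh->begin_work;
    # FOR UPDATE locks the row, so no concurrent writer can change it
    # between this read and the commit below.
    $user = $dbh->selectrow_hashref(
        'SELECT userid, username FROM users WHERE userid = ? FOR UPDATE',
        undef, $userid);
    $memd->add($key, $user) if $user;
    $dbh->commit;                   # lock released; cache matches the table
    return $user;
}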