Thread: memcached and PostgreSQL

memcached and PostgreSQL

From
Michael Adler
Date:
http://pugs.postgresql.org/sfpug/archives/000021.html

I noticed that some of you left coasters were talking about memcached
and pgsql. I'm curious to know what was discussed.

In reading about memcached, it seems that many people are using it to
circumvent the scalability problems of MySQL (lack of MVCC).

from their site:

<snip>
Shouldn't the database do this?

Regardless of what database you use (MS-SQL, Oracle, Postgres,
MySQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
properties in a RDBMS, especially when disks are involved, which means
queries are going to block. For databases that aren't ACID-compliant
(like MySQL-MyISAM), that overhead doesn't exist, but reading threads
block on the writing threads. memcached never blocks.
</snip>

So what does memcached offer pgsql users? It would still seem to offer
the benefit of a multi-machine cache.

-Mike

Re: memcached and PostgreSQL

From
Josh Berkus
Date:
Michael,

> So what does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machine cache.

Yes, and a very, very fast one too ... like, 120,000 operations per second.
PostgreSQL can't match that because of the overhead of authentication,
security, transaction visibility checking, etc.

So memcached becomes a very good place to stick data that's read often but not
updated often, or alternately data that changes often but is disposable.   An
example of the former is a user+ACL list; an example of the latter is web
session information ... or simple materialized views.
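
The "read often, updated rarely" case can be sketched as a look-aside cache. This is only an illustration under stated assumptions: a plain dict stands in for a memcached client, sqlite3 stands in for PostgreSQL, and the table name user_acl and the helper get_acl are hypothetical.

```python
import sqlite3

# Stand-in for a memcached client: a plain dict (assumption for illustration).
cache = {}

# Stand-in for the real database (assumption): an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_acl (userid INTEGER PRIMARY KEY, acl TEXT)")
db.execute("INSERT INTO user_acl VALUES (5, 'read,write')")
db.commit()

db_hits = 0  # counts how often the database actually had to be queried


def get_acl(userid):
    """Return the ACL for a user, consulting the cache before the database."""
    global db_hits
    key = f"acl:{userid}"
    if key in cache:                 # cache hit: no database round trip
        return cache[key]
    db_hits += 1
    row = db.execute(
        "SELECT acl FROM user_acl WHERE userid = ?", (userid,)
    ).fetchone()
    value = row[0] if row else None
    if value is not None:
        cache[key] = value           # populate the cache for later readers
    return value


first = get_acl(5)    # goes to the database
second = get_acl(5)   # served from the cache
```

Only the first lookup touches the database; every later read of the same key is answered from memory until something removes the entry.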

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: memcached and PostgreSQL

From
Troels Arvin
Date:
On Tue, 16 Nov 2004 21:47:54 -0800, Josh Berkus wrote:

> So memcached becomes a very good place to stick data that's read often but not
> updated often, or alternately data that changes often but is disposable.   An
> example of the former is a user+ACL list; an example of the latter is web
> session information ... or simple materialized views.

Has anyone tried at least two of

1. memcached
2. Tugela Cache (pretty much the same as memcached, I think)
3. Sharedance

In that case: Do you have any comparative remarks?


Links:

1: http://www.danga.com/memcached/

2: http://meta.wikimedia.org/wiki/Tugela_Cache
   http://cvs.sourceforge.net/viewcvs.py/wikipedia/tugelacache/

3: http://sharedance.pureftpd.org/

--
Greetings from Troels Arvin, Copenhagen, Denmark


Re: memcached and PostgreSQL

From
Greg Stark
Date:
Josh Berkus <josh@agliodbs.com> writes:

> So memcached becomes a very good place to stick data that's read often but not
> updated often, or alternately data that changes often but is disposable.   An
> example of the former is a user+ACL list; an example of the latter is web
> session information ... or simple materialized views.

I would like very much to use something like memcached for a materialized view
I have. The problem is that I have to join it against other tables.

I've thought about providing a SRF in postgres to read records out of
memcached but I'm unclear whether it would really help at all.

Has anyone tried anything like this?

--
greg

Re: memcached and PostgreSQL

From
Mike Rylander
Date:
On 17 Nov 2004 03:08:20 -0500, Greg Stark <gsstark@mit.edu> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>
> > So memcached becomes a very good place to stick data that's read often but not
> > updated often, or alternately data that changes often but is disposable.   An
> > example of the former is a user+ACL list; an example of the latter is web
> > session information ... or simple materialized views.
>
> I would like very much to use something like memcached for a materialized view
> I have. The problem is that I have to join it against other tables.
>
> I've thought about providing a SRF in postgres to read records out of
> memcached but I'm unclear whether it would really help at all.
>
> Has anyone tried anything like this?

I haven't tried it yet, but I plan to.  An interesting case might be
to use plperlu to interface with memcached and store hashes in the
cache via some external process, like a CGI script.  Then just define
a TYPE for the perl SRF to return, and store the data as an array of
hashes with keys matching the TYPE.

A (perhaps useless) example could then be something like:

CREATE TYPE user_info AS ( sessionid TEXT,  userid INT, lastaccess
TIMESTAMP, lastrequest TEXT);

CREATE FUNCTION get_user_info_by_session ( TEXT) RETURNS SETOF user_info AS $$
  use Cache::Memcached;

  my $session = shift;

  # Keep one client per backend in %_SHARED so it is created only once.
  my $m = $_SHARED{memcached} ||= Cache::Memcached->new( {servers =>
['127.0.0.1:1111']} );

  # Fall back to an empty list if the key is missing from the cache.
  my $user_info = $m->get('web_access_list') || [];

  # $user_info looks like
  # [ {userid => 5, lastrequest => 'http://...', lastaccess => localtime(),
  #    sessionid => '123456789'}, { ...} ]
  # and is stored by a CGI.

  my @info = grep {$$_{sessionid} eq $session} @$user_info;

  return \@info;
$$ LANGUAGE plperlu;

SELECT u.username, f.lastrequest FROM users u,
get_user_info_by_session('123456789') f WHERE f.userid = u.userid;


Any thoughts?



--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer

Re: memcached and PostgreSQL

From
Darcy Buskermolen
Date:
On November 16, 2004 08:00 pm, Michael Adler wrote:
> http://pugs.postgresql.org/sfpug/archives/000021.html
>
> I noticed that some of you left coasters were talking about memcached
> and pgsql. I'm curious to know what was discussed.
>
> In reading about memcached, it seems that many people are using it to
> circumvent the scalability problems of MySQL (lack of MVCC).
>
> from their site:
>
> <snip>
> Shouldn't the database do this?
>
> Regardless of what database you use (MS-SQL, Oracle, Postgres,
> MySQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
> properties in a RDBMS, especially when disks are involved, which means
> queries are going to block. For databases that aren't ACID-compliant
> (like MySQL-MyISAM), that overhead doesn't exist, but reading threads
> block on the writing threads. memcached never blocks.
> </snip>
>
> So what does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machine cache.

Have a look at the pdf presentation found on the following site:

http://people.freebsd.org/~seanc/pgmemcache/


>
> -Mike

--
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

Re: memcached and PostgreSQL

From
Michael Adler
Date:
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote:
> On November 16, 2004 08:00 pm, Michael Adler wrote:
> > http://pugs.postgresql.org/sfpug/archives/000021.html
> >
> > I noticed that some of you left coasters were talking about memcached
> > and pgsql. I'm curious to know what was discussed.
>
> Have a look at the pdf presentation found on the following site:
>
> http://people.freebsd.org/~seanc/pgmemcache/

Thanks for that.

That presentation was rather broad and the API seems rather general
purpose, but I wonder why you would really want to access the cache by
way of the DB. If one major point of memcache is to allocate RAM to a
low-overhead server instead of to the RDBMS's disk cache, why would
you add the overhead of the RDBMS to the process?  (This is a bit of
a straw man, but I'm just trying to flesh out the pros and cons.)

Still, it seems like a convenient way to maintain cache coherency,
assuming that your application doesn't already have a clean way to do
that.

(just my uninformed opinion, though...)

-Mike

Re: memcached and PostgreSQL

From
Josh Berkus
Date:
Michael,

> Still, it seems like a convenient way to maintain cache coherency,
> assuming that your application doesn't already have a clean way to do
> that.

Precisely.    The big problem with memory caching is the cache getting out of
sync with the database.   Updating the cache through database triggers helps
ameliorate that.

However, our inability to pass messages with NOTIFY somewhat limits the
utility of this solution.  Sean wants "on commit triggers", but there are
some major issues to work out with that.   Passing messages with NOTIFY
would be easier and almost as good.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: memcached and PostgreSQL

From
Sean Chittenden
Date:
> So what does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machine cache.

Ack, I totally missed this thread.  Sorry for jumping in late.

Basically, memcached and pgmemcache offer a more technically correct
way of implementing query caching.  MySQL's query caching is a
disaster, IMHO.  memcached alleviates this load from the database and
puts it elsewhere in a more optimized form.  The problem with memcached
by itself is that you're relying on the application to invalidate the
cache.  How many different places have to be kept in sync?  Using
memcached in its current form means relying on the application being
developed correctly, with centralized libraries and database access
routines.  Bah, that's a cluster f#$@ waiting to happen.

pgmemcache fixes that though so that you don't have to worry about
invalidating the cache in every application/routine.  Instead you just
centralize that logic in the database and automatically invalidate via
triggers.  It's working out very well for me.
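
The idea of centralizing invalidation can be sketched with a toy model. This is not pgmemcache's API: the Table class, its after_update hook list, and invalidate_user are hypothetical names, and a dict stands in for memcached. The hook list only imitates what an AFTER UPDATE trigger would do inside the database.

```python
# Stand-in for memcached (assumption): a dict shared by every application.
cache = {}


class Table:
    """Toy table that fires registered callbacks after each update,
    loosely imitating an AFTER UPDATE trigger."""

    def __init__(self):
        self.rows = {}
        self.after_update = []   # trigger functions, called with the row's key

    def update(self, pk, value):
        self.rows[pk] = value
        for trigger in self.after_update:
            trigger(pk)


def invalidate_user(pk):
    # The single place that knows which cache key belongs to which row.
    cache.pop(f"user:{pk}", None)


users = Table()
users.after_update.append(invalidate_user)

users.update(5, "alice")
cache["user:5"] = users.rows[5]   # some application caches the row

# A different application changes the row and knows nothing about the cache.
users.update(5, "alicia")
stale_entry_present = "user:5" in cache
```

The second writer never mentions the cache, yet the stale entry is gone, because the invalidation rule lives next to the data instead of in each application.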

I'd be interested in success stories, fwiw.  In the next week or so
I'll probably stick this on pgfoundry and build a proper make/release
structure.  -sc

--
Sean Chittenden


Re: memcached and PostgreSQL

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Michael,
>
> > Still, it seems like a convenient way to maintain cache coherency,
> > assuming that your application doesn't already have a clean way to do
> > that.
>
> Precisely.    The big problem with memory caching is the cache getting out of
> sync with the database.   Updating the cache through database triggers helps
> ameliorate that.
>
> However, our inability to pass messages with NOTIFY somewhat limits the
> utility of this solution.  Sean wants "on commit triggers", but there are
> some major issues to work out with that.   Passing messages with NOTIFY
> would be easier and almost as good.

The big concern I have about memcache is that because it controls
storage external to the database there is no way to guarantee the cache
is consistent with the database.  This is similar to sending email in a
trigger or on commit where you can't be certain you send email always
and only on a commit.

In the database, we mark everything we do with a transaction id and mark
the transaction id as committed in one operation.  I see no way to do
that with memcache.
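
The failure mode is easy to reproduce in miniature. In this sketch a dict stands in for memcached and sqlite3 for the database (both assumptions); the account table is hypothetical. The cache write happens inside the transaction but is not covered by it, so a rollback undoes only the database side.

```python
import sqlite3

cache = {}  # stand-in for memcached (assumption)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100)")
db.commit()
cache["balance:1"] = 100

# Update the row and the cache inside one database transaction ...
db.execute("UPDATE account SET balance = 50 WHERE id = 1")
cache["balance:1"] = 50      # the cache write takes effect immediately

# ... then the transaction fails and is rolled back.
db.rollback()

db_balance = db.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]
cached_balance = cache["balance:1"]
# The database is back at 100 while the cache still holds 50.
```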

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: memcached and PostgreSQL

From
Josh Berkus
Date:
Bruce,

> The big concern I have about memcache is that because it controls
> storage external to the database there is no way to guarantee the cache
> is consistent with the database.  This is similar to sending email in a
> trigger or on commit where you can't be certain you send email always
> and only on a commit.

Well, some things ... ON COMMIT triggers, or messages with NOTIFY, would
improve the accuracy by cutting down on cached aborted transactions.

However, caching is of necessity imperfect.   Caching is a trade-off; greater
access speed vs. perfect consistency (and any durability).    There are cases
where the access speed is more important than the consistency (or the
durability).   The answer is to use memcached judiciously and be prepared to
account for minor inconsistencies.

For that matter, as with other forms of cumulative asynchronous materialized
view, it would be advisable to nightly re-build copies of data stored in
memcached from scratch during system slow time, assuming that the data in
memcached corresponds to a real table.  Where memcached does not correspond
to a real table (session keys, for example), it is not a concern at all.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: memcached and PostgreSQL

From
Sean Chittenden
Date:
> The big concern I have about memcache is that because it controls
> storage external to the database there is no way to guarantee the cache
> is consistent with the database.

I've found that letting applications add data to memcache and then
letting the database replace or delete keys seems to be the best
approach to minimize exactly this issue.  Having two clients update the
cache is risky.  Using triggers or using NOTIFY + tailing logs makes
this much more bullet proof.

> This is similar to sending email in a trigger or on commit where you
> can't be certain you send email always
> and only on a commit.

While this is certainly a possibility, it's definitely closer to the
exception and not the normal instance.

> In the database, we mark everything we do with a transaction id and
> mark the transaction id as committed in one operation.  I see no way
> to do that with memcache.

Correct.  With an ON COMMIT trigger, it'll be easier to have a 100%
accurate cache.  That said, memcache does exist outside of the
database so it's theoretically impossible to guarantee that the two are
100% in sync.  pgmemcache goes a long way towards keeping the
cache in sync with the database, but it certainly doesn't guarantee
it's in sync.  That being said, I haven't had any instances of it not
being in sync since using pgmemcache (I'm quite proud of this, to be
honest *grin*).  For critical operations such as financial
transactions, however, I advise going to the database unless you're
willing to swallow the financial cost of cache discrepancies.

-sc

--
Sean Chittenden


Re: memcached and PostgreSQL

From
Patrick B Kelly
Date:
On Nov 21, 2004, at 11:55 PM, Sean Chittenden wrote:

>
>> This is similar to sending email in a trigger or on commit where you
>> can't be certain you send email always
>> and only on a commit.
>
> While this is certainly a possibility, it's definitely closer to the
> exception and not the normal instance.

While an exception, this is a very real possibility in day to day
operations. The absence of any feedback or balancing mechanism between
the database and cache makes it impossible to know that they are in
sync and even a small error percentage multiplied over time will lead
to an ever increasing likelihood of error.

More dangerous is that this discrepancy will NOT always be apparent
because without active verification of the correctness of the cache, we
will not know about any errors unless the error grows to an obvious
point. The errors may cause material damage long before they become
obvious. This is a common failure pattern with caches.
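
One form of active verification is a periodic audit that compares cached values against the database and drops whatever disagrees. The sketch below assumes a dict in place of memcached and sqlite3 in place of the real database; the product table, the price:<id> key scheme, and the audit helper are hypothetical.

```python
import sqlite3

cache = {}  # stand-in for memcached (assumption)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price INTEGER)")
db.executemany("INSERT INTO product VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
db.commit()

# One correct entry, one stale entry, and one entry for a row that is gone.
cache.update({"price:1": 10, "price:2": 25, "price:4": 40})


def audit(cache, db):
    """Return the cache keys whose value disagrees with the database,
    including keys for rows that no longer exist."""
    truth = {
        f"price:{pid}": price
        for pid, price in db.execute("SELECT id, price FROM product")
    }
    return sorted(k for k, v in cache.items() if truth.get(k) != v)


mismatches = audit(cache, db)
for key in mismatches:
    del cache[key]   # drop bad entries; they get reloaded on the next miss
```

Counting the mismatches per run also gives the feedback signal that is otherwise missing: a number that should stay at zero and can be alerted on when it does not.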



Patrick B. Kelly
------------------------------------------------------
                               http://patrickbkelly.org


Re: memcached and PostgreSQL

From
Pierre-Frédéric Caillaud
Date:
> While an exception, this is a very real possibility in day to day
> operations. The absence of any feedback or balancing mechanism between
> the database and cache makes it impossible to know that they are in sync
> and even a small error percentage multiplied over time will lead to an
> ever increasing likelihood of error.

    Sure, but there are applications where it does not matter, and these
applications are often loading the database... think about displaying
forum posts, products list in a web store, and especially category trees,
top N queries... for all these, it does not matter if the data is a bit
stale. For instance, a very popular forum will be cached, which is very
important. In this case I think it is acceptable if a new post does not
appear instantly.

    Of course, when inserting or updating data in the database, the primary
keys and other important data should be fetched from the database and not
the cache, which supposes a bit of application logic (for instance, in a
forum, the display page should query the cache, but the "post message"
page should query the database directly).

    Memcache can also save the database from update-heavy tasks like user
session management. In that case sessions can be stored entirely in memory.

    ON COMMIT triggers would be very useful.

> More dangerous is that this discrepancy will NOT always be apparent
> because without active verification of the correctness of the cache, we
> will not know about any errors unless the error grows to an obvious
> point.

> The errors may cause material damage long before they become obvious.
> This is a common failure pattern with caches.

    This is why it would be dangerous to fetch referential integrity data
 from the cache... this fits your "banking" example for instance.

Re: memcached and PostgreSQL

From
Bruce Momjian
Date:
Pierre-Frédéric Caillaud wrote:
>
> > While an exception, this is a very real possibility in day to day
> > operations. The absence of any feedback or balancing mechanism between
> > the database and cache makes it impossible to know that they are in sync
> > and even a small error percentage multiplied over time will lead to an
> > ever increasing likelihood of error.
>
>     Sure, but there are applications where it does not matter, and these
> applications are often loading the database... think about displaying
> forum posts, products list in a web store, and especially category trees,
> top N queries... for all these, it does not matter if the data is a bit
> stale. For instance, a very popular forum will be cached, which is very
> important. In this case I think it is acceptable if a new post does not
> appear instantly.

My point was that there are two failure cases --- one where the cache is
slightly out of date compared to the db server --- these are cases where
the cache update is slightly before/after the commit.  The second is
where the cache update happens and the commit later fails, or the commit
happens and the cache update never happens.  In these cases the cache
stays out of date for as long as the data is cached without expiring.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: memcached and PostgreSQL

From
Sean Chittenden
Date:
> My point was that there are two failure cases --- one where the cache
> is
> slightly out of date compared to the db server --- these are cases
> where
> the cache update is slightly before/after the commit.

I was thinking about this and ways to minimize this even further.  Have
memcache clients add data and have a policy to have the database only
delete data.  This sets the database up as the bottleneck again, but
then you have a degree of transactionality that couldn't be previously
achieved with the database issuing replace commands.  For example:

1) client checks the cache for data and gets a cache lookup failure
2) client begins transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction

This assumes that the client won't rollback or have a transaction
failure.  Again, in 50M transactions, I doubt one of them would fail
(sure, it's possible, but that's a symptom of bigger problems:
memcached isn't an RDBMS).

The update case being:

1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache
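
The two step lists above can be sketched as follows. This is a toy under stated assumptions: the Cache class is a stand-in for memcached whose add() stores only when the key is absent, sqlite3 stands in for PostgreSQL, and the delete in step 3 is written inline where the scheme above would have a database trigger do it. The item table and both helper functions are hypothetical.

```python
import sqlite3


class Cache:
    """Minimal stand-in for memcached (assumption). add() stores a value
    only if the key is absent, like the memcached 'add' command."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def add(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


cache = Cache()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO item VALUES (1, 'old')")
db.commit()


def read_item(item_id):
    key = f"item:{item_id}"
    value = cache.get(key)
    if value is None:                       # 1) cache lookup failure
        row = db.execute(                   # 2-3) read from the database
            "SELECT name FROM item WHERE id = ?", (item_id,)
        ).fetchone()
        value = row[0]
        cache.add(key, value)               # 4) add the key to the cache
        db.commit()                         # 5) commit (nothing pending here)
    return value


def update_item(item_id, name, fail=False):
    key = f"item:{item_id}"
    db.execute("UPDATE item SET name = ? WHERE id = ?", (name, item_id))  # 1-2)
    cache.delete(key)      # 3) done by the database in the scheme above
    if fail:
        db.rollback()      # commit never happens: cache is empty, not wrong
        return
    db.commit()            # 4)
    cache.add(key, name)   # 5)


before = read_item(1)
update_item(1, "broken", fail=True)
after_failed_update = (cache.get("item:1"), read_item(1))
update_item(1, "new")
after_update = read_item(1)
```

Because the database side only ever deletes, a failed transaction leaves a cache miss rather than a wrong value, and the next reader repopulates the key from committed data.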

> The second is
> where the cache update happens and the commit later fails, or the
> commit
> happens and the cache update never happens.

Having pgmemcache delete, not replace data addresses this second issue.
  -sc

--
Sean Chittenden


Re: memcached and PostgreSQL

From
"Jim C. Nasby"
Date:
If instead of a select you do a select for update I think this would be
transaction safe. Nothing would be able to modify the data in the
database between when you do the SELECT and when you commit. If the
transaction fails the value in memcached will be correct.

Also, it's not clear if you're doing an update here or not... If you're
doing an update then this wouldn't work. You'd want to do your update,
then re-insert the value into memcached outside of the update
transaction.

On Tue, Nov 23, 2004 at 02:20:34PM -0800, Sean Chittenden wrote:
> >My point was that there are two failure cases --- one where the cache
> >is
> >slightly out of date compared to the db server --- these are cases
> >where
> >the cache update is slightly before/after the commit.
>
> I was thinking about this and ways to minimize this even further.  Have
> memcache clients add data and have a policy to have the database only
> delete data.  This sets the database up as the bottleneck again, but
> then you have a degree of transactionality that couldn't be previously
> achieved with the database issuing replace commands.  For example:
>
> 1) client checks the cache for data and gets a cache lookup failure
> 2) client begins transaction
> 3) client SELECTs data from the database
> 4) client adds the key to the cache
> 5) client commits transaction
>
> This assumes that the client won't rollback or have a transaction
> failure.  Again, in 50M transactions, I doubt one of them would fail
> (sure, it's possible, but that's a symptom of bigger problems:
> memcached isn't an RDBMS).
>
> The update case being:
>
> 1) client begins transaction
> 2) client updates data
> 3) database deletes record from memcache
> 4) client commits transaction
> 5) client adds data to memcache
>
> >The second is
> >where the cache update happens and the commit later fails, or the
> >commit
> >happens and the cache update never happens.
>
> Having pgmemcache delete, not replace data addresses this second issue.
>  -sc
>
> --
> Sean Chittenden
>
>

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"