Re: Postgresql Caching - Mailing list pgsql-hackers

From mark@mark.mielke.cc
Subject Re: Postgresql Caching
Date
Msg-id 20061015235542.GB28246@mark.mielke.cc
Whole thread Raw
In response to Re: Postgresql Caching  (Shane Ambler <pgsql@007Marketing.com>)
Responses Re: Postgresql Caching  (Harvell F <fharvell@file13.info>)
List pgsql-hackers
On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
> mark@mark.mielke.cc wrote:
> >As a thought experiment, I'm not seeing the benefit. I think if you
> >could prove a benefit, then any proof you provided could be used to
> >improve the already existing caching layers, and would apply equally
> >to read-only or read-write pages. For example, why not be able to
> >hint to PostgreSQL that a disk-based table should be considered a
> >priority to keep in RAM. That way, PostgreSQL would avoid pushing
> >pages from this table out.
> If memcached (or pgmemcached implemented in triggers) can show a speed 
> improvement using ram based caching (even with network overhead) of 
> specific data then it stands to reason that this ram based cache can be 
> integrated into postgres with better integration that will overcome the 
> issues that pgmemcached has.

I think you might misunderstand how memcache gets its speed. It gets
away from reading or writing to disk because it makes no promise that
the data is safe, even in RAM. memcache may choose to overwrite the
data at any time.

It makes no consistency guarantees. There are no read/write
boundaries. You and I can both read, and both write, and the result
is indeterminate.

It limits itself to only two fields per row. A key and a value. Both
fields must be string types.

Based on all of the assumptions above, all read and write operations are
fast enough, that they can be serialized. This allows for the process to
be single-process, single-thread, with no requirement for co-ordination
between these processes or threads. There is no need for locking any
data structures, or waiting or any resources. Requests can be dispatched
immediately.

What of the above fits into PostgreSQL? Can PostgreSQL choose to remove
records on a whim? Can PostgreSQL get away with removing transaction
boundaries, and making specific tables always read and write to latest?
Can PostgreSQL tables be limited to two fields? Can PostgreSQL get away
with zero synchronization between processes or threads?

The architectures are almost opposite of each other. I don't see how you
could combine the architectures. Effectively, you would need to have two
engines inside PostgreSQL, with the engine type selected from the table
type. MySQL sort of does this. In MySQL, some tables support transactions
while others do not. Is that what you are asking for?

> My original thoughts were integrating it into the sql level to allow the 
> database structure to define what we would want to cache in ram, which 
> is similar to what is happening with using pgmemcached.

In my experience, the most costly part of SQL, for very simple queries,
is the query planning. As soon as you have more than one key and one value,
you require query planning of some sort. I believe this is the SQL overhead.
Parsing the SQL, and determining how to best execute it.

Lighter weight databases, such as BerkeleyDB already exist to do what
you are asking for. memcache makes few guarantees. BerkeleyDB and
similar gives you transactions. PostgreSQL and similar give you SQL.
Each level of abstraction costs.

> Expanding create table to specify that a table gets priority in cache or 
> allocate x amount of cache to be used by table y could be a better way 
> than saying all of this table in ram.

Or, it could be a worse way. Where is the evidence that it would be better?

> I think the main benefit of my first ideas would come from the later 
> examples I gave where create memory tablespace with slaves would allow 
> the use of extra machines, effectively increasing the ram available 
> outside the current postgres setup.

MySQL has some sort of distributed scheme like this, based on a
partitioning of the keys. I'm don't know how great it is. Other than
the sales pitch we were given when MySQL gave us a presentation, I
haven't heard of it in use by others.

> Maybe implementing this idea as a way of increasing the current postgres 
> caching would be a better implementation than the memory tablespaces 
> idea. As in integrating a version of pgmemcached as an option into the 
> current caching layers. Thus implementing it at the config level instead 
> of the structure design. Although defining tables to get priority or 
> allocated space in the ram cache would fit well with that.

If there is a problem with the current cache algorithm, it should be fixed.

What is the problem with it?

I think the memcache people are thinking that the cost of PostgreSQL is
about the disk. Although the disk plays a part, I'm pretty sure it's
only a fraction. Not providing transaction guarantees, not providing an
SQL level abstraction, and not having multiple processes or threads
plays a much bigger part.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



pgsql-hackers by date:

Previous
From: mark@mark.mielke.cc
Date:
Subject: Re: Postgresql Caching
Next
From: Jeremy Drake
Date:
Subject: Re: Postgresql Caching