Re: Postgresql Caching - Mailing list pgsql-hackers
From | mark@mark.mielke.cc |
---|---|
Subject | Re: Postgresql Caching |
Date | |
Msg-id | 20061015235542.GB28246@mark.mielke.cc Whole thread Raw |
In response to | Re: Postgresql Caching (Shane Ambler <pgsql@007Marketing.com>) |
Responses |
Re: Postgresql Caching
|
List | pgsql-hackers |
On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote: > mark@mark.mielke.cc wrote: > >As a thought experiment, I'm not seeing the benefit. I think if you > >could prove a benefit, then any proof you provided could be used to > >improve the already existing caching layers, and would apply equally > >to read-only or read-write pages. For example, why not be able to > >hint to PostgreSQL that a disk-based table should be considered a > >priority to keep in RAM. That way, PostgreSQL would avoid pushing > >pages from this table out. > If memcached (or pgmemcached implemented in triggers) can show a speed > improvement using ram based caching (even with network overhead) of > specific data then it stands to reason that this ram based cache can be > integrated into postgres with better integration that will overcome the > issues that pgmemcached has. I think you might misunderstand how memcache gets its speed. It gets away from reading or writing to disk because it makes no promise that the data is safe, even in RAM. memcache may choose to overwrite the data at any time. It makes no consistency guarantees. There are no read/write boundaries. You and I can both read, and both write, and the result is indeterminate. It limits itself to only two fields per row. A key and a value. Both fields must be string types. Based on all of the assumptions above, all read and write operations are fast enough, that they can be serialized. This allows for the process to be single-process, single-thread, with no requirement for co-ordination between these processes or threads. There is no need for locking any data structures, or waiting or any resources. Requests can be dispatched immediately. What of the above fits into PostgreSQL? Can PostgreSQL choose to remove records on a whim? Can PostgreSQL get away with removing transaction boundaries, and making specific tables always read and write to latest? Can PostgreSQL tables be limited to two fields? Can PostgreSQL get away with zero synchronization between processes or threads? The architectures are almost opposite of each other. I don't see how you could combine the architectures. Effectively, you would need to have two engines inside PostgreSQL, with the engine type selected from the table type. MySQL sort of does this. In MySQL, some tables support transactions while others do not. Is that what you are asking for? > My original thoughts were integrating it into the sql level to allow the > database structure to define what we would want to cache in ram, which > is similar to what is happening with using pgmemcached. In my experience, the most costly part of SQL, for very simple queries, is the query planning. As soon as you have more than one key and one value, you require query planning of some sort. I believe this is the SQL overhead. Parsing the SQL, and determining how to best execute it. Lighter weight databases, such as BerkeleyDB already exist to do what you are asking for. memcache makes few guarantees. BerkeleyDB and similar gives you transactions. PostgreSQL and similar give you SQL. Each level of abstraction costs. > Expanding create table to specify that a table gets priority in cache or > allocate x amount of cache to be used by table y could be a better way > than saying all of this table in ram. Or, it could be a worse way. Where is the evidence that it would be better? > I think the main benefit of my first ideas would come from the later > examples I gave where create memory tablespace with slaves would allow > the use of extra machines, effectively increasing the ram available > outside the current postgres setup. MySQL has some sort of distributed scheme like this, based on a partitioning of the keys. I'm don't know how great it is. Other than the sales pitch we were given when MySQL gave us a presentation, I haven't heard of it in use by others. > Maybe implementing this idea as a way of increasing the current postgres > caching would be a better implementation than the memory tablespaces > idea. As in integrating a version of pgmemcached as an option into the > current caching layers. Thus implementing it at the config level instead > of the structure design. Although defining tables to get priority or > allocated space in the ram cache would fit well with that. If there is a problem with the current cache algorithm, it should be fixed. What is the problem with it? I think the memcache people are thinking that the cost of PostgreSQL is about the disk. Although the disk plays a part, I'm pretty sure it's only a fraction. Not providing transaction guarantees, not providing an SQL level abstraction, and not having multiple processes or threads plays a much bigger part. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
pgsql-hackers by date: