Thread: GSoC Proposal - Caching query results in pgpool-II

GSoC Proposal - Caching query results in pgpool-II

From
Masanori Yamazaki
Date:
Hello<br /><br /> My name is Masanori Yamazaki. I am sending my proposal about <br />Google Summer Of Code2011. It
wouldbe nice if you could give <br />me your opinion.<br /><br /><br />・title<br /><br />Caching query results in
pgpool-II<br/><br /><br />・Synopsis<br /><br />Pgpool-II has query caching functionality using storage provided by<br
/>dedicated PostgreSQL ("system database"). This has several drawbacks<br />however. 1)it is slow because it needs to
accessdisk storage 2)it<br />does not invalidate the cache automatically.<br /><br />This proposal tries to solve these
problems.<br/><br />- To speed up the cache access, it will be placed on memory, rather<br />  than database. The
memorywill be put on shared memory or external<br />  memory services such as memcached so that the cache can be shared
by<br/>   multiple sessions. Old cache entries will be deleted by LRU manner.<br /><br />- The cache will be
invalidatedautomatically upon the timing when the<br />  relevant tables are updated. Note that this is not always
possible<br/>   because the query result might come from multiple tables, views or<br />  even functions. In this case
thecache will be invalidated by<br />  timeout(or they are not cached at all).<br /><br />- Fine tuning knobs need to
beinvented to control the cache behavior<br />   though they are not clear at this moment.<br /><br /><br />・Benefits
tothe PostgreSQL Community:<br /><br /><br />Query caching will effectively enhance the performance of PostgreSQL<br
/>andthis project will contribute to increase the number of users of<br /> PostgreSQL, who need more high performance
databasesystems.<br /><br />Note that implementing query cache in pgpool-II will bring merits not<br />only to the
latestversion of PostgreSQL but to the previous releases<br />of PostgreSQL.<br /><br /><br />・Project Schedule<br
/><br/>-April<br /> preparation <br /><br />-May 1 - May 22 <br /> write a specification<br /><br />-May 23 - June
19<br/> coding<br /><br />-June 20 - July 22<br /> test<br /><br />-July 23 - August 12<br /> complete of coding and
test,commit<br /><br /><br />・Personal Data and Biographical Information<br /><br /> Name : Masanori Yamazaki<br
/> Born: 23.1.1981<br /> School :Currently I learn contemporary philosophy, culture and literature at Waseda University
inJapan.<br /> Coding :<br />  1.About five years job as web application programer(PHP, Java).<br /> 2.I experienced
projectsused framework such as Symfony, Zend Framework, CakePHP, and Struts.<br /> 3.I am interested in OSS and like
coding.<br/><br /><br /> Regards<br /> 

Re: GSoC Proposal - Caching query results in pgpool-II

From
Magnus Hagander
Date:
<p>How does this relate to the existing pqc project ( <a
href="http://code.google.com/p/pqc/">http://code.google.com/p/pqc/</a>)?Seems the goals are fairly similar, and both
arebased off pgpool?<p>/Magnus<br /><div class="gmail_quote">On Apr 6, 2011 2:10 AM, "Masanori Yamazaki" <<a
href="mailto:m.yamazaki23@gmail.com">m.yamazaki23@gmail.com</a>>wrote:<br type="attribution" />> Hello<br />>
<br/>> My name is Masanori Yamazaki. I am sending my proposal about<br /> > Google Summer Of Code2011. It would
benice if you could give<br />> me your opinion.<br />> <br />> <br />> ・title<br />> <br />> Caching
queryresults in pgpool-II<br />> <br />> <br />> ・Synopsis<br />> <br /> > Pgpool-II has query caching
functionalityusing storage provided by<br />> dedicated PostgreSQL ("system database"). This has several
drawbacks<br/>> however. 1)it is slow because it needs to access disk storage 2)it<br /> > does not invalidate
thecache automatically.<br />> <br />> This proposal tries to solve these problems.<br />> <br />> - To
speedup the cache access, it will be placed on memory, rather<br />> than database. The memory will be put on shared
memoryor external<br /> > memory services such as memcached so that the cache can be shared by<br />> multiple
sessions.Old cache entries will be deleted by LRU manner.<br />> <br />> - The cache will be invalidated
automaticallyupon the timing when the<br /> > relevant tables are updated. Note that this is not always possible<br
/>>because the query result might come from multiple tables, views or<br />> even functions. In this case the
cachewill be invalidated by<br /> > timeout(or they are not cached at all).<br />> <br />> - Fine tuning knobs
needto be invented to control the cache behavior<br />> though they are not clear at this moment.<br />> <br
/>><br />> ・Benefits to the PostgreSQL Community:<br /> > <br />> <br />> Query caching will effectively
enhancethe performance of PostgreSQL<br />> and this project will contribute to increase the number of users of<br
/>>PostgreSQL, who need more high performance database systems.<br /> > <br />> Note that implementing query
cachein pgpool-II will bring merits not<br />> only to the latest version of PostgreSQL but to the previous
releases<br/>> of PostgreSQL.<br />> <br />> <br />> ・Project Schedule<br /> > <br />> -April<br
/>>preparation<br />> <br />> -May 1 - May 22<br />> write a specification<br />> <br />> -May 23 -
June19<br />> coding<br />> <br />> -June 20 - July 22<br />> test<br />> <br />> -July 23 - August
12<br/> > complete of coding and test, commit<br />> <br />> <br />> ・Personal Data and Biographical
Information<br/>> <br />> Name : Masanori Yamazaki<br />> Born : 23.1.1981<br />> School :Currently I learn
contemporaryphilosophy, culture and literature<br /> > at Waseda University in Japan.<br />> Coding :<br />>
1.Aboutfive years job as web application programer(PHP, Java).<br />> 2.I experienced projects used framework such
asSymfony, Zend Framework,<br />> CakePHP, and Struts.<br /> > 3.I am interested in OSS and like coding.<br
/>><br />> <br />> Regards<br /></div> 

Re: GSoC Proposal - Caching query results in pgpool-II

From
Tatsuo Ishii
Date:
In my understanding pqc is not designed to be working with pgpool.
Thus if a user want to use both query cache and query dispatching,
replication or failover etc. which are provided by pgpool, it seems
it's not possible. For this purpose maybe user could *cascade* pqc and
pgpool, but I'm not sure. Even if it's possible, it will bring huge
performance penalty.

Another point is cache invalidation. Masanori's proposal includes
cache invalidation technique by looking at write queries, which is
lacking in pqc in my understanding.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> How does this relate to the existing pqc project (
> http://code.google.com/p/pqc/)? Seems the goals are fairly similar, and both
> are based off pgpool?
> 
> /Magnus
>  On Apr 6, 2011 2:10 AM, "Masanori Yamazaki" <m.yamazaki23@gmail.com> wrote:
>> Hello
>>
>> My name is Masanori Yamazaki. I am sending my proposal about
>> Google Summer Of Code2011. It would be nice if you could give
>> me your opinion.
>>
>>
>> ・title
>>
>> Caching query results in pgpool-II
>>
>>
>> ・Synopsis
>>
>> Pgpool-II has query caching functionality using storage provided by
>> dedicated PostgreSQL ("system database"). This has several drawbacks
>> however. 1)it is slow because it needs to access disk storage 2)it
>> does not invalidate the cache automatically.
>>
>> This proposal tries to solve these problems.
>>
>> - To speed up the cache access, it will be placed on memory, rather
>> than database. The memory will be put on shared memory or external
>> memory services such as memcached so that the cache can be shared by
>> multiple sessions. Old cache entries will be deleted by LRU manner.
>>
>> - The cache will be invalidated automatically upon the timing when the
>> relevant tables are updated. Note that this is not always possible
>> because the query result might come from multiple tables, views or
>> even functions. In this case the cache will be invalidated by
>> timeout(or they are not cached at all).
>>
>> - Fine tuning knobs need to be invented to control the cache behavior
>> though they are not clear at this moment.
>>
>>
>> ・Benefits to the PostgreSQL Community:
>>
>>
>> Query caching will effectively enhance the performance of PostgreSQL
>> and this project will contribute to increase the number of users of
>> PostgreSQL, who need more high performance database systems.
>>
>> Note that implementing query cache in pgpool-II will bring merits not
>> only to the latest version of PostgreSQL but to the previous releases
>> of PostgreSQL.
>>
>>
>> ・Project Schedule
>>
>> -April
>> preparation
>>
>> -May 1 - May 22
>> write a specification
>>
>> -May 23 - June 19
>> coding
>>
>> -June 20 - July 22
>> test
>>
>> -July 23 - August 12
>> complete of coding and test, commit
>>
>>
>> ・Personal Data and Biographical Information
>>
>> Name : Masanori Yamazaki
>> Born : 23.1.1981
>> School :Currently I learn contemporary philosophy, culture and literature
>> at Waseda University in Japan.
>> Coding :
>> 1.About five years job as web application programer(PHP, Java).
>> 2.I experienced projects used framework such as Symfony, Zend Framework,
>> CakePHP, and Struts.
>> 3.I am interested in OSS and like coding.
>>
>>
>> Regards


Re: GSoC Proposal - Caching query results in pgpool-II

From
Magnus Hagander
Date:
2011/4/7 Tatsuo Ishii <ishii@postgresql.org>:
> In my understanding pqc is not designed to be working with pgpool.
> Thus if a user want to use both query cache and query dispatching,
> replication or failover etc. which are provided by pgpool, it seems
> it's not possible. For this purpose maybe user could *cascade* pqc and
> pgpool, but I'm not sure. Even if it's possible, it will bring huge
> performance penalty.
>
> Another point is cache invalidation. Masanori's proposal includes
> cache invalidation technique by looking at write queries, which is
> lacking in pqc in my understanding.

Probably. My question wasn't necessarily "hasn't this already been
done in pqc", more "should this perhaps build on or integrate with pgc
in order not to duplicate effort". I think at the very least, any
overlap should be researched and identified - because if it can
integrate parts of pgc, or work with, more effort can be spent on the
new parts rather than redoing something that's already been done.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/