Re: Patch for fast gin cache performance improvement - Mailing list pgsql-hackers

From Ian Link
Subject Re: Patch for fast gin cache performance improvement
Date
Msg-id 5258F2C3.7020301@ilink.io
Whole thread Raw
In response to Re: Patch for fast gin cache performance improvement  ("Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
<blockquote type="cite"><span><pre wrap="">I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above.  So I'd like to mark
this patch as Returned with Feedback.  Is it OK?
</pre></span></blockquote> Sounds like a good idea. Thanks for the review!<br /> Ian Link<br /><br /><br /><blockquote
cite="mid:006401cec58e$f658b680$e30a2380$@lab.ntt.co.jp"style="border: 0px none;" type="cite"><div class="__pbConvHr"
style="margin:30px25px 10px 25px;"><div style="display:table;width:100%;border-top:1px solid 
 
#EDEEF0;padding-top:5px"><div style="display:table-cell;vertical-align:middle;padding-right:6px;"><img height="25px"
name="compose-unknown-contact.jpg"photoaddress="fujita.etsuro@lab.ntt.co.jp" photoname="Etsuro Fujita"
src="cid:part1.09030200.06060505@ilink.io"width="25px" /></div><div
style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%"><a
href="mailto:fujita.etsuro@lab.ntt.co.jp"moz-do-not-send="true" style="color:#737F92 
 
!important;padding-right:6px;font-weight:bold;text-decoration:none 
!important;">Etsuro Fujita</a></div><div style="display:table-cell;white-space:nowrap;vertical-align:middle;"><font
color="#9FA2A5"><spanstyle="padding-left:6px">Thursday, October 10, 2013 1:01 AM</span></font></div></div></div><div
__pbrmquotes="true"class="__pbConvBody" style="color:#888888;margin-left:24px;margin-right:24px;"><pre wrap="">Ian Link
wrote:
</pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">Although I asked this question, I've reconsidered
aboutthese
 
parameters, and it seems that these parameters not only make code
rather complex but are a little confusing to users.  So I'd like to propose
</pre></blockquote><pre wrap="">to introduce only one parameter:
</pre><blockquote type="cite"><pre wrap="">fast_cache_size.  While users that give weight to update performance
for the fast update technique set this parameter to a large value,
users that give weight not only to update performance but to search
performance set this parameter to a small value.  What do you think about
</pre></blockquote><pre wrap="">this?
I think it makes sense to maintain this separation. If the user doesn't need
a per-index setting, they don't have to use the parameter. Since the parameter
is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need fine-grained
control. We can document this and I don't think it will be confusing to the
user.
</pre></blockquote><pre wrap="">
OK, though I'd like to hear the opinion of others.

</pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">4. In my understanding, the small value of
gin_fast_limit/fast_cache_size leads to the increase in GIN search
performance, which, however, leads to the decrease in GIN update
performance.  Am I right?  If so, I think the tradeoff should be noted in
</pre></blockquote><pre wrap="">the documentation.
I believe this is correct.

</pre><blockquote type="cite"><pre wrap="">5. The following documents in Chapter 57. GIN Indexes need to be
updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
Tricks
</pre></blockquote><pre wrap="">Sure, I can add something.

</pre><blockquote type="cite"><pre wrap="">6. I would like to see the results for the additional test cases
</pre></blockquote></blockquote><pre wrap="">(tsvectors).
</pre><blockquote type="cite"><pre wrap="">I don't really have any good test cases for this available, and have very
</pre></blockquote><pre wrap="">limited
</pre><blockquote type="cite"><pre wrap="">time for postgres at the moment. Feel free to create a test case, but I
don't
believe I can at the moment. Sorry!
</pre></blockquote><pre wrap="">
Unfortunately, I don't have much time to do such a thing, though I think we
should do that.  (In addition, we should do another performance test to make
clear an influence of fast update performance from introducing these parameters,
which would be required to determine the default values of these parameters.)

</pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">7. The commented-out elog() code should be removed.
</pre></blockquote><pre wrap="">Sorry about that, I shouldn't have submitted the patch with those still there.

I should have a new patch soonish, hopefully. Thanks for your feedback!
</pre></blockquote><pre wrap="">
I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above.  So I'd like to mark
this patch as Returned with Feedback.  Is it OK?

Thanks,

Best regards,
Etsuro Fujita


</pre></div><div class="__pbConvHr" style="margin:30px 25px 10px 25px;"><div
style="display:table;width:100%;border-top:1pxsolid 
 
#EDEEF0;padding-top:5px"><div style="display:table-cell;vertical-align:middle;padding-right:6px;"><img height="25px"
name="compose-unknown-contact.jpg"photoaddress="ian@ilink.io" photoname="Ian Link"
src="cid:part1.09030200.06060505@ilink.io"width="25px" /></div><div
style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%"><ahref="mailto:ian@ilink.io"
moz-do-not-send="true"style="color:#737F92 
 
!important;padding-right:6px;font-weight:bold;text-decoration:none 
!important;">Ian Link</a></div><div style="display:table-cell;white-space:nowrap;vertical-align:middle;"><font
color="#9FA2A5"><spanstyle="padding-left:6px">Monday, September 30, 2013 3:09 PM</span></font></div></div></div><div
__pbrmquotes="true"class="__pbConvBody" style="color:#888888;margin-left:24px;margin-right:24px;"><span>Hi Etsuro,<br
/>Sorry for the delay but I have been very busy with work. I have been away from postgres for a while, so I will need a
littletime to review the code and make sure I give you an informed response. I'll get back to you as soon as I am able.
Thanksfor understanding.<br /> Ian Link </span><br /><br /></div><div class="__pbConvHr" style="margin:30px 25px 10px
25px;"><divstyle="display:table;width:100%;border-top:1px solid 
 
#EDEEF0;padding-top:5px"><div style="display:table-cell;vertical-align:middle;padding-right:6px;"><img height="25px"
name="compose-unknown-contact.jpg"photoaddress="fujita.etsuro@lab.ntt.co.jp" photoname="Etsuro Fujita"
src="cid:part1.09030200.06060505@ilink.io"width="25px" /></div><div
style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%"><a
href="mailto:fujita.etsuro@lab.ntt.co.jp"moz-do-not-send="true" style="color:#737F92 
 
!important;padding-right:6px;font-weight:bold;text-decoration:none 
!important;">Etsuro Fujita</a></div><div style="display:table-cell;white-space:nowrap;vertical-align:middle;"><font
color="#9FA2A5"><spanstyle="padding-left:6px">Friday, September 27, 2013 2:24 AM</span></font></div></div></div><div
__pbrmquotes="true"class="__pbConvBody" style="color:#888888;margin-left:24px;margin-right:24px;"><pre wrap="">I
wrote:
</pre><blockquote type="cite"><pre wrap="">I had a look over this patch.  I think this patch is interesting and very
</pre></blockquote><pre wrap="">useful.
</pre><blockquote type="cite"><pre wrap="">Here are my review points:
</pre></blockquote><pre wrap="">
</pre><blockquote type="cite"><pre wrap="">8. I think there are no issues in this patch.  However, I have one
question:
how this patch works in the case where gin_fast_limit/fast_cache_size = 0?  In
this case, in my understanding, this patch inserts new entries into the
</pre></blockquote><pre wrap="">pending
</pre><blockquote type="cite"><pre wrap="">list temporarily and immediately moves them to the main GIN data structure
</pre></blockquote><pre wrap="">using
</pre><blockquote type="cite"><pre wrap="">ginInsertCleanup().  Am I right?  If so, that is obviously inefficient.
</pre></blockquote><pre wrap="">
Sorry, There are incorrect expressions.  I mean gin_fast_limit > 0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users.  So I'd like to propose to introduce only one parameter:
fast_cache_size.  While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value.  What do you think about this?

Thanks,

Best regards,
Etsuro Fujita


</pre></div><div class="__pbConvHr" style="margin:30px 25px 10px 25px;"><div
style="display:table;width:100%;border-top:1pxsolid 
 
#EDEEF0;padding-top:5px"><div style="display:table-cell;vertical-align:middle;padding-right:6px;"><img height="25px"
name="compose-unknown-contact.jpg"photoaddress="fujita.etsuro@lab.ntt.co.jp" photoname="Etsuro Fujita"
src="cid:part1.09030200.06060505@ilink.io"width="25px" /></div><div
style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%"><a
href="mailto:fujita.etsuro@lab.ntt.co.jp"moz-do-not-send="true" style="color:#737F92 
 
!important;padding-right:6px;font-weight:bold;text-decoration:none 
!important;">Etsuro Fujita</a></div><div style="display:table-cell;white-space:nowrap;vertical-align:middle;"><font
color="#9FA2A5"><spanstyle="padding-left:6px">Thursday, September 26, 2013 6:02 AM</span></font></div></div></div><div
__pbrmquotes="true"class="__pbConvBody" style="color:#888888;margin-left:24px;margin-right:24px;"><pre wrap="">Hi Ian,
 

</pre><blockquote type="cite"><pre wrap="">This patch contains a performance improvement for the fast gin cache. As
you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size
</pre></blockquote><pre wrap="">called
</pre><blockquote type="cite"><pre wrap="">gin_fast_limit. This global variable controls the size of the fast gin
cache,
independently of work_mem. Currently, the default gin_fast_limit is set to
</pre></blockquote><pre wrap="">128kB.
</pre><blockquote type="cite"><pre wrap="">However, that value could need tweaking. 64kB may work better, but it's
hard
to say with only my single machine to test on.
</pre></blockquote><pre wrap="">
</pre><blockquote type="cite"><pre wrap="">On my machine, this patch results in a nice speed up. Our test queries
improve
from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
it should be attached. I can look into additional test cases (tsvectors) if
anyone is interested.
</pre></blockquote><pre wrap="">
</pre><blockquote type="cite"><pre wrap="">In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it is
disabled. If the user does not specify a per-index limit, the index will
</pre></blockquote><pre wrap="">simply
</pre><blockquote type="cite"><pre wrap="">use the global limit.
</pre></blockquote><pre wrap="">
I had a look over this patch.  I think this patch is interesting and very
useful.  Here are my review points:

1. Patch applies cleanly.
2. make, make install and make check is good.
3. I did performance evaluation using your test queries with 64kB and 128kB of
gin_fast_limit (or fast_cache_size), and saw that both values achieved the
performance gains over gin_fast_limit = '256MB'.  64kB worked better than 128kB.
64kB improved from 1.057 ms to 0.075 ms.  Great!
4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance.  Am I right?  If so, I think the tradeoff should be
noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:* 57.3.1. GIN Fast Update Technique* 57.4. GIN
Tipsand Tricks
 
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch.  However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0?  In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup().  Am I right?  If so, that is obviously inefficient.

Sorry for the delay.

Best regards,
Etsuro Fujita


</pre></div><div class="__pbConvHr" style="margin:30px 25px 10px 25px;"><div
style="display:table;width:100%;border-top:1pxsolid 
 
#EDEEF0;padding-top:5px"><div style="display:table-cell;vertical-align:middle;padding-right:6px;"><img height="25px"
name="compose-unknown-contact.jpg"photoaddress="ian@ilink.io" photoname="Ian Link"
src="cid:part1.09030200.06060505@ilink.io"width="25px" /></div><div
style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%"><ahref="mailto:ian@ilink.io"
moz-do-not-send="true"style="color:#737F92 
 
!important;padding-right:6px;font-weight:bold;text-decoration:none 
!important;">Ian Link</a></div><div style="display:table-cell;white-space:nowrap;vertical-align:middle;"><font
color="#9FA2A5"><spanstyle="padding-left:6px">Monday, June 17, 2013 9:42 PM</span></font></div></div></div><div
__pbrmquotes="true"class="__pbConvBody" style="color:#888888;margin-left:24px;margin-right:24px;"><span><b
id="docs-internal-guid-6c8260f0-5593-8781-f55c-65b1114a42bf"style="font-weight:normal;"><p dir="ltr"
style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;"><spanstyle="font-size:15px;font-family:'Droid 
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">This
patchcontains a performance improvement for the fast gin cache. As you may know, the performance of the fast gin cache
decreaseswith its size. Currently, the size of the fast gin cache is tied to work_mem. The size of work_mem can often
bequite high. The large size of work_mem is inappropriate for the fast gin cache size. Therefore, we created a separate
cachesize called gin_fast_limit. This global variable controls the size of the fast gin cache, independently of
work_mem.Currently, the default gin_fast_limit is set to 128kB. However, that value could need tweaking. 64kB may work
better,but it's hard to say with only my single machine to test on.</span><br /><span
style="font-size:15px;font-family:'Droid
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"></span><p
dir="ltr"style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:15px;font-family:'Droid 
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">On
mymachine, this patch results in a nice speed up. Our test queries improve from about 0.9 ms to 0.030 ms. Please feel
freeto use the test case yourself: it should be attached. I can look into additional test cases (tsvectors) if anyone
isinterested. </span><br /><span style="font-size:15px;font-family:'Droid 
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"></span><p
dir="ltr"style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:15px;font-family:'Droid 
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">In
additionto the global limit, we have provided a per-index limit: fast_cache_size. This per-index limit begins at -1,
whichmeans that it is disabled. If the user does not specify a per-index limit, the index will simply use the global
limit.</span><br /><span style="font-size:15px;font-family:'Droid 
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"></span><span
style="font-size:15px;font-family:'Droid
 

Sans';color:#000000;background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">I
wouldlike to thank Andrew Gierth for all his help on this patch. As this is my first patch he was extremely helpful.
Theidea for this performance improvement was entirely his. I just did the implementation. Thanks for reading and
consideringthis patch!</span></b><br /><br /><br /> Ian Link<br /></span></div></blockquote> 

pgsql-hackers by date:

Previous
From: Sameer Thakur
Date:
Subject: Re: pg_stat_statements: calls under-estimation propagation
Next
From: Magnus Hagander
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem