Thread: About the performance of startup after dropping many tables
Hello guys,

We have PG 8.3.13 in our system. When running performance cases, we found that startup recovery took about 3 minutes, which is too long for our system.

We diagnosed the problem by adding timestamps. In the end, we found that almost all of the 3 minutes were spent in the relation-dropping and buffer-invalidation loop in xact_redo_commit.

Before the problem happened, we dropped 40000 tables and rebooted Linux, so the outer loop runs 40000 times, and we have 13000 shared buffer pages in PG. But DropRelFileNodeBuffers, which drops the shared buffers associated with a specified relation, has to scan all of the shared buffers for each relation to check whether each buffer can be dropped, no matter how many pages the relation actually has in shared buffers.

In all, that is 40000 * 13000 LWLock acquire/release cycles. Is this necessary? How about building a hash recording all the relfilenodes to be dropped, and running through the shared buffers only once, checking whether each buffer's relfilenode is about to be dropped? If we can do this, the LWLock traffic will be only 13000, and we will have much better performance!

Does this work? And is there any risk in doing so?

Thanks!

Best regards,

甘嘉栋 (Gan Jiadong)
E-MAIL: ganjd@huawei.com
Tel: +86-755-289720578

*********************************************************************************************************
This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
*********************************************************************************************************
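The proposal above can be sketched in C. This is a toy illustration, not an actual PostgreSQL patch: `RelFileNode` and `BufferDesc` here are simplified stand-ins for the real backend structures, the open-addressing hash set is hypothetical (a real patch would presumably use the backend's `hash_any()`/`HTAB` facilities), and the per-buffer locking that the real `DropRelFileNodeBuffers` does is only noted in a comment.

```c
/* Sketch of the proposed single-pass buffer invalidation.
 * Simplified stand-ins for the real PostgreSQL structures. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t spcNode, dbNode, relNode; } RelFileNode;
typedef struct { RelFileNode tag; bool valid; } BufferDesc;

#define HASH_SLOTS 131072           /* power of two, comfortably > 2 * 40000 */

static RelFileNode hash_table[HASH_SLOTS];
static bool        hash_used[HASH_SLOTS];

static uint32_t rnode_hash(const RelFileNode *r)
{
    /* toy multiplicative mix; the real code would use hash_any() */
    uint32_t h = r->spcNode * 2654435761u;
    h ^= r->dbNode  * 2246822519u;
    h ^= r->relNode * 3266489917u;
    return h & (HASH_SLOTS - 1);
}

static void hash_insert(const RelFileNode *r)
{
    uint32_t i = rnode_hash(r);
    while (hash_used[i])
        i = (i + 1) & (HASH_SLOTS - 1);     /* linear probing */
    hash_table[i] = *r;
    hash_used[i] = true;
}

static bool hash_member(const RelFileNode *r)
{
    uint32_t i = rnode_hash(r);
    while (hash_used[i])
    {
        if (memcmp(&hash_table[i], r, sizeof *r) == 0)
            return true;
        i = (i + 1) & (HASH_SLOTS - 1);
    }
    return false;
}

/* One pass over all buffers instead of one pass per dropped relation:
 * O(NBuffers + ndrops) work rather than O(NBuffers * ndrops).
 * Returns the number of buffers invalidated. */
static int drop_buffers_once(BufferDesc *buffers, int nbuffers,
                             const RelFileNode *drops, int ndrops)
{
    int invalidated = 0;

    for (int i = 0; i < ndrops; i++)
        hash_insert(&drops[i]);

    for (int i = 0; i < nbuffers; i++)
    {
        /* the real code would take the buffer header lock here */
        if (buffers[i].valid && hash_member(&buffers[i].tag))
        {
            buffers[i].valid = false;
            invalidated++;
        }
    }
    return invalidated;
}
```

The point of the sketch is the complexity change: building the set costs O(ndrops) and the scan costs O(NBuffers), so the lock traffic is roughly 40000 + 13000 operations instead of 40000 * 13000.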
Gan Jiadong <ganjd@huawei.com> writes:
> we have PG 8.3.13 in our system. When running performance cases, we find
> the startup recovery cost about 3 minutes. It is too long in our system.

Maybe you should rethink the assumption that dropping 40000 tables is a
cheap operation. Why do you have that many in the first place, let alone
that many that you drop and recreate frequently? Almost certainly, you
need a better-conceived schema.

			regards, tom lane
Hi,

Thanks for your reply.

Of course, we will think about whether dropping 40000 relations is reasonable. In fact, this happens in a very special scenario. But when we analyzed this issue, we found that the PG code can be rewritten to achieve better performance; or we could say the algorithm in this part is not good enough.

For example, with the refactoring as we have done it, the startup time can be reduced from 3 minutes to 8 seconds. That is quite a large improvement, especially for systems with low TTR (time to recovery) requirements.

Is there any problem or risk in changing this part of the code as we suggested?

Thank you.

Best regards,

甘嘉栋 (Gan Jiadong)
E-MAIL: ganjd@huawei.com
Tel: +86-755-289720578

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: February 18, 2011 11:37
To: Gan Jiadong
Cc: pgsql-hackers@postgresql.org; liyuesen@huawei.com; yaoyiyu@huawei.com; liuxingyu@huawei.com; tianwengang@huawei.com
Subject: Re: [HACKERS] About the performance of startup after dropping many tables

Gan Jiadong <ganjd@huawei.com> writes:
> we have PG 8.3.13 in our system. When running performance cases, we find
> the startup recovery cost about 3 minutes. It is too long in our system.

Maybe you should rethink the assumption that dropping 40000 tables is a
cheap operation. Why do you have that many in the first place, let alone
that many that you drop and recreate frequently? Almost certainly, you
need a better-conceived schema.

			regards, tom lane
On Thu, Feb 17, 2011 at 10:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gan Jiadong <ganjd@huawei.com> writes:
>> we have PG 8.3.13 in our system. When running performance cases, we find the
>> startup recovery cost about 3 minutes. It is too long in our system.
>
> Maybe you should rethink the assumption that dropping 40000 tables is a
> cheap operation. Why do you have that many in the first place, let
> alone that many that you drop and recreate frequently? Almost
> certainly, you need a better-conceived schema.

Possibly, but it's not necessarily a bad idea to improve performance for people with crazy schemas. What concerns me a little bit about the proposed scheme, though, is that it's only going to work if all of those tables are dropped by a single transaction. You still need one pass through all of shared_buffers for every transaction that drops one or more relations. Now, I'm not sure, maybe there's no help for that, but ever since commit c2281ac87cf4828b6b828dc8585a10aeb3a176e0 it's been on my mind that loops that iterate through the entire buffer cache are bad for scalability. Conventional wisdom seems to be that performance tops out at, or just before, 8GB, but it's already the case that that's quite a small fraction of the memory on a large machine, and that's only going to keep getting worse. Admittedly, the existing places where we loop through the whole buffer cache are probably not the primary reason for that limitation, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> Possibly, but it's not necessarily a bad idea to improve performance
> for people with crazy schemas.

It is if it introduces unmaintainable code. I see no way to collapse multiple drop operations into one that's not going to be a Rube Goldberg device. I'm especially unwilling to introduce such a thing into the xlog replay code paths, where it's guaranteed to get little testing.

(BTW, it seems like a workaround for the OP is just to CHECKPOINT right after dropping all those tables. Or even reconsider their shutdown procedure.)

			regards, tom lane
Excerpts from Gan Jiadong's message of vie feb 18 03:42:02 -0300 2011:
> Hi,
>
> Thanks for your reply.
> Of course, we will think about whether 40000 relations dropping is
> reasonable. In fact, this happens in a very special scenario.
> But when we analyzed this issue, we found the PG code can be rewritten to
> achieve better performance. Or we can say the arithmetic of this part is not
> good enough.
> For example, by doing the refactoring as we done, the startup time can be
> reduced from 3 minutes to 8 seconds, It is quite a great improvement,
> especially for the systems with low TTR (time to recovery) requirement.
>
> There is any problem or risk to change this part of code as we suggested?

The only way to know would be to show the changes. If you were to submit the patch, and assuming we agree on the design and implementation, we could even consider including it (or, more likely, some derivate of it).

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support