Thread: Optimize memory allocation code
Hi, hackers!

I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
to allocate space, thereby I think we can reduce duplication of code.

Best regards!
--
Japin Li
Hi,

On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>
> Hi, hackers!
>
> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
> to allocate space, thereby I think we can reduce duplication of code.

The code is duplicated on purpose. There's a comment at the beginning
that mentions it:

/* duplicates MemoryContextAllocZero to avoid increased overhead */

Same for MemoryContextAllocZero() itself.
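The trade-off described above can be sketched in standalone C. This is only an illustration under stated assumptions: `toy_alloc`, `toy_alloc0_wrapped`, and `toy_alloc0_duplicated` are hypothetical names built on `malloc`, not PostgreSQL's palloc() or palloc0(), and they ignore memory contexts and size limits entirely.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for palloc(): allocate or abort. Not PostgreSQL code. */
void *
toy_alloc(size_t size)
{
    void *p = malloc(size ? size : 1);  /* avoid a possible NULL from malloc(0) */

    if (p == NULL)
        abort();
    return p;
}

/*
 * Variant A: reuse toy_alloc(). Less source code, but one more function
 * call on the hot path unless the compiler inlines it.
 */
void *
toy_alloc0_wrapped(size_t size)
{
    void *p = toy_alloc(size);

    memset(p, 0, size);
    return p;
}

/*
 * Variant B: repeat the allocation logic inline, so the zeroing allocator
 * never goes through a second function.
 */
void *
toy_alloc0_duplicated(size_t size)
{
    void *p = malloc(size ? size : 1);

    if (p == NULL)
        abort();
    memset(p, 0, size);
    return p;
}
```

Both variants return the same zero-filled memory; they differ only in whether an extra call may appear in the generated code. When both functions sit in one file, an optimizing compiler may inline the callee, so any difference has to be confirmed by measurement or by reading the assembly.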
> On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>>
>> Hi, hackers!
>>
>> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
>> to allocate space, thereby I think we can reduce duplication of code.
>
> The code is duplicated on purpose. There's a comment at the beginning
> that mentions it:
>
> /* duplicates MemoryContextAllocZero to avoid increased overhead */
>
> Same for MemoryContextAllocZero() itself.

Thanks! How big is this overhead? Is there any way I can test it?

Best regards!
--
Japin Li
On Fri, Sep 25, 2020 at 7:32 PM Li Japin <japinli@hotmail.com> wrote:
>
> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
> >>
> >> Hi, hackers!
> >>
> >> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
> >> to allocate space, thereby I think we can reduce duplication of code.
> >
> > The code is duplicated on purpose. There's a comment at the beginning
> > that mentions it:
> >
> > /* duplicates MemoryContextAllocZero to avoid increased overhead */
> >
> > Same for MemoryContextAllocZero() itself.
>
> Thanks! How big is this overhead? Is there any way I can test it?

Profiler. For example, oprofile. In hot areas of the code (memory
allocation is very hot), profiling is the first step.

merlin
On 2020-Sep-26, Li Japin wrote:

> Thanks! How big is this overhead? Is there any way I can test it?

You could also have a look at the assembly code that your compiler
generates -- particularly examine how it changes.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for your advice!

On Sep 29, 2020, at 9:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Sep-26, Li Japin wrote:
>> Thanks! How big is this overhead? Is there any way I can test it?
>
> You could also have a look at the assembly code that your compiler
> generates -- particularly examine how it changes.

The original assembly code for palloc0 is:
0000000000517690 <palloc0>:
517690: 55 push %rbp
517691: 53 push %rbx
517692: 48 89 fb mov %rdi,%rbx
517695: 48 83 ec 08 sub $0x8,%rsp
517699: 48 81 ff ff ff ff 3f cmp $0x3fffffff,%rdi
5176a0: 48 8b 2d d9 0c 48 00 mov 0x480cd9(%rip),%rbp # 998380 <CurrentMemoryContext>
5176a7: 0f 87 d5 00 00 00 ja 517782 <palloc0+0xf2>
5176ad: 48 8b 45 10 mov 0x10(%rbp),%rax
5176b1: 48 89 fe mov %rdi,%rsi
5176b4: c6 45 04 00 movb $0x0,0x4(%rbp)
5176b8: 48 89 ef mov %rbp,%rdi
5176bb: ff 10 callq *(%rax)
5176bd: 48 85 c0 test %rax,%rax
5176c0: 48 89 c1 mov %rax,%rcx
5176c3: 74 5b je 517720 <palloc0+0x90>
5176c5: f6 c3 07 test $0x7,%bl
5176c8: 75 36 jne 517700 <palloc0+0x70>
5176ca: 48 81 fb 00 04 00 00 cmp $0x400,%rbx
5176d1: 77 2d ja 517700 <palloc0+0x70>
5176d3: 48 01 c3 add %rax,%rbx
5176d6: 48 39 d8 cmp %rbx,%rax
5176d9: 73 35 jae 517710 <palloc0+0x80>
5176db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
5176e0: 48 83 c0 08 add $0x8,%rax
5176e4: 48 c7 40 f8 00 00 00 movq $0x0,-0x8(%rax)
5176eb: 00
5176ec: 48 39 c3 cmp %rax,%rbx
5176ef: 77 ef ja 5176e0 <palloc0+0x50>
5176f1: 48 83 c4 08 add $0x8,%rsp
5176f5: 48 89 c8 mov %rcx,%rax
5176f8: 5b pop %rbx
5176f9: 5d pop %rbp
5176fa: c3 retq
5176fb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
517700: 48 89 cf mov %rcx,%rdi
517703: 48 89 da mov %rbx,%rdx
517706: 31 f6 xor %esi,%esi
517708: e8 e3 0e ba ff callq b85f0 <memset@plt>
51770d: 48 89 c1 mov %rax,%rcx
517710: 48 83 c4 08 add $0x8,%rsp
517714: 48 89 c8 mov %rcx,%rax
517717: 5b pop %rbx
517718: 5d pop %rbp
517719: c3 retq
51771a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
517720: 48 8b 3d 51 0c 48 00 mov 0x480c51(%rip),%rdi # 998378 <TopMemoryContext>
517727: be 64 00 00 00 mov $0x64,%esi
51772c: e8 1f f9 ff ff callq 517050 <MemoryContextStatsDetail>
517731: 31 f6 xor %esi,%esi
517733: bf 14 00 00 00 mov $0x14,%edi
517738: e8 53 6d fd ff callq 4ee490 <errstart>
51773d: bf c5 20 00 00 mov $0x20c5,%edi
517742: e8 99 9b fd ff callq 4f12e0 <errcode>
517747: 48 8d 3d 07 54 03 00 lea 0x35407(%rip),%rdi # 54cb55 <__func__.7554+0x45>
51774e: 31 c0 xor %eax,%eax
517750: e8 ab 9d fd ff callq 4f1500 <errmsg>
517755: 48 8b 55 38 mov 0x38(%rbp),%rdx
517759: 48 8d 3d 80 11 16 00 lea 0x161180(%rip),%rdi # 6788e0 <__func__.6248+0x150>
517760: 48 89 de mov %rbx,%rsi
517763: 31 c0 xor %eax,%eax
517765: e8 56 a2 fd ff callq 4f19c0 <errdetail>
51776a: 48 8d 15 ff 11 16 00 lea 0x1611ff(%rip),%rdx # 678970 <__func__.7326>
517771: 48 8d 3d 20 11 16 00 lea 0x161120(%rip),%rdi # 678898 <__func__.6248+0x108>
517778: be eb 03 00 00 mov $0x3eb,%esi
51777d: e8 0e 95 fd ff callq 4f0c90 <errfinish>
517782: 31 f6 xor %esi,%esi
517784: bf 14 00 00 00 mov $0x14,%edi
517789: e8 02 6d fd ff callq 4ee490 <errstart>
51778e: 48 8d 3d db 10 16 00 lea 0x1610db(%rip),%rdi # 678870 <__func__.6248+0xe0>
517795: 48 89 de mov %rbx,%rsi
517798: 31 c0 xor %eax,%eax
51779a: e8 91 98 fd ff callq 4f1030 <errmsg_internal>
51779f: 48 8d 15 ca 11 16 00 lea 0x1611ca(%rip),%rdx # 678970 <__func__.7326>
5177a6: 48 8d 3d eb 10 16 00 lea 0x1610eb(%rip),%rdi # 678898 <__func__.6248+0x108>
5177ad: be df 03 00 00 mov $0x3df,%esi
5177b2: e8 d9 94 fd ff callq 4f0c90 <errfinish>
5177b7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
5177be: 00 00
After the modification, the assembly code for palloc0 is:
0000000000517690 <palloc0>:
517690: 53 push %rbx
517691: 48 89 fb mov %rdi,%rbx
517694: e8 17 ff ff ff callq 5175b0 <palloc>
517699: f6 c3 07 test $0x7,%bl
51769c: 48 89 c1 mov %rax,%rcx
51769f: 75 2f jne 5176d0 <palloc0+0x40>
5176a1: 48 81 fb 00 04 00 00 cmp $0x400,%rbx
5176a8: 77 26 ja 5176d0 <palloc0+0x40>
5176aa: 48 01 c3 add %rax,%rbx
5176ad: 48 39 d8 cmp %rbx,%rax
5176b0: 73 2e jae 5176e0 <palloc0+0x50>
5176b2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
5176b8: 48 83 c0 08 add $0x8,%rax
5176bc: 48 c7 40 f8 00 00 00 movq $0x0,-0x8(%rax)
5176c3: 00
5176c4: 48 39 c3 cmp %rax,%rbx
5176c7: 77 ef ja 5176b8 <palloc0+0x28>
5176c9: 48 89 c8 mov %rcx,%rax
5176cc: 5b pop %rbx
5176cd: c3 retq
5176ce: 66 90 xchg %ax,%ax
5176d0: 48 89 cf mov %rcx,%rdi
5176d3: 48 89 da mov %rbx,%rdx
5176d6: 31 f6 xor %esi,%esi
5176d8: e8 13 0f ba ff callq b85f0 <memset@plt>
5176dd: 48 89 c1 mov %rax,%rcx
5176e0: 48 89 c8 mov %rcx,%rax
5176e3: 5b pop %rbx
5176e4: c3 retq
5176e5: 90 nop
5176e6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
5176ed: 00 00 00
Now I see why we need the duplicated code in palloc0: the modified version makes an extra call to palloc.
--
Best regards
Japin Li
On Fri, Sep 25, 2020 at 07:37:07PM -0500, Merlin Moncure wrote:
>On Fri, Sep 25, 2020 at 7:32 PM Li Japin <japinli@hotmail.com> wrote:
>>
>> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>> >>
>> >> Hi, hackers!
>> >>
>> >> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
>> >> to allocate space, thereby I think we can reduce duplication of code.
>> >
>> > The code is duplicated on purpose. There's a comment at the beginning
>> > that mentions it:
>> >
>> > /* duplicates MemoryContextAllocZero to avoid increased overhead */
>> >
>> > Same for MemoryContextAllocZero() itself.
>>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
>Profiler. For example, oprofile. In hot areas of the code (memory
>allocation is very hot), profiling is the first step.
>

Maybe a micro-benchmark would be better, e.g. a function with a loop
doing many palloc/palloc0 calls, or something similar.

FWIW I wonder what kind of overhead this is meant to avoid; the comment
unfortunately does not go into any details. I suppose it's to avoid extra
function calls, but maybe there's something else going on. And maybe the
overhead is much lower on modern CPUs (although this seems to come from
8396447cdbd in 2013, so it's not that old).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
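A micro-benchmark of the kind suggested above might look roughly like the following. It is a standalone sketch, not something posted in this thread: `time_alloc_loop`, `alloc_fn`, and `free_fn` are hypothetical names, and the harness times whatever allocator pair it is given (plain `malloc`/`free` here) rather than palloc/palloc0, which would need to run inside a PostgreSQL backend.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical function-pointer types for the allocator under test. */
typedef void *(*alloc_fn) (size_t);
typedef void (*free_fn) (void *);

/*
 * Run `iterations` allocate/free pairs of `size` bytes and return the
 * elapsed CPU time in seconds, as measured by clock().
 */
double
time_alloc_loop(alloc_fn do_alloc, free_fn do_free, size_t size, long iterations)
{
    void *volatile escape = NULL;   /* keeps the allocation from being optimized away */
    clock_t     start = clock();

    for (long i = 0; i < iterations; i++)
    {
        void *p = do_alloc(size);

        escape = p;                 /* the pointer escapes through a volatile store */
        do_free(p);
    }

    clock_t     end = clock();

    (void) escape;
    return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_alloc_loop(malloc, free, 64, 10000000)` once per allocator variant and comparing the results gives a first impression. Because both variants are called through a function pointer, the harness adds the same indirect-call cost to each, so only the difference between runs is meaningful, and several repetitions are needed to separate it from noise.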