Thread: Optimize memory allocation code

Optimize memory allocation code

From
Li Japin
Date:
Hi, hackers!

I found that palloc0() is similar to palloc(), so we could use palloc() inside palloc0()
to allocate the space; I think this would reduce code duplication.

Best regards!

--
Japin Li


Attachment

Re: Optimize memory allocation code

From
Julien Rouhaud
Date:
Hi,

On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>
> Hi, hackers!
>
> I found that palloc0() is similar to palloc(), so we could use palloc() inside palloc0()
> to allocate the space; I think this would reduce code duplication.

The code is duplicated on purpose.  There's a comment at the beginning
that mentions it:

  /* duplicates MemoryContextAllocZero to avoid increased overhead */

Same for MemoryContextAllocZero() itself.
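To make the trade-off concrete, here is a minimal standalone sketch. It uses a hypothetical toy allocator in place of PostgreSQL's MemoryContext machinery; ToyContext, toy_palloc, toy_palloc0_wrapped and toy_palloc0_dup are illustrative names, not the real mcxt.c API. toy_palloc0_wrapped is the proposed refactoring. toy_palloc0_dup repeats the allocation body instead of calling it, so a zeroing allocation needs one call rather than two.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MaxAllocSize (0x3fffffff). */
#define TOY_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Toy context: only the alloc method pointer that palloc dispatches through. */
typedef struct ToyContext
{
    void       *(*alloc) (struct ToyContext *cxt, size_t size);
} ToyContext;

static void *
toy_malloc_method(ToyContext *cxt, size_t size)
{
    (void) cxt;
    return malloc(size > 0 ? size : 1);     /* avoid malloc(0) returning NULL */
}

static ToyContext toy_default_context = {toy_malloc_method};
static ToyContext *ToyCurrentContext = &toy_default_context;

static void
toy_fail(const char *msg, size_t size)
{
    fprintf(stderr, "%s: %zu\n", msg, size);
    abort();
}

/* Plain allocation: size check, indirect call, out-of-memory check. */
void *
toy_palloc(size_t size)
{
    ToyContext *cxt = ToyCurrentContext;
    void       *ret;

    assert(cxt != NULL);
    if (size > TOY_MAX_ALLOC_SIZE)
        toy_fail("invalid memory alloc request size", size);
    ret = cxt->alloc(cxt, size);
    if (ret == NULL)
        toy_fail("out of memory, request size", size);
    return ret;
}

/* Proposed: reuse toy_palloc(), then zero; costs a nested call unless inlined. */
void *
toy_palloc0_wrapped(size_t size)
{
    void       *ret = toy_palloc(size);

    memset(ret, 0, size);
    return ret;
}

/* Duplicated: the same body is copied in, so no nested call is needed. */
void *
toy_palloc0_dup(size_t size)
{
    ToyContext *cxt = ToyCurrentContext;
    void       *ret;

    assert(cxt != NULL);
    if (size > TOY_MAX_ALLOC_SIZE)
        toy_fail("invalid memory alloc request size", size);
    ret = cxt->alloc(cxt, size);
    if (ret == NULL)
        toy_fail("out of memory, request size", size);
    memset(ret, 0, size);
    return ret;
}
```

Both variants return the same zeroed memory; they differ only in call structure. Whether the nested call in the wrapped version survives depends on the compiler's inlining decisions, so it is worth checking the generated code rather than assuming.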



Re: Optimize memory allocation code

From
Li Japin
Date:

> On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> 
> Hi,
> 
> On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>> 
>> Hi, hackers!
>> 
>> I found that palloc0() is similar to palloc(), so we could use palloc() inside palloc0()
>> to allocate the space; I think this would reduce code duplication.
> 
> The code is duplicated on purpose.  There's a comment at the beginning
> that mentions it:
> 
>  /* duplicates MemoryContextAllocZero to avoid increased overhead */
> 
> Same for MemoryContextAllocZero() itself.

Thanks! How big is this overhead? Is there any way I can test it?

Best regards!

--
Japin Li

Re: Optimize memory allocation code

From
Merlin Moncure
Date:
On Fri, Sep 25, 2020 at 7:32 PM Li Japin <japinli@hotmail.com> wrote:
>
>
>
> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
> >>
> >> Hi, hackers!
> >>
> >> I found that palloc0() is similar to palloc(), so we could use palloc() inside palloc0()
> >> to allocate the space; I think this would reduce code duplication.
> >
> > The code is duplicated on purpose.  There's a comment at the beginning
> > that mentions it:
> >
> >  /* duplicates MemoryContextAllocZero to avoid increased overhead */
> >
> > Same for MemoryContextAllocZero() itself.
>
> Thanks! How big is this overhead? Is there any way I can test it?

Profiler.  For example, oprofile. In hot areas of the code (memory
allocation is very hot), profiling is the first step.

merlin



Re: Optimize memory allocation code

From
Alvaro Herrera
Date:
On 2020-Sep-26, Li Japin wrote:

> Thanks! How big is this overhead? Is there any way I can test it?

You could also have a look at the assembly code that your compiler
generates -- particularly examine how it changes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Optimize memory allocation code

From
Li Japin
Date:


> On Sep 29, 2020, at 9:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Sep-26, Li Japin wrote:
>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
> You could also have a look at the assembly code that your compiler
> generates -- particularly examine how it changes.

Thanks for your advice!

The original assembly code for palloc0 is:

0000000000517690 <palloc0>:
  517690: 55                   push   %rbp
  517691: 53                   push   %rbx
  517692: 48 89 fb             mov    %rdi,%rbx
  517695: 48 83 ec 08           sub    $0x8,%rsp
  517699: 48 81 ff ff ff ff 3f cmp    $0x3fffffff,%rdi
  5176a0: 48 8b 2d d9 0c 48 00 mov    0x480cd9(%rip),%rbp        # 998380 <CurrentMemoryContext>
  5176a7: 0f 87 d5 00 00 00     ja     517782 <palloc0+0xf2>
  5176ad: 48 8b 45 10           mov    0x10(%rbp),%rax
  5176b1: 48 89 fe             mov    %rdi,%rsi
  5176b4: c6 45 04 00           movb   $0x0,0x4(%rbp)
  5176b8: 48 89 ef             mov    %rbp,%rdi
  5176bb: ff 10                 callq  *(%rax)
  5176bd: 48 85 c0             test   %rax,%rax
  5176c0: 48 89 c1             mov    %rax,%rcx
  5176c3: 74 5b                 je     517720 <palloc0+0x90>
  5176c5: f6 c3 07             test   $0x7,%bl
  5176c8: 75 36                 jne    517700 <palloc0+0x70>
  5176ca: 48 81 fb 00 04 00 00 cmp    $0x400,%rbx
  5176d1: 77 2d                 ja     517700 <palloc0+0x70>
  5176d3: 48 01 c3             add    %rax,%rbx
  5176d6: 48 39 d8             cmp    %rbx,%rax
  5176d9: 73 35                 jae    517710 <palloc0+0x80>
  5176db: 0f 1f 44 00 00       nopl   0x0(%rax,%rax,1)
  5176e0: 48 83 c0 08           add    $0x8,%rax
  5176e4: 48 c7 40 f8 00 00 00 movq   $0x0,-0x8(%rax)
  5176eb: 00 
  5176ec: 48 39 c3             cmp    %rax,%rbx
  5176ef: 77 ef                 ja     5176e0 <palloc0+0x50>
  5176f1: 48 83 c4 08           add    $0x8,%rsp
  5176f5: 48 89 c8             mov    %rcx,%rax
  5176f8: 5b                   pop    %rbx
  5176f9: 5d                   pop    %rbp
  5176fa: c3                   retq   
  5176fb: 0f 1f 44 00 00       nopl   0x0(%rax,%rax,1)
  517700: 48 89 cf             mov    %rcx,%rdi
  517703: 48 89 da             mov    %rbx,%rdx
  517706: 31 f6                 xor    %esi,%esi
  517708: e8 e3 0e ba ff       callq  b85f0 <memset@plt>
  51770d: 48 89 c1             mov    %rax,%rcx
  517710: 48 83 c4 08           add    $0x8,%rsp
  517714: 48 89 c8             mov    %rcx,%rax
  517717: 5b                   pop    %rbx
  517718: 5d                   pop    %rbp
  517719: c3                   retq   
  51771a: 66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
  517720: 48 8b 3d 51 0c 48 00 mov    0x480c51(%rip),%rdi        # 998378 <TopMemoryContext>
  517727: be 64 00 00 00       mov    $0x64,%esi
  51772c: e8 1f f9 ff ff       callq  517050 <MemoryContextStatsDetail>
  517731: 31 f6                 xor    %esi,%esi
  517733: bf 14 00 00 00       mov    $0x14,%edi
  517738: e8 53 6d fd ff       callq  4ee490 <errstart>
  51773d: bf c5 20 00 00       mov    $0x20c5,%edi
  517742: e8 99 9b fd ff       callq  4f12e0 <errcode>
  517747: 48 8d 3d 07 54 03 00 lea    0x35407(%rip),%rdi        # 54cb55 <__func__.7554+0x45>
  51774e: 31 c0                 xor    %eax,%eax
  517750: e8 ab 9d fd ff       callq  4f1500 <errmsg>
  517755: 48 8b 55 38           mov    0x38(%rbp),%rdx
  517759: 48 8d 3d 80 11 16 00 lea    0x161180(%rip),%rdi        # 6788e0 <__func__.6248+0x150>
  517760: 48 89 de             mov    %rbx,%rsi
  517763: 31 c0                 xor    %eax,%eax
  517765: e8 56 a2 fd ff       callq  4f19c0 <errdetail>
  51776a: 48 8d 15 ff 11 16 00 lea    0x1611ff(%rip),%rdx        # 678970 <__func__.7326>
  517771: 48 8d 3d 20 11 16 00 lea    0x161120(%rip),%rdi        # 678898 <__func__.6248+0x108>
  517778: be eb 03 00 00       mov    $0x3eb,%esi
  51777d: e8 0e 95 fd ff       callq  4f0c90 <errfinish>
  517782: 31 f6                 xor    %esi,%esi
  517784: bf 14 00 00 00       mov    $0x14,%edi
  517789: e8 02 6d fd ff       callq  4ee490 <errstart>
  51778e: 48 8d 3d db 10 16 00 lea    0x1610db(%rip),%rdi        # 678870 <__func__.6248+0xe0>
  517795: 48 89 de             mov    %rbx,%rsi
  517798: 31 c0                 xor    %eax,%eax
  51779a: e8 91 98 fd ff       callq  4f1030 <errmsg_internal>
  51779f: 48 8d 15 ca 11 16 00 lea    0x1611ca(%rip),%rdx        # 678970 <__func__.7326>
  5177a6: 48 8d 3d eb 10 16 00 lea    0x1610eb(%rip),%rdi        # 678898 <__func__.6248+0x108>
  5177ad: be df 03 00 00       mov    $0x3df,%esi
  5177b2: e8 d9 94 fd ff       callq  4f0c90 <errfinish>
  5177b7: 66 0f 1f 84 00 00 00 nopw   0x0(%rax,%rax,1)
  5177be: 00 00 
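Read back into C, the hot path of this listing appears to be: the size check against MaxAllocSize (`cmp $0x3fffffff`), the indirect call through the context's alloc method (`callq *(%rax)`), and then a zero-fill step in which `test $0x7,%bl` and `cmp $0x400,%rbx` guard an open-coded 8-byte store loop, with `memset` as the fallback. That zeroing step looks like PostgreSQL's MemSetAligned() macro with its default MEMSET_LOOP_LIMIT of 1024. The standalone sketch below reproduces only that step, using hypothetical names (zero_aligned, ZERO_LOOP_LIMIT, LONG_MASK) rather than the real macro, and assumes a long-aligned buffer as palloc'd chunks are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MEMSET_LOOP_LIMIT (0x400 in the listing). */
#define ZERO_LOOP_LIMIT 1024
/* Low bits that must be clear for a whole number of longs (7 when sizeof(long) == 8). */
#define LONG_MASK (sizeof(long) - 1)

/*
 * Zero 'len' bytes at 'start', which must be long-aligned.  Small sizes that
 * are a multiple of sizeof(long) use an open-coded word loop; everything else
 * falls back to memset().
 */
void
zero_aligned(void *start, size_t len)
{
    assert(((uintptr_t) start & LONG_MASK) == 0);

    if ((len & LONG_MASK) == 0 && len <= ZERO_LOOP_LIMIT)
    {
        long       *p = (long *) start;
        long       *stop = (long *) ((char *) start + len);

        while (p < stop)        /* the add/movq/cmp/ja loop in the listing */
            *p++ = 0;
    }
    else
        memset(start, 0, len);  /* the callq memset@plt path */
}
```

Keeping the loop inline lets small, aligned requests avoid a call into libc, which is the same kind of per-call saving the duplicated palloc0() body is after.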

After the modification, the palloc0 assembly code is:

0000000000517690 <palloc0>:
  517690: 53                   push   %rbx
  517691: 48 89 fb             mov    %rdi,%rbx
  517694: e8 17 ff ff ff       callq  5175b0 <palloc>
  517699: f6 c3 07             test   $0x7,%bl
  51769c: 48 89 c1             mov    %rax,%rcx
  51769f: 75 2f                 jne    5176d0 <palloc0+0x40>
  5176a1: 48 81 fb 00 04 00 00 cmp    $0x400,%rbx
  5176a8: 77 26                 ja     5176d0 <palloc0+0x40>
  5176aa: 48 01 c3             add    %rax,%rbx
  5176ad: 48 39 d8             cmp    %rbx,%rax
  5176b0: 73 2e                 jae    5176e0 <palloc0+0x50>
  5176b2: 66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
  5176b8: 48 83 c0 08           add    $0x8,%rax
  5176bc: 48 c7 40 f8 00 00 00 movq   $0x0,-0x8(%rax)
  5176c3: 00 
  5176c4: 48 39 c3             cmp    %rax,%rbx
  5176c7: 77 ef                 ja     5176b8 <palloc0+0x28>
  5176c9: 48 89 c8             mov    %rcx,%rax
  5176cc: 5b                   pop    %rbx
  5176cd: c3                   retq   
  5176ce: 66 90                 xchg   %ax,%ax
  5176d0: 48 89 cf             mov    %rcx,%rdi
  5176d3: 48 89 da             mov    %rbx,%rdx
  5176d6: 31 f6                 xor    %esi,%esi
  5176d8: e8 13 0f ba ff       callq  b85f0 <memset@plt>
  5176dd: 48 89 c1             mov    %rax,%rcx
  5176e0: 48 89 c8             mov    %rcx,%rax
  5176e3: 5b                   pop    %rbx
  5176e4: c3                   retq   
  5176e5: 90                   nop
  5176e6: 66 2e 0f 1f 84 00 00 nopw   %cs:0x0(%rax,%rax,1)
  5176ed: 00 00 00 

Now I understand why we need the duplicated code in palloc0.

--
Best regards
Japin Li

Re: Optimize memory allocation code

From
Tomas Vondra
Date:
On Fri, Sep 25, 2020 at 07:37:07PM -0500, Merlin Moncure wrote:
>On Fri, Sep 25, 2020 at 7:32 PM Li Japin <japinli@hotmail.com> wrote:
>>
>>
>>
>> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>> >>
>> >> Hi, hackers!
>> >>
>> >> I found that palloc0() is similar to palloc(), so we could use palloc() inside palloc0()
>> >> to allocate the space; I think this would reduce code duplication.
>> >
>> > The code is duplicated on purpose.  There's a comment at the beginning
>> > that mentions it:
>> >
>> >  /* duplicates MemoryContextAllocZero to avoid increased overhead */
>> >
>> > Same for MemoryContextAllocZero() itself.
>>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
>Profiler.  For example, oprofile. In hot areas of the code (memory
>allocation is very hot), profiling is the first step.
>

Maybe a micro-benchmark would be better, e.g. a function with a loop
doing many palloc/palloc0 calls, or something similar.
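A rough sketch of such a micro-benchmark as a standalone C harness, assuming each variant under test is exposed as a plain `void *(*)(size_t)` function. bench_alloc and calloc_alloc0 are hypothetical names, and calloc_alloc0 is only a placeholder baseline. The harness times repeated allocate/touch/free cycles with the standard clock() function.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signatures shared by the allocator variants under test. */
typedef void *(*alloc_fn) (size_t size);
typedef void (*free_fn) (void *ptr);

/*
 * Call 'alloc' 'iterations' times with the same request size and return the
 * elapsed CPU time in seconds, or -1.0 if an allocation fails.  Each chunk is
 * touched and released immediately so memory use stays flat.
 */
double
bench_alloc(alloc_fn alloc, free_fn release, size_t size, long iterations)
{
    volatile unsigned char sink = 0;    /* keeps the reads from being optimized away */
    clock_t     start;

    assert(alloc != NULL && release != NULL && iterations > 0);
    start = clock();
    for (long i = 0; i < iterations; i++)
    {
        unsigned char *p = alloc(size);

        if (p == NULL)
            return -1.0;
        if (size > 0)
            sink ^= p[0];
        release(p);
    }
    (void) sink;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/* Placeholder baseline for illustration: calloc() already returns zeroed memory. */
void *
calloc_alloc0(size_t size)
{
    return calloc(1, size > 0 ? size : 1);
}
```

To compare palloc0 strategies, call bench_alloc() once per variant with identical size and iteration counts, and repeat the runs to smooth out noise; the indirect call through the function pointer adds the same cost to every variant. An equivalent loop inside a throwaway C extension function would exercise the real palloc()/palloc0() rather than a toy.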

FWIW I wonder what kind of overhead this is meant to avoid; the comment
unfortunately does not go into any details. I suppose it's to avoid extra
function calls, but maybe there's something else going on. And maybe the
overhead is much lower on modern CPUs (although this seems to come from
8396447cdbd in 2013, so it's not that old).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com