Thread: Optimize memory allocation code

Optimize memory allocation code

From

Li Japin

Date:

25 September 2020, 16:14:44

Hi, hackers!

I noticed that palloc0() is very similar to palloc(). We could call palloc() inside
palloc0() to allocate the space, which I think would reduce code duplication.

Best regards!

--
Japin Li

Attachment

0001-Optimize-memory-allocation-code.patch

Re: Optimize memory allocation code

From

Julien Rouhaud

Date:

26 September 2020, 00:09:31

Hi,

On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>
> Hi, hackers!
>
> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
> to allocate space, thereby I think we can reduce  duplication of code.

The code is duplicated on purpose.  There's a comment at the beginning
that mentions it:

  /* duplicates MemoryContextAllocZero to avoid increased overhead */

Same for MemoryContextAllocZero() itself.
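
To make the trade-off concrete, here is a minimal, self-contained sketch of the two shapes under discussion. It is not the PostgreSQL source: ToyContext, toy_palloc, toy_palloc0_duplicated and toy_palloc0_wrapped are hypothetical stand-ins, the size limit is the 0x3fffffff constant that shows up in the disassembly later in the thread, and assert()/abort() stand in for the real error reporting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified memory context: one allocation callback. */
typedef struct ToyContext ToyContext;
struct ToyContext
{
    void   *(*alloc) (ToyContext *cxt, size_t size);
    size_t  nrequests;          /* number of requests served, for the demo */
};

static void *
toy_malloc_alloc(ToyContext *cxt, size_t size)
{
    cxt->nrequests++;
    return malloc(size);
}

static ToyContext toy_current = {toy_malloc_alloc, 0};

#define TOY_MAX_ALLOC ((size_t) 0x3fffffff)

/* Plain allocator: size check, indirect call, out-of-memory check. */
void *
toy_palloc(size_t size)
{
    ToyContext *cxt = &toy_current;
    void       *ret;

    assert(size <= TOY_MAX_ALLOC);  /* stand-in for the real error path */
    ret = cxt->alloc(cxt, size);
    if (ret == NULL)
        abort();                    /* stand-in for the real error path */
    return ret;
}

/* Shape 1: the allocation logic is repeated, then the memory is zeroed. */
void *
toy_palloc0_duplicated(size_t size)
{
    ToyContext *cxt = &toy_current;
    void       *ret;

    assert(size <= TOY_MAX_ALLOC);
    ret = cxt->alloc(cxt, size);
    if (ret == NULL)
        abort();
    memset(ret, 0, size);
    return ret;
}

/* Shape 2: the proposed wrapper; shorter source, one extra call level. */
void *
toy_palloc0_wrapped(size_t size)
{
    void       *ret = toy_palloc(size);

    memset(ret, 0, size);
    return ret;
}
```

Both variants return the same zeroed memory. The difference is only in how many calls sit between the caller and the context's allocation callback, unless the compiler inlines the wrapper.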

Re: Optimize memory allocation code

From

Li Japin

Date:

26 September 2020, 00:32:28


> On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> 
> Hi,
> 
> On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>> 
>> Hi, hackers!
>> 
>> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
>> to allocate space, thereby I think we can reduce  duplication of code.
> 
> The code is duplicated on purpose.  There's a comment at the beginning
> that mentions it:
> 
>  /* duplicates MemoryContextAllocZero to avoid increased overhead */
> 
> Same for MemoryContextAllocZero() itself.

Thanks! How big is this overhead? Is there any way I can test it?

Best regards!

--
Japin Li

Re: Optimize memory allocation code

From

Merlin Moncure

Date:

26 September 2020, 00:37:07

On Fri, Sep 25, 2020 at 7:32 PM Li Japin <japinli@hotmail.com> wrote:
>
>
>
> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
> >>
> >> Hi, hackers!
> >>
> >> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
> >> to allocate space, thereby I think we can reduce  duplication of code.
> >
> > The code is duplicated on purpose.  There's a comment at the beginning
> > that mentions it:
> >
> >  /* duplicates MemoryContextAllocZero to avoid increased overhead */
> >
> > Same for MemoryContextAllocZero() itself.
>
> Thanks! How big is this overhead? Is there any way I can test it?

Profiler.  For example, oprofile. In hot areas of the code (memory
allocation is very hot), profiling is the first step.

merlin

Re: Optimize memory allocation code

From

Alvaro Herrera

Date:

29 September 2020, 13:30:33

On 2020-Sep-26, Li Japin wrote:

> Thanks! How big is this overhead? Is there any way I can test it?

You could also have a look at the assembly code that your compiler
generates -- particularly examine how it changes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Optimize memory allocation code

From

Li Japin

Date:

30 September 2020, 03:42:48

> On Sep 29, 2020, at 9:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Sep-26, Li Japin wrote:
>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
> You could also have a look at the assembly code that your compiler
> generates -- particularly examine how it changes.

Thanks for your advice!

The original assembly code for palloc0 is:

0000000000517690 <palloc0>:
517690: 55 push %rbp
517691: 53 push %rbx
517692: 48 89 fb mov %rdi,%rbx
517695: 48 83 ec 08 sub $0x8,%rsp
517699: 48 81 ff ff ff ff 3f cmp $0x3fffffff,%rdi
5176a0: 48 8b 2d d9 0c 48 00 mov 0x480cd9(%rip),%rbp # 998380 <CurrentMemoryContext>
5176a7: 0f 87 d5 00 00 00 ja 517782 <palloc0+0xf2>
5176ad: 48 8b 45 10 mov 0x10(%rbp),%rax
5176b1: 48 89 fe mov %rdi,%rsi
5176b4: c6 45 04 00 movb $0x0,0x4(%rbp)
5176b8: 48 89 ef mov %rbp,%rdi
5176bb: ff 10 callq *(%rax)
5176bd: 48 85 c0 test %rax,%rax
5176c0: 48 89 c1 mov %rax,%rcx
5176c3: 74 5b je 517720 <palloc0+0x90>
5176c5: f6 c3 07 test $0x7,%bl
5176c8: 75 36 jne 517700 <palloc0+0x70>
5176ca: 48 81 fb 00 04 00 00 cmp $0x400,%rbx
5176d1: 77 2d ja 517700 <palloc0+0x70>
5176d3: 48 01 c3 add %rax,%rbx
5176d6: 48 39 d8 cmp %rbx,%rax
5176d9: 73 35 jae 517710 <palloc0+0x80>
5176db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
5176e0: 48 83 c0 08 add $0x8,%rax
5176e4: 48 c7 40 f8 00 00 00 movq $0x0,-0x8(%rax)
5176eb: 00
5176ec: 48 39 c3 cmp %rax,%rbx
5176ef: 77 ef ja 5176e0 <palloc0+0x50>
5176f1: 48 83 c4 08 add $0x8,%rsp
5176f5: 48 89 c8 mov %rcx,%rax
5176f8: 5b pop %rbx
5176f9: 5d pop %rbp
5176fa: c3 retq
5176fb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
517700: 48 89 cf mov %rcx,%rdi
517703: 48 89 da mov %rbx,%rdx
517706: 31 f6 xor %esi,%esi
517708: e8 e3 0e ba ff callq b85f0 <memset@plt>
51770d: 48 89 c1 mov %rax,%rcx
517710: 48 83 c4 08 add $0x8,%rsp
517714: 48 89 c8 mov %rcx,%rax
517717: 5b pop %rbx
517718: 5d pop %rbp
517719: c3 retq
51771a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
517720: 48 8b 3d 51 0c 48 00 mov 0x480c51(%rip),%rdi # 998378 <TopMemoryContext>
517727: be 64 00 00 00 mov $0x64,%esi
51772c: e8 1f f9 ff ff callq 517050 <MemoryContextStatsDetail>
517731: 31 f6 xor %esi,%esi
517733: bf 14 00 00 00 mov $0x14,%edi
517738: e8 53 6d fd ff callq 4ee490 <errstart>
51773d: bf c5 20 00 00 mov $0x20c5,%edi
517742: e8 99 9b fd ff callq 4f12e0 <errcode>
517747: 48 8d 3d 07 54 03 00 lea 0x35407(%rip),%rdi # 54cb55 <__func__.7554+0x45>
51774e: 31 c0 xor %eax,%eax
517750: e8 ab 9d fd ff callq 4f1500 <errmsg>
517755: 48 8b 55 38 mov 0x38(%rbp),%rdx
517759: 48 8d 3d 80 11 16 00 lea 0x161180(%rip),%rdi # 6788e0 <__func__.6248+0x150>
517760: 48 89 de mov %rbx,%rsi
517763: 31 c0 xor %eax,%eax
517765: e8 56 a2 fd ff callq 4f19c0 <errdetail>
51776a: 48 8d 15 ff 11 16 00 lea 0x1611ff(%rip),%rdx # 678970 <__func__.7326>
517771: 48 8d 3d 20 11 16 00 lea 0x161120(%rip),%rdi # 678898 <__func__.6248+0x108>
517778: be eb 03 00 00 mov $0x3eb,%esi
51777d: e8 0e 95 fd ff callq 4f0c90 <errfinish>
517782: 31 f6 xor %esi,%esi
517784: bf 14 00 00 00 mov $0x14,%edi
517789: e8 02 6d fd ff callq 4ee490 <errstart>
51778e: 48 8d 3d db 10 16 00 lea 0x1610db(%rip),%rdi # 678870 <__func__.6248+0xe0>
517795: 48 89 de mov %rbx,%rsi
517798: 31 c0 xor %eax,%eax
51779a: e8 91 98 fd ff callq 4f1030 <errmsg_internal>
51779f: 48 8d 15 ca 11 16 00 lea 0x1611ca(%rip),%rdx # 678970 <__func__.7326>
5177a6: 48 8d 3d eb 10 16 00 lea 0x1610eb(%rip),%rdi # 678898 <__func__.6248+0x108>
5177ad: be df 03 00 00 mov $0x3df,%esi
5177b2: e8 d9 94 fd ff callq 4f0c90 <errfinish>
5177b7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
5177be: 00 00

After the modification, the assembly code for palloc0 is:

0000000000517690 <palloc0>:
517690: 53 push %rbx
517691: 48 89 fb mov %rdi,%rbx
517694: e8 17 ff ff ff callq 5175b0 <palloc>
517699: f6 c3 07 test $0x7,%bl
51769c: 48 89 c1 mov %rax,%rcx
51769f: 75 2f jne 5176d0 <palloc0+0x40>
5176a1: 48 81 fb 00 04 00 00 cmp $0x400,%rbx
5176a8: 77 26 ja 5176d0 <palloc0+0x40>
5176aa: 48 01 c3 add %rax,%rbx
5176ad: 48 39 d8 cmp %rbx,%rax
5176b0: 73 2e jae 5176e0 <palloc0+0x50>
5176b2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
5176b8: 48 83 c0 08 add $0x8,%rax
5176bc: 48 c7 40 f8 00 00 00 movq $0x0,-0x8(%rax)
5176c3: 00
5176c4: 48 39 c3 cmp %rax,%rbx
5176c7: 77 ef ja 5176b8 <palloc0+0x28>
5176c9: 48 89 c8 mov %rcx,%rax
5176cc: 5b pop %rbx
5176cd: c3 retq
5176ce: 66 90 xchg %ax,%ax
5176d0: 48 89 cf mov %rcx,%rdi
5176d3: 48 89 da mov %rbx,%rdx
5176d6: 31 f6 xor %esi,%esi
5176d8: e8 13 0f ba ff callq b85f0 <memset@plt>
5176dd: 48 89 c1 mov %rax,%rcx
5176e0: 48 89 c8 mov %rcx,%rax
5176e3: 5b pop %rbx
5176e4: c3 retq
5176e5: 90 nop
5176e6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
5176ed: 00 00 00

Now I understand why we need the duplicated code in palloc0.

Best regards

Japin Li
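
Both listings above end in the same zeroing sequence: a test that the size is a multiple of 8 (test $0x7), a comparison against 0x400 (1024) bytes, and then either a loop that stores one zero word at a time or a call to memset. This looks like the expansion of PostgreSQL's MemSetAligned macro, though the sketch below is only an approximation written for illustration. toy_zero and TOY_LOOP_LIMIT are hypothetical names, and the code assumes the buffer is aligned for long, as allocator-returned memory normally is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LOOP_LIMIT 1024     /* 0x400, the threshold seen in the listings */

/*
 * Zero "len" bytes starting at "start".
 *
 * Small requests whose length is a multiple of sizeof(long) are cleared
 * with a simple word-store loop; everything else goes through memset().
 * Assumes "start" is suitably aligned for long.
 */
static void
toy_zero(void *start, size_t len)
{
    assert(start != NULL);

    if ((len & (sizeof(long) - 1)) == 0 && len <= TOY_LOOP_LIMIT)
    {
        long       *p = (long *) start;
        long       *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, 0, len);
}
```

Since this part is identical in both listings, the difference between the two versions comes down to the allocation prologue: the inlined size check and indirect call in the original versus the direct call to palloc in the modified build.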

Re: Optimize memory allocation code

From

Tomas Vondra

Date:

03 October 2020, 20:57:03

On Fri, Sep 25, 2020 at 07:37:07PM -0500, Merlin Moncure wrote:
>On Fri, Sep 25, 2020 at 7:32 PM Li Japin <japinli@hotmail.com> wrote:
>>
>>
>>
>> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin <japinli@hotmail.com> wrote:
>> >>
>> >> Hi, hackers!
>> >>
>> >> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
>> >> to allocate space, thereby I think we can reduce  duplication of code.
>> >
>> > The code is duplicated on purpose.  There's a comment at the beginning
>> > that mentions it:
>> >
>> >  /* duplicates MemoryContextAllocZero to avoid increased overhead */
>> >
>> > Same for MemoryContextAllocZero() itself.
>>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
>Profiler.  For example, oprofile. In hot areas of the code (memory
>allocation is very hot), profiling is the first step.
>

Maybe a micro-benchmark would be better, e.g. a function with a loop
doing many palloc/palloc0 calls, or something similar.

FWIW I wonder what kind of overhead this is meant to avoid; the comment
unfortunately does not go into any details. I suppose it's to not do extra
function calls, but maybe there's something else going on. And maybe the
overhead is much lower on modern CPUs (although this seems to come from
8396447cdbd in 2013, so it's not that old).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
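
The micro-benchmark idea mentioned above could be sketched roughly as follows. This is a standalone harness, not a PostgreSQL test: toy_alloc, toy_alloc0_wrapped, toy_alloc0_inline and time_allocs are hypothetical names, and malloc/free stand in for the memory-context machinery. A compiler is free to inline the wrapper here, so meaningful numbers would require timing the real palloc/palloc0 inside a PostgreSQL build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for palloc(): malloc plus an out-of-memory check. */
static void *
toy_alloc(size_t size)
{
    void       *p = malloc(size);

    if (p == NULL)
        abort();
    return p;
}

/* Variant A: zeroing allocator implemented as a wrapper around toy_alloc(). */
static void *
toy_alloc0_wrapped(size_t size)
{
    void       *p = toy_alloc(size);

    memset(p, 0, size);
    return p;
}

/* Variant B: zeroing allocator with the allocation logic duplicated. */
static void *
toy_alloc0_inline(size_t size)
{
    void       *p = malloc(size);

    if (p == NULL)
        abort();
    memset(p, 0, size);
    return p;
}

/* Volatile sink so the allocate/free pairs are not optimized away. */
static volatile unsigned char sink;

/* Run "iterations" allocate/free cycles and return elapsed CPU seconds. */
static double
time_allocs(void *(*alloc_fn) (size_t), size_t size, long iterations)
{
    clock_t     start;

    assert(size > 0 && iterations > 0);

    start = clock();
    for (long i = 0; i < iterations; i++)
    {
        unsigned char *p = (unsigned char *) alloc_fn(size);

        p[0] = (unsigned char) i;
        sink = p[0];
        free(p);
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

A driver would call, for example, time_allocs(toy_alloc0_wrapped, 64, 10000000) and time_allocs(toy_alloc0_inline, 64, 10000000) several times each and compare the results, ideally across a few request sizes on both sides of the 1024-byte threshold.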