Thread: Multiple sorts in a query
Just wanted to check some thoughts about how memory allocation works in complex queries. Been thinking some more about recent Solaris testing results that *seemed* to show issues with multiple concurrent queries that have multiple sorts. If we have a query that uses multiple sorts, we may have a top-level sort, with child nodes that contain sorts also. In some cases we may find with sub-nodes that have both inner and outer sub-trees that contain sorts also. If we allocate large chunks of memory we use malloc(). So complex queries can have multiple mallocs, followed by multiple reallocs. That in itself seems likely to end up with roughly double memory use, since realloc won't work properly/quickly with multiple mallocs. (Double since we allocate X bytes, then 2X bytes etc until we hit the limit.) When we later free() the memory, do we always free() it in the reverse order in which it was allocated? If not, how does that effect reducing the sbrk point, or other aspects of reusing allocated memory? Is it possible that Solaris's default malloc isn't appropriate for repeated use in complex queries that use multiple sorts? http://developers.sun.com/solaris/articles/multiproc/multiproc.html and recent OpenSolaris bug reports. Anyway, feel free to jump in. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Tue, May 19, 2009 at 12:32:13PM +0100, Simon Riggs wrote: > If we allocate large chunks of memory we use malloc(). So complex > queries can have multiple mallocs, followed by multiple reallocs. That > in itself seems likely to end up with roughly double memory use, since > realloc won't work properly/quickly with multiple mallocs. (Double since > we allocate X bytes, then 2X bytes etc until we hit the limit.) I don't know about Solaris, but glibc has a threshold above which it starts using mmap() instead of sbrk(). Thus, once you start using very large blocks, freeing always returns the memory to the kernel, irrespective of other allocations. The threshold is dynamic apparently, but starts at 128KB. Just a thought, Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
On Tue, 2009-05-19 at 09:17 -0400, Merlin Moncure wrote: > On Tue, May 19, 2009 at 7:44 AM, Martijn van Oosterhout > > > > The threshold is dynamic apparently, but starts at 128KB. > > I just read an article that suggests assuming that can be dangerous > (by one of the authors of jemalloc)...an interesting read. > http://www.canonware.com/~ttt/2009/05/mr-malloc-gets-schooled.html Thanks both, interesting read. Hmmm... -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Tue, May 19, 2009 at 12:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > If we have a query that uses multiple sorts, we may have a top-level > sort, with child nodes that contain sorts also. In some cases we may > find with sub-nodes that have both inner and outer sub-trees that > contain sorts also. Well a top-level sort and a child sort wouldn't both be accumulating rows at the same time. The child could still be alive behaving like a tuplestore though. > If we allocate large chunks of memory we use malloc(). So complex > queries can have multiple mallocs, followed by multiple reallocs. That > in itself seems likely to end up with roughly double memory use, since > realloc won't work properly/quickly with multiple mallocs. (Double since > we allocate X bytes, then 2X bytes etc until we hit the limit.) I think it's even worse than that since the old and new allocation have to briefly coexist. So at least transiently we use 3x the size of the actual array. > When we later free() the memory, do we always free() it in the reverse > order in which it was allocated? If not, how does that effect reducing > the sbrk point, or other aspects of reusing allocated memory? > > Is it possible that Solaris's default malloc isn't appropriate for > repeated use in complex queries that use multiple sorts? Well anything's possible. Do you have any specific ideas? I would expect any decent malloc library to shrink sbrk based on statically analyzing where its allocations actually are, so I wouldn't expect the pattern of frees to matter on that front. It might still fragment memory if we allocate a bunch of large tuplestore/tuplesorts and then allocate one object in a longer lived memory context. What problems have you seen? -- greg
On Tue, May 19, 2009 at 7:44 AM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Tue, May 19, 2009 at 12:32:13PM +0100, Simon Riggs wrote: >> If we allocate large chunks of memory we use malloc(). So complex >> queries can have multiple mallocs, followed by multiple reallocs. That >> in itself seems likely to end up with roughly double memory use, since >> realloc won't work properly/quickly with multiple mallocs. (Double since >> we allocate X bytes, then 2X bytes etc until we hit the limit.) > > I don't know about Solaris, but glibc has a threshold above which it > starts using mmap() instead of sbrk(). Thus, once you start using very > large blocks, freeing always returns the memory to the kernel, > irrespective of other allocations. > > The threshold is dynamic apparently, but starts at 128KB. I just read an article that suggests assuming that can be dangerous (by one of the authors of jemalloc)...an interesting read. "Update in 2006: The above was written in 2001. Since then the world has changed a lot. Memory got bigger. Applications got bigger. The virtual address space layout in 32 bit linux changed. In the new situation, brk() and mmap space is shared and there are no artificial limits on brk size imposed by the kernel. What is more, applications have started using transient allocations larger than the 128Kb as was imagined in 2001." http://www.canonware.com/~ttt/2009/05/mr-malloc-gets-schooled.html merlin
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Simon Riggs > Sent: Tuesday, May 19, 2009 4:32 AM > To: pgsql-hackers > Subject: [HACKERS] Multiple sorts in a query > > > Just wanted to check some thoughts about how memory allocation works in > complex queries. Been thinking some more about recent Solaris testing > results that *seemed* to show issues with multiple concurrent queries > that have multiple sorts. > > If we have a query that uses multiple sorts, we may have a top-level > sort, with child nodes that contain sorts also. In some cases we may > find with sub-nodes that have both inner and outer sub-trees that > contain sorts also. > > If we allocate large chunks of memory we use malloc(). So complex > queries can have multiple mallocs, followed by multiple reallocs. That > in itself seems likely to end up with roughly double memory use, since > realloc won't work properly/quickly with multiple mallocs. (Double > since > we allocate X bytes, then 2X bytes etc until we hit the limit.) > > When we later free() the memory, do we always free() it in the reverse > order in which it was allocated? If not, how does that effect reducing > the sbrk point, or other aspects of reusing allocated memory? > > Is it possible that Solaris's default malloc isn't appropriate for > repeated use in complex queries that use multiple sorts? > http://developers.sun.com/solaris/articles/multiproc/multiproc.html > and recent OpenSolaris bug reports. Solaris default malloc always uses sbrk(), and never ever tried to reduce the sbrk point. If you want a malloc that uses mmap, there is an non-default malloc that does that (libumem or something?)
On Tue, 2009-05-19 at 09:33 -0700, Chuck McDevitt wrote: > > Is it possible that Solaris's default malloc isn't appropriate for > > repeated use in complex queries that use multiple sorts? > > http://developers.sun.com/solaris/articles/multiproc/multiproc.html > > and recent OpenSolaris bug reports. > > Solaris default malloc always uses sbrk(), and never ever tried to > reduce the sbrk point. > > If you want a malloc that uses mmap, there is an non-default malloc > that does that (libumem or something?) OK, thanks Chuck. Doesn't sound good. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Tue, 2009-05-19 at 13:52 +0100, Greg Stark wrote: > So at least transiently we use 3x the size of the actual array. I was conjecturing, prior to investigation. Are you saying you know this/have seen this already? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Well I'm just saying if you realloc a x kilobyte block into a 2x block and the allocator can't expand it and has to copy then it seems inevitable. -- Greg On 19 May 2009, at 14:11, Simon Riggs <simon@2ndQuadrant.com> wrote: > > On Tue, 2009-05-19 at 13:52 +0100, Greg Stark wrote: > >> So at least transiently we use 3x the size of the actual array. > > I was conjecturing, prior to investigation. Are you saying you know > this/have seen this already? > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support >
Chuck McDevitt píše v út 19. 05. 2009 v 09:33 -0700: > > Solaris default malloc always uses sbrk(), and never ever tried to reduce the sbrk point. > > If you want a malloc that uses mmap, there is an non-default malloc that does that (libumem or something?) > There are severals memory allocator on Solaris. You can choose what you need. See mapalloc (it uses mmap insted of srbk) mtmalloc (optimized fro multi threaded apps) mumem_alloc watchmalloc bsdmalloc maybe more. What I heart is that standard malloc is not good, but it is still here for compatibility reason with old application which depends on some functionality. Zdenek
On Tue, 2009-05-19 at 22:19 +0200, Zdenek Kotala wrote: > Chuck McDevitt píše v út 19. 05. 2009 v 09:33 -0700: > > Solaris default malloc always uses sbrk(), and never ever tried to reduce the sbrk point. > > > > If you want a malloc that uses mmap, there is an non-default malloc that does that (libumem or something?) > > There are severals memory allocator on Solaris. You can choose what you > need. See > > mapalloc (it uses mmap insted of srbk) > mtmalloc (optimized fro multi threaded apps) > mumem_alloc > watchmalloc > bsdmalloc > > maybe more. > > What I heart is that standard malloc is not good, but it is still here > for compatibility reason with old application which depends on some > functionality. Which one is used in the default PostgreSQL build for Solaris? If you use default malloc, have you tested the others and would you recommend one in particular? Which one has Dimitri used in his performance testing? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Tue, 2009-05-19 at 16:49 -0400, Greg Stark wrote: > Well I'm just saying if you realloc a x kilobyte block into a 2x block > and the allocator can't expand it and has to copy then it seems > inevitable. OK, understood. So there is grounds at least for an investigation into how that works and whether it is as inefficient as we think it might be. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On 05/20/2009 10:14 AM, Simon Riggs wrote: > On Tue, 2009-05-19 at 22:19 +0200, Zdenek Kotala wrote: >> Chuck McDevitt píše v út 19. 05. 2009 v 09:33 -0700: >> What I heart is that standard malloc is not good, but it is still here >> for compatibility reason with old application which depends on some >> functionality. > > Which one is used in the default PostgreSQL build for Solaris? If you > use default malloc, have you tested the others and would you recommend > one in particular? You don't even need to recompile it most of the time (unless statically compiled or similar things). LD_PRELOAD'ing another malloc library should normally be enough. Andres
Simon Riggs píše v st 20. 05. 2009 v 09:14 +0100: > > > > What I heart is that standard malloc is not good, but it is still here > > for compatibility reason with old application which depends on some > > functionality. > > Which one is used in the default PostgreSQL build for Solaris? If you > use default malloc, have you tested the others and would you recommend > one in particular? We use default one. I did not tested difference between them, but IIRC that Jignesh did some testing with umem. I will ask him. However if you give me test scenario I can test it. Zdenek
On Wed, 2009-05-20 at 23:01 -0400, Zdenek Kotala wrote: > We use default one. I did not tested difference between them, but IIRC > that Jignesh did some testing with umem. I will ask him. However if you > give me test scenario I can test it. Talk with Dimitri from Sun who is doing scalability benchmarks on pgsql-perform list now. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support