Thread: [PATCH] dtrace probes for memory manager
Attached patch contains new dtrace probes for memory management. The main purpose is to analyze memory footprint - for example, how much memory a transaction needs, the peak memory per context, when a memory block is reused or when it is newly allocated by malloc, and so on.

There are three groups of probes:

1) general memory context operations:
   mcxt-alloc
   mcxt-create
   mcxt-delete
   mcxt-free
   mcxt-realloc
   mcxt-reset

2) AllocSet operations (called from mcxt):
   aset-alloc
   aset-delete
   aset-free
   aset-realloc
   aset-reset

3) AllocSet block operations:
   aset-block-free
   aset-block-new
   aset-block-realloc
   aset-block-reset

		Zdenek
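(For readers new to the mechanism: PostgreSQL's userspace probes are declared in src/backend/utils/probes.d and called through generated TRACE_POSTGRESQL_* macros, pulled in via pg_trace.h. The sketch below shows roughly where one of the proposed aset-alloc probes would fire; the function body and the probe's argument list are assumptions for illustration, not the attached patch.)

    /*
     * Illustrative sketch only: roughly where an aset-alloc probe would
     * fire inside the allocator.  The argument list is a guess, not the
     * one used by the attached patch, and this is not the real
     * AllocSetAlloc() body.
     */
    #include "postgres.h"
    #include "nodes/memnodes.h"
    #include "pg_trace.h"

    static void *
    aset_alloc_with_probe(MemoryContext context, Size size)
    {
        void   *chunk = MemoryContextAlloc(context, size);  /* stand-in body */

        /* proposed probe; compiles to nops when no consumer is attached */
        TRACE_POSTGRESQL_ASET_ALLOC(context->name, context, size);

        return chunk;
    }

With such probes in place, a short dtrace aggregation keyed on the context-name argument would give per-context allocation counts, which is the footprint analysis described above.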
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Attached patch contains new dtrace probes for memory management.

This is a bad idea and I want to reject it outright.  No ordinary user
is really going to care about those details, and palloc is a
sufficiently hot hot-spot that even the allegedly negligible overhead
of an inactive dtrace probe is going to cost us.

If this goes in, I will disable dtrace support in Red Hat's builds, and
I rather imagine that other packagers will react similarly.

			regards, tom lane
On Fri, 2009-11-13 at 16:06 -0500, Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > Attached patch contains new dtrace probes for memory management.
>
> This is a bad idea and I want to reject it outright.  No ordinary user
> is really going to care about those details, and palloc is a
> sufficiently hot hot-spot that even the allegedly negligible overhead
> of an inactive dtrace probe is going to cost us.

No ordinary user is going to use dtrace at all.

> If this goes in, I will disable dtrace support in Red Hat's builds, and
> I rather imagine that other packagers will react similarly.

Is it possible to have a set of probes that would only be enabled with,
say, an --enable-debug compile time option?  I can certainly see the
benefit of these probes for profiling, but that is such a specific use
that it seems to need a specific flag anyway.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
"Joshua D. Drake" <jd@commandprompt.com> writes: > On Fri, 2009-11-13 at 16:06 -0500, Tom Lane wrote: >> This is a bad idea and I want to reject it outright. No ordinary user >> is really going to care about those details, and palloc is a >> sufficiently hot hot-spot that even the allegedly negligible overhead >> of an inactive dtrace probe is going to cost us. > No ordinary user is going to use dtrace at all. Right, but *those probes are going to cost him performance anyway* if he's using a dtrace-enabled build. Probes associated with I/O calls might be negligible, probes in palloc are not going to be. regards, tom lane
"Joshua D. Drake" <jd@commandprompt.com> wrote: > Is it possible to have a set of probes that would only be enabled > with say, --enable-debug compile time option? But we routinely build with that for normal production use so that if we get a core dump or need to backtrace a problem process, we will get meaningful results. Please don't create a performance penalty for compiling with debug info. (We'd probably wind up disabling it....) -Kevin
On Fri, 2009-11-13 at 15:22 -0600, Kevin Grittner wrote:
> "Joshua D. Drake" <jd@commandprompt.com> wrote:
> > Is it possible to have a set of probes that would only be enabled
> > with, say, an --enable-debug compile time option?
>
> But we routinely build with that for normal production use so that if
> we get a core dump or need to backtrace a problem process, we will get
> meaningful results.  Please don't create a performance penalty for
> compiling with debug info.  (We'd probably wind up disabling it....)

Good point, but I still say that the probes are useful.  I am not sure
we have a good compile time flag for it right now, but it seems that
would be the way to do it.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
Zdenek Kotala wrote:
> Attached patch contains new dtrace probes for memory management. The
> main purpose is to analyze memory footprint - for example, how much
> memory a transaction needs, the peak memory per context, when a memory
> block is reused or when it is newly allocated by malloc, and so on.

Having had to instrument these to figure out some problems, I'd give
this patch a +1.  However, the performance argument is compelling.  As a
compromise, maybe we could have a #define that needs to be turned on at
compile time to enable these probes; so a regular dtrace-enabled build
would not have them, but if you really needed to analyze memory
allocations, you could recompile to turn them on.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
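(A minimal sketch of the compile-time gate Alvaro describes, assuming a hypothetical TRACE_MEMORY symbol; the wrapper macro name and the probe's argument list are illustrative, not taken from the patch:)

    /*
     * Hypothetical compile-time switch: a normal --enable-dtrace build
     * leaves TRACE_MEMORY undefined, so these memory probes vanish
     * entirely; defining it at build time turns them back on.
     */
    #ifdef TRACE_MEMORY
    #define MEM_TRACE_ASET_ALLOC(ctx, size) \
        TRACE_POSTGRESQL_ASET_ALLOC((ctx)->name, (ctx), (size))
    #else
    #define MEM_TRACE_ASET_ALLOC(ctx, size) ((void) 0)
    #endif

The allocator would then call MEM_TRACE_ASET_ALLOC() unconditionally, and the choice is made once per build rather than per call.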
Tom Lane píše v pá 13. 11. 2009 v 16:06 -0500:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > Attached patch contains new dtrace probes for memory management.
>
> This is a bad idea and I want to reject it outright.  No ordinary user
> is really going to care about those details, and palloc is a
> sufficiently hot hot-spot that even the allegedly negligible overhead
> of an inactive dtrace probe is going to cost us.

I don't think the impact is too big here.  Userspace DTrace probes are
implemented like a function call; when a probe is inactive, the call is
replaced by nops.  The only thing that stays in the code is the argument
preparation.  Here is the 32-bit Intel asm from palloc:

MemoryContextAlloc+0xae:        pushl  $0x0
MemoryContextAlloc+0xb0:        pushl  -0x8(%ebx)
MemoryContextAlloc+0xb3:        pushl  0xc(%ebp)
MemoryContextAlloc+0xb6:        movl   0x8(%ebp),%eax
MemoryContextAlloc+0xb9:        pushl  %eax
MemoryContextAlloc+0xba:        pushl  0x14(%eax)
MemoryContextAlloc+0xbd:        nop
MemoryContextAlloc+0xbe:        nop
MemoryContextAlloc+0xbf:        nop
MemoryContextAlloc+0xc0:        nop
MemoryContextAlloc+0xc1:        nop
MemoryContextAlloc+0xc2:        addl   $0x20,%esp

As you can see, the overhead depends on the number of arguments.  I use
five arguments now, but two could be enough; the dtrace scripts would
just be more complicated in some cases.  In my opinion, compared to the
whole palloc code path, the DTrace probe overhead is minimal.

		Zdenek
Alvaro Herrera píše v pá 13. 11. 2009 v 18:34 -0300:
> Zdenek Kotala wrote:
> > Attached patch contains new dtrace probes for memory management. The
> > main purpose is to analyze memory footprint - for example, how much
> > memory a transaction needs, the peak memory per context, when a memory
> > block is reused or when it is newly allocated by malloc, and so on.
>
> Having had to instrument these to figure out some problems, I'd give
> this patch a +1.  However, the performance argument is compelling.  As a
> compromise, maybe we could have a #define that needs to be turned on at
> compile time to enable these probes; so a regular dtrace-enabled build
> would not have them, but if you really needed to analyze memory
> allocations, you could recompile to turn them on.

But the point of DTrace probes is that they are there without
recompilation :(.  Do we have any test we could use to measure the
performance penalty?  I don't think the overhead is significant.

		Zdenek
On Fri, Nov 13, 2009 at 4:41 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> Alvaro Herrera píše v pá 13. 11. 2009 v 18:34 -0300:
>> Zdenek Kotala wrote:
>> > Attached patch contains new dtrace probes for memory management. The
>> > main purpose is to analyze memory footprint - for example, how much
>> > memory a transaction needs, the peak memory per context, when a memory
>> > block is reused or when it is newly allocated by malloc, and so on.
>>
>> Having had to instrument these to figure out some problems, I'd give
>> this patch a +1.  However, the performance argument is compelling.  As a
>> compromise, maybe we could have a #define that needs to be turned on at
>> compile time to enable these probes; so a regular dtrace-enabled build
>> would not have them, but if you really needed to analyze memory
>> allocations, you could recompile to turn them on.
>
> But the point of DTrace probes is that they are there without
> recompilation :(.  Do we have any test we could use to measure the
> performance penalty?  I don't think the overhead is significant.

Don't think.  Benchmark.  :-)

(If you can measure it at all, it's too much, at least IMHO.)

...Robert
Alvaro Herrera píše v pá 13. 11. 2009 v 18:34 -0300:
> Zdenek Kotala wrote:
> > Attached patch contains new dtrace probes for memory management. The
> > main purpose is to analyze memory footprint - for example, how much
> > memory a transaction needs, the peak memory per context, when a memory
> > block is reused or when it is newly allocated by malloc, and so on.
>
> Having had to instrument these to figure out some problems, I'd give
> this patch a +1.  However, the performance argument is compelling.  As a
> compromise, maybe we could have a #define that needs to be turned on at
> compile time to enable these probes; so a regular dtrace-enabled build
> would not have them, but if you really needed to analyze memory
> allocations, you could recompile to turn them on.

Another idea is to have two sets of AllocSet functions, one without and
one with DTrace probes, and switch between them with a GUC flag, for
example.  From mcxt only the context create and delete probes are
important; the rest can be taken from the AllocSet probes.

		Zdenek
On Fri, Nov 13, 2009 at 5:20 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> Alvaro Herrera píše v pá 13. 11. 2009 v 18:34 -0300:
>> Zdenek Kotala wrote:
>> > Attached patch contains new dtrace probes for memory management. The
>> > main purpose is to analyze memory footprint - for example, how much
>> > memory a transaction needs, the peak memory per context, when a memory
>> > block is reused or when it is newly allocated by malloc, and so on.
>>
>> Having had to instrument these to figure out some problems, I'd give
>> this patch a +1.  However, the performance argument is compelling.  As a
>> compromise, maybe we could have a #define that needs to be turned on at
>> compile time to enable these probes; so a regular dtrace-enabled build
>> would not have them, but if you really needed to analyze memory
>> allocations, you could recompile to turn them on.
>
> Another idea is to have two sets of AllocSet functions, one without and
> one with DTrace probes, and switch between them with a GUC flag, for
> example.  From mcxt only the context create and delete probes are
> important; the rest can be taken from the AllocSet probes.

I've always thought it was odd that we had a function-dispatch interface
in such a performance-critical path.  But if we really want to keep it
around, this might not be a bad thing to use it for.

...Robert
--On 13. November 2009 17:16:15 -0500 Robert Haas <robertmhaas@gmail.com> wrote:

> Don't think.  Benchmark.  :-)
>
> (If you can measure it at all, it's too much, at least IMHO.)

I've tried to benchmark this now on my (fairly slow compared to server
hardware) MacBook and see some negative trend for those memory probes
in pgbench.  Running dozens of rounds with pgbench (scale 150, 10
clients / 1000 transactions) gives the following numbers (untuned
PostgreSQL config):

AVG(tps) with dtrace memory probes:    31.62 tps
AVG(tps) without dtrace memory probes: 38.24 tps

So there seems to be a minor slowdown, at least on *my* machine.  It
would be good if someone could verify these numbers.  What do you guys
think - what other tests/parameters could be used to test for an impact?

--
Thanks

	Bernd
Bernd Helmle wrote:
> I've tried to benchmark this now on my (fairly slow compared to server
> hardware) MacBook and see some negative trend for those memory probes
> in pgbench.  Running dozens of rounds with pgbench (scale 150, 10
> clients / 1000 transactions)

That makes for a 5.5 minute test, which is unfortunately close to the
default checkpoint period.  You're going to want a pgbench configuration
that's doing thousands of operations per second to measure this overhead
I think, and let it run a bit longer.  The difference you're seeing could
easily be just that the "with probes" result had more checkpoints happen
during testing than the other one--if it got even a single checkpoint
more, that could be enough to throw results off using the default test
and such low TPS results.

Try this instead, which will give you a test where checkpoints have a
minimal impact, but lots of memory will be thrown around:

pgbench -i -s 10 <db>
pgbench -S -c 10 -T 600 <db>

That will do just SELECT statements against a much smaller database
(about 160MB) and will run for 10 minutes each time.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
--On 8. Dezember 2009 15:51:52 -0500 Greg Smith <greg@2ndquadrant.com> wrote:

> Try this instead, which will give you a test where checkpoints have a
> minimal impact, but lots of memory will be thrown around:
>
> pgbench -i -s 10 <db>
> pgbench -S -c 10 -T 600 <db>

Thanks for the input, will try....

--
Thanks

	Bernd
Bernd Helmle píše v út 08. 12. 2009 v 22:06 +0100:
>
> --On 8. Dezember 2009 15:51:52 -0500 Greg Smith <greg@2ndquadrant.com>
> wrote:
>
> > Try this instead, which will give you a test where checkpoints have a
> > minimal impact, but lots of memory will be thrown around:
> >
> > pgbench -i -s 10 <db>
> > pgbench -S -c 10 -T 600 <db>
>
> Thanks for the input, will try....

I modified the probes to reduce overhead; a prototype patch is attached.
The main point is to remove the mcxt_alloc probe and keep only
aset_alloc.  I also did some testing, with interesting results.  First I
prepared a special C stored function (attached) which does only
allocation and deallocation, and I measured how long it takes:

On 32-bit, memory allocation slows down by 8.4%; on 64-bit it is only
4.6%.  It is worth mentioning that I call both palloc and pfree, but in
normal use pfree is rarely called and memory is freed when the context
is destroyed, so we should really think of 4.2% and 2.3% instead.

But in a normal situation the database does other things as well, and
palloc is only one part of the code path.  That is why I ran a second
test and used the Sun Studio profiling tools (collect/analyzer) to
determine how many CPU ticks the probes cost during a pgbench run.  The
results are much better: the AllocSet alloc function takes about 4-5%
and the probes' assembler code takes 0.1-0.2% on 64-bit.  I did not test
32-bit, but my expectation is that it would be about 0.3-0.4%.

		Zdenek
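(The benchmark function itself is in an attachment that is not reproduced here. A minimal sketch of what such an allocation-only test function might look like, with a hypothetical name and loop body, could be:)

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(palloc_bench);

    /*
     * Allocate and free many small chunks so that the cost of
     * AllocSetAlloc (and any probes inside it) dominates the runtime.
     * This is an illustrative stand-in, not the attached function.
     */
    Datum
    palloc_bench(PG_FUNCTION_ARGS)
    {
        int32   loops = PG_GETARG_INT32(0);
        int32   i;

        for (i = 0; i < loops; i++)
        {
            void   *p = palloc(64);

            pfree(p);
        }

        PG_RETURN_VOID();
    }

Timing something like SELECT palloc_bench(10000000) with and without the probes compiled in would give the kind of percentage numbers quoted above.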
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

> [...]
> +	header = (StandardChunkHeader *)
> +		((char *) ret - STANDARDCHUNKHEADERSIZE);
> +
> +//	TRACE_POSTGRESQL_MCXT_ALLOC(context->name, context, size, header->size, true);
> +
> [...]

If the dormant overhead of these probes is measured or suspected to be
excessive, consider using the dtrace-generated per-probe foo_ENABLED()
conditional, or a postgres configuration global thusly:

    if (__builtin_expect(TRACE_POSTGRESQL_MCXT_ALLOC_ENABLED(), 0))
        TRACE_POSTGRESQL_MCXT_ALLOC(...);

so that the whole instrumentation parameter setup/call can be placed
out of the hot line with gcc -freorder-blocks.

- FChE
Frank Ch. Eigler píše v čt 10. 12. 2009 v 14:11 -0500:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>
> > [...]
> > +	header = (StandardChunkHeader *)
> > +		((char *) ret - STANDARDCHUNKHEADERSIZE);
> > +
> > +//	TRACE_POSTGRESQL_MCXT_ALLOC(context->name, context, size, header->size, true);
> > +
> > [...]
>
> If the dormant overhead of these probes is measured or suspected to be
> excessive, consider using the dtrace-generated per-probe foo_ENABLED()
> conditional, or a postgres configuration global thusly:

TRACE_POSTGRESQL_MCXT_ALLOC and TRACE_POSTGRESQL_ASET_ALLOC are
duplicate probes.  Having both of them makes sense, but from a
performance point of view keeping only one of them is acceptable.

foo_ENABLED() is good to use when the number of arguments and the cost
of evaluating them are high.  In this case it does not seem to be very
useful.  See the ASM code:

AllocSetAlloc+0x17:             xorq   %rax,%rax
AllocSetAlloc+0x1a:             nop
AllocSetAlloc+0x1b:             nop
AllocSetAlloc+0x1c:             testl  %eax,%eax
AllocSetAlloc+0x1e:             je     +0xb  <AllocSetAlloc+0x2b>
AllocSetAlloc+0x20:             movq   %r13,%rdi
AllocSetAlloc+0x23:             movq   %r14,%rsi
AllocSetAlloc+0x26:             nop
AllocSetAlloc+0x27:             nop
AllocSetAlloc+0x28:             nop
AllocSetAlloc+0x29:             nop
AllocSetAlloc+0x2a:             nop

> if (__builtin_expect(TRACE_POSTGRESQL_MCXT_ALLOC_ENABLED(), 0))
>     TRACE_POSTGRESQL_MCXT_ALLOC(...);
>
> so that the whole instrumentation parameter setup/call can be placed
> out of the hot line with gcc -freorder-blocks.

A compiler-specific construct is not a good way to go.  Do not forget
that other compilers exist as well.

		Zdenek
Hi -

On Thu, Dec 10, 2009 at 09:33:28PM +0100, Zdenek Kotala wrote:
> [...]
> > If the dormant overhead of these probes is measured or suspected to be
> > excessive, consider using the dtrace-generated per-probe foo_ENABLED()
> > conditional, or a postgres configuration global thusly:
> [...] foo_ENABLED() is good to use when the number of arguments and
> the cost of evaluating them are high.  In this case it does not seem
> to be very useful. [...]

Right, I just wanted to make the others aware of the option.

> > if (__builtin_expect(TRACE_POSTGRESQL_MCXT_ALLOC_ENABLED(), 0))
> >     TRACE_POSTGRESQL_MCXT_ALLOC(...);
> >
> > so that the whole instrumentation parameter setup/call can be placed
> > out of the hot line with gcc -freorder-blocks.
>
> A compiler-specific construct is not a good way to go.  Do not forget
> that other compilers exist as well.

Certainly.  Many projects -- but apparently not postgresql -- wrap such
branch prediction hints in macros such as likely() and unlikely(), which
are easily no-op'd for compilers that don't support this sort of thing.

- FChE
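(For reference, the kind of wrapper Frank means is a pair of macros along these lines; this is a sketch, not existing PostgreSQL code:)

    /* Branch-prediction hints that degrade to no-ops elsewhere. */
    #if defined(__GNUC__)
    #define likely(x)    __builtin_expect((x) != 0, 1)
    #define unlikely(x)  __builtin_expect((x) != 0, 0)
    #else
    #define likely(x)    (x)
    #define unlikely(x)  (x)
    #endif

The probe guard would then read if (unlikely(TRACE_POSTGRESQL_MCXT_ALLOC_ENABLED())) and still compile with non-GCC compilers.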
On Wed, Dec 9, 2009 at 9:04 AM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> Bernd Helmle píše v út 08. 12. 2009 v 22:06 +0100:
>>
>> --On 8. Dezember 2009 15:51:52 -0500 Greg Smith <greg@2ndquadrant.com>
>> wrote:
>>
>> > Try this instead, which will give you a test where checkpoints have a
>> > minimal impact, but lots of memory will be thrown around:
>> >
>> > pgbench -i -s 10 <db>
>> > pgbench -S -c 10 -T 600 <db>
>>
>> Thanks for the input, will try....
>
> I modified the probes to reduce overhead; a prototype patch is attached.
> The main point is to remove the mcxt_alloc probe and keep only
> aset_alloc.  I also did some testing, with interesting results.  First I
> prepared a special C stored function (attached) which does only
> allocation and deallocation, and I measured how long it takes:
>
> On 32-bit, memory allocation slows down by 8.4%; on 64-bit it is only
> 4.6%.  It is worth mentioning that I call both palloc and pfree, but in
> normal use pfree is rarely called and memory is freed when the context
> is destroyed, so we should really think of 4.2% and 2.3% instead.
>
> But in a normal situation the database does other things as well, and
> palloc is only one part of the code path.  That is why I ran a second
> test and used the Sun Studio profiling tools (collect/analyzer) to
> determine how many CPU ticks the probes cost during a pgbench run.  The
> results are much better: the AllocSet alloc function takes about 4-5%
> and the probes' assembler code takes 0.1-0.2% on 64-bit.  I did not test
> 32-bit, but my expectation is that it would be about 0.3-0.4%.

There's not really enough detail here to determine what you tested and
what the results were, and I don't think this patch has any chance at
all of getting committed without that.  Please clarify.

If there's some real-world test where this probe costs 0.3%-0.4%, I
think that is sufficient grounds for rejecting this patch.  I understand
the desire of people to be able to use dtrace, but our performance is
too hard-won for me to want to give any measurable amount of it up for
tracing and instrumentation hooks that will only be used by a small
number of users in a small number of situations.

...Robert
--On 10. Dezember 2009 23:55:49 -0500 Robert Haas <robertmhaas@gmail.com> wrote:

> If there's some real-world test where this probe costs 0.3%-0.4%, I
> think that is sufficient grounds for rejecting this patch.  I understand
> the desire of people to be able to use dtrace, but our performance is
> too hard-won for me to want to give any measurable amount of it up for
> tracing and instrumentation hooks that will only be used by a small
> number of users in a small number of situations.

I repeated the pgbench runs per Greg's advice (see upthread), and it
seems there is actually a small slowdown, which unfortunately supports
this argument.  After repeating the pgbench runs with and without the
new probes (note: I've used the new version of the patch, too), the
numbers stabilize as follows:

without compiled probes: AVG(2531.68)
with compiled probes:    AVG(2511.97)

I can repeat these tests over and over, but this doesn't seem to change
the whole picture (so there does seem to be a real argument for a
0.4 - 0.6% cost, at least on *my* machine here with pgbench).

--
Thanks

	Bernd
Bernd Helmle escribió:

> I repeated the pgbench runs per Greg's advice (see upthread), and it
> seems there is actually a small slowdown, which unfortunately supports
> this argument.  After repeating the pgbench runs with and without the
> new probes (note: I've used the new version of the patch, too), the
> numbers stabilize as follows:
>
> without compiled probes: AVG(2531.68)
> with compiled probes:    AVG(2511.97)

Were the probes enabled?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
--On 11. Dezember 2009 11:28:54 -0300 Alvaro Herrera <alvherre@commandprompt.com> wrote:

>> without compiled probes: AVG(2531.68)
>> with compiled probes:    AVG(2511.97)
>
> Were the probes enabled?

Hmm, they were just compiled in; I didn't do anything to instrument them
with dtrace.

I've now started a pgbench/dtrace run that instruments the aset_alloc,
aset_free and aset_realloc probes and just counts the calls to them
during pgbench.  The first run gives me

tps = 1035.045523 (excluding connections establishing)

Ideas?

--
Thanks

	Bernd
Bernd Helmle píše v pá 11. 12. 2009 v 17:13 +0100:
>
> --On 11. Dezember 2009 11:28:54 -0300 Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>
> >> without compiled probes: AVG(2531.68)
> >> with compiled probes:    AVG(2511.97)
> >
> > Were the probes enabled?
>
> Hmm, they were just compiled in; I didn't do anything to instrument them
> with dtrace.
>
> I've now started a pgbench/dtrace run that instruments the aset_alloc,
> aset_free and aset_realloc probes and just counts the calls to them
> during pgbench.  The first run gives me
>
> tps = 1035.045523 (excluding connections establishing)
>
> Ideas?

When the probes are activated they have a performance impact; that is
normal.  The important thing is that you can use them when you need them
on a production system: no recompilation, no extra binary, no downtime,
and it is safe.  The performance impact depends on the D script.

		Zdenek
On Fri, Dec 11, 2009 at 12:55 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> Bernd Helmle píše v pá 11. 12. 2009 v 17:13 +0100:
>>
>> --On 11. Dezember 2009 11:28:54 -0300 Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>
>> >> without compiled probes: AVG(2531.68)
>> >> with compiled probes:    AVG(2511.97)
>> >
>> > Were the probes enabled?
>>
>> Hmm, they were just compiled in; I didn't do anything to instrument them
>> with dtrace.
>>
>> I've now started a pgbench/dtrace run that instruments the aset_alloc,
>> aset_free and aset_realloc probes and just counts the calls to them
>> during pgbench.  The first run gives me
>>
>> tps = 1035.045523 (excluding connections establishing)
>>
>> Ideas?
>
> When the probes are activated they have a performance impact; that is
> normal.  The important thing is that you can use them when you need them
> on a production system: no recompilation, no extra binary, no downtime,
> and it is safe.  The performance impact depends on the D script.

Yeah.  The problem here is the impact when the probes aren't enabled.

I thought we had an idea of using the AllocSet dispatch mechanism to
make this zero-overhead in the case where the probes are not enabled.
What happened to that notion?

...Robert
Robert Haas píše v čt 10. 12. 2009 v 23:55 -0500:
> On Wed, Dec 9, 2009 at 9:04 AM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> >
> > But in a normal situation the database does other things as well, and
> > palloc is only one part of the code path.  That is why I ran a second
> > test and used the Sun Studio profiling tools (collect/analyzer) to
> > determine how many CPU ticks the probes cost during a pgbench run.  The
> > results are much better: the AllocSet alloc function takes about 4-5%
> > and the probes' assembler code takes 0.1-0.2% on 64-bit.  I did not test
> > 32-bit, but my expectation is that it would be about 0.3-0.4%.
>
> There's not really enough detail here to determine what you tested and
> what the results were, and I don't think this patch has any chance at
> all of getting committed without that.  Please clarify.

As I mentioned, I ran pgbench -c 10 -t 1000 and collected data from the
backends.  collect/analyzer is a tool similar to gprof.

		Zdenek
Robert Haas píše v pá 11. 12. 2009 v 12:55 -0500:
> On Fri, Dec 11, 2009 at 12:55 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> > When the probes are activated they have a performance impact; that is
> > normal.  The important thing is that you can use them when you need them
> > on a production system: no recompilation, no extra binary, no downtime,
> > and it is safe.  The performance impact depends on the D script.
>
> Yeah.  The problem here is the impact when the probes aren't enabled.
>
> I thought we had an idea of using the AllocSet dispatch mechanism to
> make this zero-overhead in the case where the probes are not enabled.
> What happened to that notion?

We know that the performance impact is less than 1%, probably less than
0.6%.  The question is whether that is acceptable or not.  I personally
think it is.  If not, I will start working on a backup solution with a
dtraced AllocSet and some switching mechanism, but that needs a little
design discussion, and first we need a decision about the current
performance impact.

		Zdenek
On Fri, Dec 11, 2009 at 1:04 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> We know that the performance impact is less than 1%, probably less than
> 0.6%.  The question is whether that is acceptable or not.  I personally
> think it is.  If not, I will start working on a backup solution with a
> dtraced AllocSet and some switching mechanism, but that needs a little
> design discussion, and first we need a decision about the current
> performance impact.

As far as I am concerned that is way too much, particularly considering
that your test case isn't designed to be particularly memory-allocation
intensive, and if it is up to me I will reject this.  Even a
quarter-percent slowdown for a feature that will be used only by a small
fraction of users only a small fraction of the time seems totally
unacceptable to me.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> As far as I am concerned that is way too much, particularly considering
> that your test case isn't designed to be particularly memory-allocation
> intensive, and if it is up to me I will reject this.  Even a
> quarter-percent slowdown for a feature that will be used only by a small
> fraction of users only a small fraction of the time seems totally
> unacceptable to me.

It seems to me that anyone who really needs this can instrument the
alloc functions anyway --- isn't one of the features of DTrace supposed
to be that you can monitor calls to a particular function without any
prearranged code support?  Or is that one of the things like "zero
overhead" that turns out to be more marketing-speak than reality?

Anyway I concur with Robert's opinion that the use-case is far too small
to justify incurring a measurable overhead for everybody.  There might
be some small argument for putting these in under an extra #ifdef, but
they wouldn't get into any regular production build.

			regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
> I thought we had an idea of using the AllocSet dispatch mechanism to
> make this zero-overhead in the case where the probes are not enabled.
> What happened to that notion?

I must have missed that discussion, but +1 --- should be possible to get
to zero-overhead-when-off that way.  The trick is to figure out
what/where enables the alternate implementation.  The current design
assumes that the callers of FooContextCreate choose the implementation,
but we don't want that here.

			regards, tom lane
Tom Lane píše v pá 11. 12. 2009 v 13:56 -0500:
> Robert Haas <robertmhaas@gmail.com> writes:
> > As far as I am concerned that is way too much, particularly considering
> > that your test case isn't designed to be particularly memory-allocation
> > intensive, and if it is up to me I will reject this.  Even a
> > quarter-percent slowdown for a feature that will be used only by a small
> > fraction of users only a small fraction of the time seems totally
> > unacceptable to me.
>
> It seems to me that anyone who really needs this can instrument the
> alloc functions anyway --- isn't one of the features of DTrace supposed
> to be that you can monitor calls to a particular function without any
> prearranged code support?  Or is that one of the things like "zero
> overhead" that turns out to be more marketing-speak than reality?

There are several types of probes.  For example, with the PID provider
you can put probes on all entry and return points of global functions,
and you can also put a probe on each asm instruction in a function.
Those probes have zero overhead, because dtrace understands the ABI and
knows where the arguments are.  Unfortunately, user-defined probes have
a small overhead, which is the price for a universal solution that works
with all compilers and linkers.

> Anyway I concur with Robert's opinion that the use-case is far too small
> to justify incurring a measurable overhead for everybody.

OK.

> There might
> be some small argument for putting these in under an extra #ifdef, but
> they wouldn't get into any regular production build.

Unfortunately, an #ifdef solution defeats the main DTrace goal of
working without recompilation :(.

		Zdenek
Tom Lane píše v pá 11. 12. 2009 v 14:38 -0500:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I thought we had an idea of using the AllocSet dispatch mechanism to
> > make this zero-overhead in the case where the probes are not enabled.
> > What happened to that notion?
>
> I must have missed that discussion, but +1 --- should be possible to get
> to zero-overhead-when-off that way.  The trick is to figure out
> what/where enables the alternate implementation.  The current design
> assumes that the callers of FooContextCreate choose the implementation,
> but we don't want that here.

I thought about it.  I think we can use a GUC variable (e.g.
dtraced_alloc) and switch the method pointers to the dtraced AllocSet
functions.  The problem is how to distribute the setting to all
backends.

		Zdenek
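(To make the idea concrete, here is a simplified, self-contained sketch of selecting one of two method tables at context-creation time based on such a GUC. The struct, function and variable names are illustrative only; they are not the actual MemoryContextMethods definitions in mcxt.c/aset.c.)

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* simplified stand-in for the allocator's dispatch table */
    typedef struct AsetMethodsSketch
    {
        void   *(*alloc_fn) (void *context, size_t size);
    } AsetMethodsSketch;

    static void *
    plain_alloc(void *context, size_t size)
    {
        (void) context;             /* unused in this sketch */
        return malloc(size);        /* stand-in for the real AllocSetAlloc body */
    }

    static void *
    traced_alloc(void *context, size_t size)
    {
        void   *chunk = plain_alloc(context, size);

        /* TRACE_POSTGRESQL_ASET_ALLOC(...) would fire here */
        return chunk;
    }

    static const AsetMethodsSketch plain_methods = {plain_alloc};
    static const AsetMethodsSketch traced_methods = {traced_alloc};

    /* hypothetical GUC, e.g. "dtraced_alloc = on" in postgresql.conf */
    bool    dtraced_alloc = false;

    /* chosen once at context creation, so the hot path stays probe-free */
    static const AsetMethodsSketch *
    choose_aset_methods(void)
    {
        return dtraced_alloc ? &traced_methods : &plain_methods;
    }

Because the decision happens when the context is created, a build with the probes compiled in pays nothing per palloc call unless the GUC is turned on, which is the zero-overhead-when-off behaviour discussed above.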
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I thought about it.  I think we can use a GUC variable (e.g.
> dtraced_alloc) and switch the method pointers to the dtraced AllocSet
> functions.  The problem is how to distribute the setting to all
> backends.

You set the GUC in postgresql.conf.  No big deal.

If we go this route it would be nice to think about making a facility
that has some usefulness for non-DTrace platforms too.

			regards, tom lane
Tom Lane píše v pá 11. 12. 2009 v 15:11 -0500:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > I thought about it.  I think we can use a GUC variable (e.g.
> > dtraced_alloc) and switch the method pointers to the dtraced AllocSet
> > functions.  The problem is how to distribute the setting to all
> > backends.
>
> You set the GUC in postgresql.conf.  No big deal.
>
> If we go this route it would be nice to think about making a facility
> that has some usefulness for non-DTrace platforms too.

Do you mean a general facility for switching the memory allocator?

		Zdenek
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Tom Lane píše v pá 11. 12. 2009 v 15:11 -0500:
>> If we go this route it would be nice to think about making a facility
>> that has some usefulness for non-DTrace platforms too.

> Do you mean a general facility for switching the memory allocator?

No, I was thinking of some sort of memory allocation stats collection
that doesn't depend on DTrace.  It's amazing to me that we've never
gone back and improved on the original quick-and-dirty
MemoryContextStats mechanism.  I certainly find myself using that a lot
for issues like tracking down memory leaks.  While palloc has a lot of
advantages, the fact that you can't easily plug in a debug-friendly
substitute malloc package is not one of them :-(

			regards, tom lane
Tom Lane wrote:
> It's amazing to me that we've never
> gone back and improved on the original quick-and-dirty
> MemoryContextStats mechanism.  I certainly find myself using that a
> lot for issues like tracking down memory leaks.

That code hasn't really gone anywhere since Neil tweaked the indentation
two years ago.  What sorts of improvements were you looking for?

I started on trying to improve this area at one point but didn't get
far.  My first step was going to be just wrapping the call into a UDF to
make it easier to reach without having to set loose gdb.  I thought that
might expand the possible uses for MemoryContextStats to "help find a
memory leak safely on the production box", and therefore attract more
attention to improving it.  People really don't like running gdb on
production, but a UDF dumping the same data wouldn't seem so risky.

As Daniel Farina pointed out to me one day in a "duh" moment, that idea
is quite obviously doomed by the fact that people want dumps from
processes that you won't be talking to in a UDF context.  You won't be
able to find a leak in the background writer by dumping the context the
UDF can see.  There would need to be some sort of IPC/signaling
mechanism in each process if you wanted it to work everywhere, and once
that realization hit I realized this was a deeper problem than it
appeared.

If you want to make it easier to export into user space, it seems to me
the memory context stats data either needs to get pushed somewhere it's
easier to get to (the way the regular stats are), or there needs to be
some way of polling any random PG process to find it--presumably by
passing the request the pid you want to instrument.  Neither of which
may really be a reasonable idea at all, particularly since in most cases
you're using a debugger to track your leak down only after reproducing a
test case for it.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
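(A minimal sketch of the kind of UDF wrapper described above; the function name is hypothetical, and as noted it can only report on the backend that runs it:)

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/memutils.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(pg_log_memory_contexts);

    /*
     * Dump the calling backend's memory context statistics to stderr,
     * the same output a debugger call to MemoryContextStats() produces.
     */
    Datum
    pg_log_memory_contexts(PG_FUNCTION_ARGS)
    {
        MemoryContextStats(TopMemoryContext);
        PG_RETURN_VOID();
    }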
On Fri, Dec 11, 2009 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Tom Lane píše v pá 11. 12. 2009 v 15:11 -0500:
>>> If we go this route it would be nice to think about making a facility
>>> that has some usefulness for non-DTrace platforms too.
>
>> Do you mean a general facility for switching the memory allocator?
>
> No, I was thinking of some sort of memory allocation stats collection
> that doesn't depend on DTrace.  It's amazing to me that we've never
> gone back and improved on the original quick-and-dirty
> MemoryContextStats mechanism.  I certainly find myself using that a lot
> for issues like tracking down memory leaks.  While palloc has a lot of
> advantages, the fact that you can't easily plug in a debug-friendly
> substitute malloc package is not one of them :-(

This might be nice to have, but I don't think it's necessarily a
prerequisite for a committable patch.  For that, I think we just need
something that uses the dispatch mechanism to avoid the overhead in the
normal case.  However, as this is a substantial redesign and there are 4
days left in the CommitFest, I am going to mark this patch Returned with
Feedback for now.

...Robert
Greg Smith <greg@2ndquadrant.com> writes:
> Tom Lane wrote:
>> It's amazing to me that we've never
>> gone back and improved on the original quick-and-dirty
>> MemoryContextStats mechanism.

> That code hasn't really gone anywhere since Neil tweaked the indentation
> two years ago.  What sorts of improvements were you looking for?

Not anything like what you were thinking about, apparently ;-).  What I
wish for the most often is a way to track palloc usage down to the
particular file/line of the alloc call.  This is obviously not something
that we'd pay the overhead for in a standard build, but I would love to
be able to build a debug version that could do it when I needed.  There
are malloc substitutes that can do this, but the palloc layer means that
they're not especially useful for debugging the PG backend.

Of course something like that doesn't have to have anything to do with
what Zdenek was wishing for, but it just struck me that creating a
stats-oriented shim layer in the memory context stuff is playing on
pretty much the same turf.

> I started on trying to improve this area at one point but didn't get
> far.  My first step was going to be just wrapping the call into a UDF to
> make it easier to reach without having to set loose gdb.  I thought that
> might expand the possible uses for MemoryContextStats to "help find a
> memory leak safely on the production box", and therefore attract more
> attention to improving it.

Just for context: the reason MemoryContextStats is designed as it is is
so that it has a reasonable chance of telling you something useful about
an out-of-memory failure.  When you're down to your last ten bytes, it's
not going to help if your reporting mechanism wants to allocate a data
structure to put its results in.  So it writes to stderr and nothing
else.  It may well be that that constraint means we can never do
anything really creative with MemoryContextStats per se.  But I see the
use-case for other mechanisms as being to get some info before we reach
the point of having our backs to the wall.

			regards, tom lane
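(One way such a debug build could be approached, sketched here with hypothetical names; palloc_trace_remember() and the PALLOC_TRACE symbol are inventions for illustration, not an existing PostgreSQL facility:)

    #include "postgres.h"
    #include "utils/palloc.h"

    #ifdef PALLOC_TRACE

    /* hypothetical helper that records the call site for later reporting */
    extern void palloc_trace_remember(const char *file, int line,
                                      void *chunk, Size size);

    static inline void *
    palloc_traced(Size size, const char *file, int line)
    {
        void   *chunk = MemoryContextAlloc(CurrentMemoryContext, size);

        palloc_trace_remember(file, line, chunk, size);
        return chunk;
    }

    /* palloc is itself a macro over MemoryContextAlloc, so replace it */
    #undef palloc
    #define palloc(sz)  palloc_traced((sz), __FILE__, __LINE__)

    #endif   /* PALLOC_TRACE */

Grouping the remembered call sites by file/line at reporting time would then give the per-call-site accounting wished for above, at a cost only the debug build pays.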