Thread: Using ProcSignal to get memory context stats from a running backend
Hi all
TL;DR: Let's add a ProcSignalReason that makes a backend call MemoryContextStats when it sees it, and a C func that users can use to set it on a proc. Sane?
I fairly frequently want to get a memory context dump from a running backend. It's trivial to do on a debug-enabled build (or one with debug symbols packages installed) when on a system with gdb and when I'm the one driving.
Frequently one or more of these things are not true. Users rarely install debug symbols packages, and never build with --enable-debug if building from source. They rarely have gdb to hand, and rarely know how to use it if they do.
So getting memory context dumps from backends showing signs of unreasonable memory bloat is harder than would be ideal for such incredibly handy debug info. Especially since most users run with default overcommit on Linux, so Linux likes to nuke the process rather than let us print context dumps when we detect OOM.
So how about borrowing a ProcSignalReason entry for "dump a memory context summary at your earliest convenience"? We could name it a more generic "dump debug data" in case we want to add things later.
Then a new pg_log_debug_backend(int) function or something like that could signal the proc and let CHECK_FOR_INTERRUPTS handle calling MemoryContextStats next time it's called.
We could clobber a prior ProcSignalReason set on the proc, but that's why we retry.
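Roughly the shape I have in mind, as a sketch only - the enum value and flag names here are placeholders, not a finished patch:

/* storage/procsignal.h: borrow a new reason */
typedef enum
{
	PROCSIG_CATCHUP_INTERRUPT,
	PROCSIG_NOTIFY_INTERRUPT,
	/* ... other existing reasons ... */
	PROCSIG_DUMP_DEBUG,			/* hypothetical "dump debug data" request */

	NUM_PROCSIGNALS
} ProcSignalReason;

/* storage/ipc/procsignal.c: the SIGUSR1 handler only sets a flag */
static volatile sig_atomic_t DumpDebugPending = false;

void
procsignal_sigusr1_handler(SIGNAL_ARGS)
{
	/* ... existing CheckProcSignal() tests elided ... */
	if (CheckProcSignal(PROCSIG_DUMP_DEBUG))
	{
		DumpDebugPending = true;
		InterruptPending = true;
	}
	SetLatch(MyLatch);
}

/* tcop/postgres.c, in ProcessInterrupts(), reached via CHECK_FOR_INTERRUPTS() */
if (DumpDebugPending)
{
	DumpDebugPending = false;
	MemoryContextStats(TopMemoryContext);	/* runs outside the signal handler */
}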
Way, way easier than getting gdb and debuginfo in place, attaching, etc. You still have to go fishing for stderr to find the output, but that's usually the same place as the rest of the logs.
Barring objections I'll send a patch in a while.
RE: Using ProcSignal to get memory context stats from a running backend
From: "Tsunakawa, Takayuki"
From: Craig Ringer [mailto:craig@2ndquadrant.com]
> TL;DR: Lets add a ProcSignalReason that makes a backend call
> MemoryContextStats when it sees it and a C func that users can use to set
> it on a proc. Sane?
>
> So how about borrowing a ProcSignalReason entry for "dump a memory context
> summary at your earliest convenience" ? We could name it a more generic
> "dump debug data" in case we want to add things later.
>
> Then a new pg_log_debug_backend(int) function or something like that could
> signal the proc and let CHECK_FOR_INTERRUPTS handle calling
> MemoryContextStats next time it's called.

+1
That's one of the things I wanted to do. It will be more useful on Windows. Would it work for autovac processes and background workers, etc. that connect to shared memory?

I have also wanted to dump stack traces. Linux (glibc) has backtrace_symbols(), and Windows has StackWalk()/StackWalk64(). Is it sane to make the function a hook?

Regards
Takayuki Tsunakawa
Hi,

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
> TL;DR: Lets add a ProcSignalReason that makes a backend
> call MemoryContextStats when it sees it and a C func that users can use to
> set it on a proc. Sane?

It's not unproblematic. procsignal_sigusr1_handler() runs in a signal handler, so you can't really rely on a lot of stuff being legal to do.

It'd be easy to set a flag in the handler and then have CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call. But that'd have the disadvantage that it possibly would take a while till the MemoryContextStats() is executed. Not sure if that's still good enough for you?

Another question is whether printing to stderr, bypassing proper logging!, is good enough for something like this.

Greetings,

Andres Freund
On 12 December 2017 at 12:25, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Craig Ringer [mailto:craig@2ndquadrant.com]
> TL;DR: Lets add a ProcSignalReason that makes a backend call
> MemoryContextStats when it sees it and a C func that users can use to set
> it on a proc. Sane?
> So how about borrowing a ProcSignalReason entry for "dump a memory context
> summary at your earliest convenience" ? We could name it a more generic
> "dump debug data" in case we want to add things later.
>
> Then a new pg_log_debug_backend(int) function or something like that could
> signal the proc and let CHECK_FOR_INTERRUPTS handle calling
> MemoryContextStats next time it's called.
> +1
> That's one of the things I wanted to do. It will be more useful on Windows. Would it work for autovac processes and background workers, etc. that connect to shared memory?
Anything that uses CHECK_FOR_INTERRUPTS() and is attached to PGXACT. So yeah, pretty much anything attached to shmem.
> I have also wanted to dump stack traces. Linux (glibc) has backtrace_symbols(), and Windows has StackWalk()/StackWalk64(). Is it sane to make the function a hook?
In-proc stack traces are immensely useful, and IMO relatively safe in a proc that's already in a reasonable state. If your stack is mangled, making it worse with an in-proc stack trace is rarely your biggest concern. I'd LOVE to be able to do this.
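For the glibc case it's not much code; an untested sketch (readable frames need -rdynamic or debug symbols):

#include <execinfo.h>
#include <stdio.h>

/* Untested sketch: dump the current stack to stderr on glibc. */
static void
dump_stack_to_stderr(void)
{
	void	   *frames[64];
	int			nframes = backtrace(frames, 64);

	/*
	 * backtrace_symbols_fd() writes straight to the fd and avoids malloc(),
	 * so it's a bit safer under memory pressure than backtrace_symbols().
	 */
	backtrace_symbols_fd(frames, nframes, fileno(stderr));
}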
However, I'd want to address anything like that quite separately to the change I proposed to expose an existing facility.
Andres Freund <andres@anarazel.de> writes:
> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
>> TL;DR: Lets add a ProcSignalReason that makes a backend
>> call MemoryContextStats when it sees it and a C func that users can use to
>> set it on a proc. Sane?

> It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
> handler, so you can't really rely on a lot of stuff being legal to do.

Indeed, calling MemoryContextStats in a signal handler would be a damfool idea, but Craig didn't propose that AFAICS.

> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.

Yeah, this is an issue. MemoryContextStats is designed to print to stderr in the (possibly vain) hope that it will work even when we are up against an OOM failure. That's not a property I much want to give up, but you're right that it's not very desirable if a user is trying to capture state during normal running.

regards, tom lane
On 12 December 2017 at 12:43, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
> TL;DR: Lets add a ProcSignalReason that makes a backend
> call MemoryContextStats when it sees it and a C func that users can use to
> set it on a proc. Sane?
> It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
> handler, so you can't really rely on a lot of stuff being legal to
> do.
>
> It'd be easy to set a flag in the handler and then have
> CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call.
Yes, definitely. That was my intention. Trying to write to stderr, mess with memory contexts, etc from a signal handler context seems awfully hairy and definitely not something I'd want to risk on a live system.
> But that'd have
> the disadvantage that it possibly would take a while till the
> MemoryContextStats() is executed. Not sure if that's still good enough
> for you?
Definitely. Sure, it won't be perfect, but it'd be a big improvement on what we have.
> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.
I think the reason it prints to stderr now is that it's intended to run in OOM situations.
Arguably that's not such a concern when being triggered by a procsignal. So elog(...) in that context could make sense. I'd probably add a print-wrapper callback arg to MemoryContextStatsDetail that you can use to write to a stringinfo / elog / fprintf(stderr), as desired.
On 12 December 2017 at 14:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.
> Yeah, this is an issue. MemoryContextStats is designed to print
> to stderr in the (possibly vain) hope that it will work even when
> we are up against an OOM failure. That's not a property I much
> want to give up, but you're right that it's not very desirable
> if a user is trying to capture state during normal running.
If we were willing to wrap variadic calls in a callback and rely on vfprintf, we could:
typedef void (*printwrapper)(void *extra, const char *fmt,...) pg_attribute_printf(2, 3);
...
static void
printwrapper_stderr(void *extra, const char *fmt, ...)
{
	va_list		args;

	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}
void
MemoryContextStats(MemoryContext context)
{
	MemoryContextStatsDetail(context, 100, printwrapper_stderr, NULL);
}
void
MemoryContextStatsDetail(MemoryContext context, int max_children,
printwrapper outfunc, void *printwrapper_extra)
{
	...
	outfunc(printwrapper_extra, "Grand total: ...", ...);
}
and let the ProcSignal based caller pass an elog wrapper instead, or form a StringInfo with appendStringInfoVA then elog it in one go after the MemoryContextStatsDetail call returns.
I was worried about relying on vfprintf, but we already use it widely in the codebase so it shouldn't be an issue with elderly buildfarm critters.
Letting it go to stderr isn't too bad, but it'd certainly make it that bit nicer if it didn't so I'm not opposed to adding the needed indirection. I'll give it a try in a bit.
On 2017-12-12 14:25:48 +0800, Craig Ringer wrote:
> If we were willing to wrap variadic calls in a callback and rely on
> vfprintf, we could:

I don't think there's problems with relying on variadic calls, we do that in plenty places.

> and let the ProcSignal based caller pass an elog wrapper instead, or form a
> StringInfo with appendStringInfoVA then elog it in one go after the
> MemoryContextStatsDetail call returns.

I don't think we want a simple elog wrapper - outputting the context stats as hundreds of log messages doesn't seem right. So at the least it seems we should bunch it up in a stringinfo - which seems to at least require expanding the API to pass around a void *callback_data or such.

I do wonder if the right thing here wouldn't be to put the result into a dsm segment, and then return that to the UDF on the requesting side. Logging to the server log and then have the requestor dig that out doesn't seem particularly user friendly.

Greetings,

Andres Freund
On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund <andres@anarazel.de> wrote:
> I do wonder if the right thing here wouldn't be to put the result into a
> dsm segment, and then return that to the UDF on the requesting
> side. Logging to the server log and then have the requestor dig that out
> doesn't seem particularly user friendly.

I think that dumping it to the server log will be fine for most people, and it's *significantly* safer. Creating a DSM segment could fail, and the last thing we want is for interrogating a long-running backend to make it crap out.

+1 for this whole concept, just BTW. As I've said before, I grow weary of asking customers to run gdb.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-12-12 14:04:55 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund <andres@anarazel.de> wrote:
>> I do wonder if the right thing here wouldn't be to put the result into a
>> dsm segment, and then return that to the UDF on the requesting
>> side. Logging to the server log and then have the requestor dig that out
>> doesn't seem particularly user friendly.
>
> I think that dumping it to the server log will be fine for most
> people, and it's *significantly* safer.

I agree that it's more reliable - I hope there's no meaningful safety difference. I think you overestimate users a bit however - far from most of them are going to be able to extract a very long log entry from a busy log file. There's no generally available easy way to copy a few pages of text from a logfile that's a few gigabytes large... I'd suggest adding two functions.

> Creating a DSM segment could fail, and the last thing we want is for
> interrogating a long-running backend to make it crap out.

It'd need to handle memory alloc failures gracefully, I agree. That doesn't seem impossible. Also don't think that's meaningfully different in the non-dsm case, it'd be good to handle malloc failures gracefully in that case as well. The requesting side should create the dsm segment and preallocate a reasonable amount of memory, if we go for it.

> +1 for this whole concept, just BTW. As I've said before, I grow
> weary of asking customers to run gdb.

Indeed.

Greetings,

Andres Freund
On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:
> I agree that it's more reliable - I hope there's no meaningful safety
> difference. I think you overestimate users a bit however - far from
> most of them are going to be able to extract a very long log entry from
> a busy log file. There's no generally available easy way to copy a few
> pages of text from a logfile that's a few gigabytes large...

Well, a lot of users will send us the whole logfile rather than just the relevant bits, but that doesn't bother me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-12-12 14:16:59 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:
>> I agree that it's more reliable - I hope there's no meaningful safety
>> difference. I think you overestimate users a bit however - far from
>> most of them are going to be able to extract a very long log entry from
>> a busy log file. There's no generally available easy way to copy a few
>> pages of text from a logfile that's a few gigabytes large...
>
> Well, a lot of users will send us the whole logfile rather than just
> the relevant bits, but that doesn't bother me.

That doesn't really work on some busy servers, and also often in cases where the log potentially contains sensitive (e.g. HIPAA) data. I think a function returning the dump would be great, a function "just" dumping to the server log still pretty good.

Greetings,

Andres Freund
On Mon, Dec 11, 2017 at 10:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 12 December 2017 at 12:43, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
>> But that'd have
>> the disadvantage that it possibly would take a while till the
>> MemoryContextStats() is executed. Not sure if that's still good enough
>> for you?
>
> Definitely. Sure, it won't be perfect, but it'd be a big improvement on what
> we have.

If this would be good enough, why not give it a shot then? Having to use gdb now on production systems is something sometimes hard to justify to customers. There are also the Windows problems...

>> Another question is whether printing to stderr, bypassing proper
>> logging!, is good enough for something like this.
>
> I think the reason it prints to stderr now is that it's intended to run in
> OOM situations.

Yep. I am on board with Tom here that this property should not be thrown away.

--
Michael
On 13 December 2017 at 12:10, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, Dec 11, 2017 at 10:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 12 December 2017 at 12:43, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
>> But that'd have
>> the disadvantage that it possibly would take a while till the
>> MemoryContextStats() is executed. Not sure if that's still good enough
>> for you?
>
> Definitely. Sure, it won't be perfect, but it'd be a big improvement on what
> we have.
> If this would be good enough, why not give it a shot then? Having to
> use gdb now on production systems is something sometimes hard to
> justify to customers. There are also the Windows problems...
>> Another question is whether printing to stderr, bypassing proper
>> logging!, is good enough for something like this.
>
> I think the reason it prints to stderr now is that it's intended to run in
> OOM situations.
> Yep. I am on board with Tom here that this property should not be thrown away.
OK, so I think I'll do pretty much what I outlined above, using stringinfo for the !OOM case and fprintf for the OOM case, via callback, roughly as outlined upthread. I won't bother with elog, it's easy enough if someone cares.
Another simpler option would be to open up a new file in the log directory, something like "debug_dump.<pid>.txt", and print whatever you want there. Then print out a reasonable-size log entry saying "Debug dump output to file 'debug_dump.<pid>.txt'".

You could provide a function that reads such files out of the log directory, or just document how to access them using pg_read_file(). It's not perfect, but it's got most of the advantages of communicating with the requesting process without the complexities of a DSM segment, and it's a bit more flexible too. Users can have automated monitoring tools watch for dumps, for example. And regular tools can be used to back up and clean out old files.
On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:
> Another simpler option would be to open up a new file in the log
> directory
... if we have one.
We might be logging to syslog, or whatever else.
I'd rather keep it simple(ish).
On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:
>> Another simpler option would be to open up a new file in the log
>> directory
>
> ... if we have one.
>
> We might be logging to syslog, or whatever else.
>
> I'd rather keep it simple(ish).

+1. I still think just printing it to the log is fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 18 December 2017 at 10:05, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:
>> Another simpler option would be to open up a new file in the log
>> directory
>
> ... if we have one.
>
> We might be logging to syslog, or whatever else.
>
> I'd rather keep it simple(ish).
> +1. I still think just printing it to the log is fine.
Here we go. Implemented pretty much as outlined above. A new pg_diag_backend(pid) function sends a new ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to elog(...).
I didn't want to mess with the MemoryContextMethods and expose a printf-wrapper style typedef in memnodes.h, so I went with a hook global. It's a diagnostic routine so I don't think that's going to be a great bother. By default it's set to write_stderr. That just writes to vfprintf on unix so the outcome's the same as our direct use of fprintf now.
On Windows, though, using write_stderr will make Pg attempt to write memory context dumps to the eventlog with ERROR level if running as a service with no console. That means we vsnprintf the string into a static buffer first. I'm not 100% convinced of the wisdom of that because it could flood the event log, which is usually size limited by # of events and recycled. It'd be better to write one event than write one per memory context line, but it's not safe to do that when OOM. I lean toward this being a win: at least Windows users should have some hope of finding out why Pg OOM'd, which currently they don't when it runs as a service. If we don't, we should look at using SetStdHandle to write stderr to a secondary logfile instead.
I'm happy to just add a trivial vfprintf wrapper so we preserve exactly the same behaviour instead, but I figured I'd start with reusing write_stderr.
I'd really like to preserve the writing to elog(...) not fprintf, because on some systems simply finding where stderr is written can be a challenge, if it's not going to /dev/null via some detached session. Is it in journald? In some separate log? Being captured by the logging collector (and probably silently eaten)? Who knows!
Using elog(...) provides a log_line_prefix and other useful info to associate the dump with the process of interest and what it's doing at the time.
Also, an astonishing number of deployed systems I've worked with actually don't put the pid or anything useful in log_line_prefix to make grouping up multi-line output practical. Which is insane. But it's only PGC_SIGHUP so fixing it is easy enough.
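Stripped of permissions checks and error codes, the SQL-callable side is roughly this (an illustrative sketch, not the exact patch code):

#include "postgres.h"
#include "fmgr.h"
#include "storage/procarray.h"
#include "storage/procsignal.h"

/* Sketch of the SQL-callable function; illustrative only. */
Datum
pg_diag_backend(PG_FUNCTION_ARGS)
{
	int			pid = PG_GETARG_INT32(0);
	PGPROC	   *proc = BackendPidGetProc(pid);

	if (proc == NULL)
		ereport(ERROR,
				(errmsg("PID %d is not a PostgreSQL server process", pid)));

	/* ask the target to dump diagnostics at its next CHECK_FOR_INTERRUPTS() */
	if (SendProcSignal(pid, PROCSIG_DIAG_REQUEST, proc->backendId) < 0)
		ereport(WARNING,
				(errmsg("could not signal backend with PID %d", pid)));

	PG_RETURN_BOOL(true);
}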
Hi
2017-12-19 14:44 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
> Here we go. Implemented pretty much as outlined above. A new pg_diag_backend(pid) function sends a
> new ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by CHECK_FOR_INTERRUPTS() and logs
> MemoryContextStats() output to elog(...).
>
> [...]
>
> Using elog(...) provides a log_line_prefix and other useful info to associate the dump with the
> process of interest and what it's doing at the time.
Sorry for the small off-topic question. Could this mechanism be used to log the executed plan or the full query?
Regards
Pavel
On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I didn't want to mess with the MemoryContextMethods and expose a
> printf-wrapper style typedef in memnodes.h, so I went with a hook global.

That looks pretty grotty to me. I think if you want to elog/ereport this, you need to pass another argument to MemoryContextStats() or add another memory context method. This is pretty much a textbook example of the wrong way to use a global variable, IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I didn't want to mess with the MemoryContextMethods and expose a
>> printf-wrapper style typedef in memnodes.h, so I went with a hook global.

> That looks pretty grotty to me. I think if you want to elog/ereport
> this, you need to pass another argument to MemoryContextStats() or add
> another memory context method. This is pretty much a textbook example
> of the wrong way to use a global variable, IMHO.

Yeah. But please don't mess with MemoryContextStats per se --- I dunno about you guys but "call MemoryContextStats(TopMemoryContext)" is kinda wired into my gdb reflexes. I think what'd make sense is a new function "MemoryContextStatsTo(context, function_pointer)". It's okay to redefine the APIs of the per-context-type functions these would call, though, because nobody calls those functions directly.

regards, tom lane
On 2017-12-19 13:17:52 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> I didn't want to mess with the MemoryContextMethods and expose a
>>> printf-wrapper style typedef in memnodes.h, so I went with a hook global.
>
>> That looks pretty grotty to me. I think if you want to elog/ereport
>> this, you need to pass another argument to MemoryContextStats() or add
>> another memory context method. This is pretty much a textbook example
>> of the wrong way to use a global variable, IMHO.

Agreed.

> Yeah. But please don't mess with MemoryContextStats per se ---
> I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
> is kinda wired into my gdb reflexes. I think what'd make sense
> is a new function "MemoryContextStatsTo(context, function_pointer)".
> It's okay to redefine the APIs of the per-context-type functions
> these would call, though, because nobody calls those functions directly.

We already have MemoryContextStatsDetail() - it seems to make sense to expand that API and leave MemoryContextStats() alone. I don't think there's a need for a third variant?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2017-12-19 13:17:52 -0500, Tom Lane wrote:
>> Yeah. But please don't mess with MemoryContextStats per se ---
>> I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
>> is kinda wired into my gdb reflexes. I think what'd make sense
>> is a new function "MemoryContextStatsTo(context, function_pointer)".
>> It's okay to redefine the APIs of the per-context-type functions
>> these would call, though, because nobody calls those functions directly.

> We already have MemoryContextStatsDetail() - it seems to make sense to
> expand that API and leave MemoryContextStats() alone. I don't think
> there's a need for a third variant?

Sure, WFM.

regards, tom lane
On 20 December 2017 at 02:35, Andres Freund <andres@anarazel.de> wrote:
> Yeah. But please don't mess with MemoryContextStats per se ---
> I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
> is kinda wired into my gdb reflexes. I think what'd make sense
> is a new function "MemoryContextStatsTo(context, function_pointer)".
> It's okay to redefine the APIs of the per-context-type functions
> these would call, though, because nobody calls those functions directly.
> We already have MemoryContextStatsDetail() - it seems to make sense to
> expand that API and leave MemoryContextStats() alone. I don't think
> there's a need for a third variant?
Cool, can do.
I'll have to expose a typedef for the printf-wrapper callback in memnodes.h and add it to the stats method, which I thought would be more likely to get complaints than the global hook. I'm actually happier to do it with a passed callback.
Will revise when I get a chance in the next couple of days.
On 20 December 2017 at 08:46, Craig Ringer <craig@2ndquadrant.com> wrote:
> I'll have to expose a typedef for the printf-wrapper callback in memnodes.h and add it to the stats
> method, which I thought would be more likely to get complaints than the global hook. I'm actually
> happier to do it with a passed callback.
>
> Will revise when I get a chance in the next couple of days.
Done.
It uses vfprintf unconditionally, even on Windows where we'd usually use write_stderr, so it doesn't change the current MemoryContextStats behaviour.
2017-12-21 14:39:20.045 AWST [10588] pg_regress/misc_functions LOG: diagnostic dump requested; memory context info: TopMemoryContext: 67440 total in 5 blocks; 13376 free (6 chunks); 54064 used
pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1376 free (0 chunks); 6816 used
TopTransactionContext: 8192 total in 1 blocks; 7728 free (1 chunks); 464 used
...
To verify that stderr output still works properly I ran:
SELECT
repeat('spamspamspam', 20000000),
repeat('spamspamspam', 20000000),
repeat('spamspamspam', 20000000),
repeat('spamspamspam', 20000000),
... lots ...;
and got the expected
+ERROR: out of memory
+DETAIL: Failed on request of size 240000004.
and memory context dump to stderr. I didn't add a TAP test for that because I'd rather avoid exercising OOM in tests where we don't know what the host's swap config is like, how its ulimit behaves, whether it has craziness like the OOM killer, etc. But it might make sense to add a TAP test to set a low ulimit and exercise OOM recovery at some point.
I've added a separate vfprintf wrapper for memory context use that doesn't try to use the windows error log like write_stderr(...) does. If we want to change OOM behaviour on Windows it can be done in a followup.
Hi,

On 2017-12-21 14:49:28 +0800, Craig Ringer wrote:
> +/*
> + * Accumulate writes into the buffer in diag_request_buf,
> + * for use with functions that expect a printf-like callback.
> + */
> +static void
> +printwrapper_stringinfo(void *extra, const char * fmt, ...)
> +{
> + StringInfo out = extra;
> + for (;;)
> + {
> + va_list args;
> + int needed;
> + va_start(args, fmt);
> + needed = appendStringInfoVA(out, fmt, args);
> + va_end(args);
> + if (needed == 0)
> + break;
> + enlargeStringInfo(out, needed);
> + }
> }

Hm, so I'm not entirely sure it's ok to use something that ERRORs on OOM. There's plenty of scenarios with thousands of memory contexts, making this output fairly large. If we want to make this usable in production, I'm not sure it's ok to introduce additional ERRORs. I wonder if we can change this to emit a static message if collecting the output exhausted memory.

Greetings,

Andres Freund
On 21 December 2017 at 14:58, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-12-21 14:49:28 +0800, Craig Ringer wrote:
> +/*
> + * Accumulate writes into the buffer in diag_request_buf,
> + * for use with functions that expect a printf-like callback.
> + */
> +static void
> +printwrapper_stringinfo(void *extra, const char * fmt, ...)
> +{
> + StringInfo out = extra;
> + for (;;)
> + {
> + va_list args;
> + int needed;
> + va_start(args, fmt);
> + needed = appendStringInfoVA(out, fmt, args);
> + va_end(args);
> + if (needed == 0)
> + break;
> + enlargeStringInfo(out, needed);
> + }
> }
> Hm, so I'm not entirely sure it's ok to use something that ERRORs on
> OOM. There's plenty of scenarios with thousands of memory contexts,
> making this output fairly large. If we want to make this usable in
> production, I'm not sure it's ok to introduce additional ERRORs. I
> wonder if we can change this to emit a static message if collecting the
> output exhausted memory.
There tons of callers to enlargeStringInfo, so a 'noerror' parameter would be viable.
But I'm not convinced it's worth it personally. If we OOM in response to a ProcSignal request for memory context output, we're having pretty bad luck. The output is 8k in my test. But even if it were a couple of hundred kb, happening to hit OOM just then isn't great luck on modern systems with many gigabytes of RAM.
If that *does* happen, repalloc(...) will call MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get our memory context dump anyway, albeit to stderr.
Hi,

On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
> There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
> be viable.

Not sure what you mean with that sentence?

> But I'm not convinced it's worth it personally. If we OOM in response to a
> ProcSignal request for memory context output, we're having pretty bad luck.
> The output is 8k in my test. But even if it were a couple of hundred kb,
> happening to hit OOM just then isn't great luck on modern systems with many
> gigabytes of RAM.

I've seen plenty memory dumps in the dozens to hundreds of megabytes. And imo such cases are more likely to invite use of this facility.

> If that *does* happen, repalloc(...) will call
> MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
> our memory context dump anyway, albeit to stderr.

That would still abort the query that might otherwise continue to work, so that seems no excuse.

Greetings,

Andres Freund
On 21 December 2017 at 15:24, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
> There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
> be viable.
> Not sure what you mean with that sentence?
Mangled in editing and sent prematurely, disregard.
There are NOT tons of callers to enlargeStringInfo, so adding a parameter that allowed it to return a failure result rather than ERROR on OOM seems to be a reasonable option. But it relies on repalloc(), which will ERROR on OOM. AFAICS we don't have "no error" variants of the memory allocation routines and I'm not eager to add them. Especially since we can't trust that we're not on misconfigured Linux where the OOM killer will go wild instead of giving us a nice NULL result.
So I guess that means we should probably just do individual elog(...)s and live with the ugliness of scraping the resulting mess out of the logs. After all, with a log destination that could possibly copy and modify the string being logged a couple of times, it's not a good idea to try to drop the whole thing into the logs in one blob. And we can't trust things like syslog with large messages.
I'll follow up with a revised patch that uses individual elog()s soon.
BTW, I also pgindented it in my tree, so it'll have formatting fixed up. pgindent's handling of
static void printwrapper_stringinfo(void *extra, const char *fmt,...) pg_attribute_printf(2, 3);
upsets me a little, since if I break the line it mangles it into
static void
printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);
so it'll break line-length rules a little, but otherwise conform.
> But I'm not convinced it's worth it personally. If we OOM in response to a
> ProcSignal request for memory context output, we're having pretty bad luck.
> The output is 8k in my test. But even if it were a couple of hundred kb,
> happening to hit OOM just then isn't great luck on modern systems with many
> gigabytes of RAM.
> I've seen plenty memory dumps in the dozens to hundreds of
> megabytes. And imo such cases are more likely to invite use of this
> facility.
OK, that's more concerning then. Also impressively huge.
As written it's using MemoryContextStatsDetail(TopMemoryContext, 100) so it shouldn't really be *getting* multimegabyte dumps, but I'm not thrilled by hardcoding that. It doesn't merit a GUC though, so I'm not sure what to do about it and figured I'd make it a "later" problem.
> If that *does* happen, repalloc(...) will call
> MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
> our memory context dump anyway, albeit to stderr.
> That would still abort the query that might otherwise continue to work,
> so that seems no excuse.
Eh. It'd probably die soon enough anyway. But yeah, I agree it's not safe to hope we can sling around up to a couple of hundred MB under presumed memory pressure.
On 21 December 2017 at 15:55, Craig Ringer <craig@2ndquadrant.com> wrote:
> So I guess that means we should probably just do individual elog(...)s and live with the ugliness
> of scraping the resulting mess out of the logs.
>
> I'll follow up with a revised patch that uses individual elog()s soon.
I intend to add an elog_internal_raw for this, which takes a pre-formatted string and bypasses EVALUATE_MESSAGE. In this case, one written to a static buffer by vsnprintf.
That bypasses two rounds of allocations by elog - expand_fmt_string for %m substitution, and the appendStringInfoVA for formatting. And it's a whole lot cleaner than
char buffer[2048];
...
vsnprintf(buffer, sizeof(buffer), ...)
...
elog(LOG, "%s", buffer);
It still imposes a single-line length limit, but no worse than write_stderr already does on win32. If that's not OK, preformatting each line into a StringInfo before dumping straight to elog works too.
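For illustration, the callback end would look something like this (plain errmsg_internal() here standing in for the raw variant):

/*
 * Sketch of the per-line callback: format into a fixed buffer, then emit
 * one LOG entry per memory context line. errmsg_internal() stands in for
 * the proposed raw variant.
 */
static void
printwrapper_elog(void *extra, const char *fmt, ...)
{
	char		buf[2048];
	va_list		args;

	va_start(args, fmt);
	vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);

	ereport(LOG,
			(errmsg_internal("%s", buf),
			 errhidestmt(true)));
}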
Complaints, or does that seem OK?
On 22 December 2017 at 13:05, Craig Ringer <craig@2ndquadrant.com> wrote:
> I intend to add an elog_internal_raw for this, which takes a pre-formatted string and bypasses
> EVALUATE_MESSAGE. In this case, one written to a static buffer by vsnprintf.
Er, I mean
errmsg_internal_raw(const char * errmsg)
On 19.12.2017 16:54, Pavel Stehule wrote:
> Sorry for the small off-topic question. Could this mechanism be used to log the executed plan or
> the full query?
This idea (but without logging) is implemented in the work on the pg_progress command proposed by Remi Colinet [1] and in the pg_query_state extension [2].
1. https://www.postgresql.org/message-id/CADdR5nxQUSh5kCm9MKmNga8%2Bc1JLxLHDzLhAyXpfo9-Wmc6s5g%40mail.gmail.com
2. https://github.com/postgrespro/pg_query_state
-- Regards, Maksim Milyutin
On 22 December 2017 at 20:50, Maksim Milyutin <milyutinma@gmail.com> wrote:
> On 19.12.2017 16:54, Pavel Stehule wrote:
>> Sorry for the small off-topic question. Could this mechanism be used to log the executed plan or
>> the full query?
That's a really good idea. I'd love to be able to pg_explain_backend(...)
I left the mechanism as a generic diagnostic signal exactly so that we could add other things we wanted to be able to ask backends. I think a follow-on patch that adds support for dumping explain-format plans would be great, if it's practical to do that while a query's already running.
2017-12-22 13:50 GMT+01:00 Maksim Milyutin <milyutinma@gmail.com>:
> On 19.12.2017 16:54, Pavel Stehule wrote:
>> Sorry for the small off-topic question. Could this mechanism be used to log the executed plan or
>> the full query?
>
> This idea (but without logging) is implemented in the work on the pg_progress command proposed by
> Remi Colinet [1] and in the pg_query_state extension [2].
>
> 1. https://www.postgresql.org/message-id/CADdR5nxQUSh5kCm9MKmNga8%2Bc1JLxLHDzLhAyXpfo9-Wmc6s5g%40mail.gmail.com
> 2. https://github.com/postgrespro/pg_query_state
I remember the discussion, and I hope that one day we will have some EXPLAIN PROCESS pid command.

Using a signal and the server log can be a very simple solution that is immediately available.
Regards
Pavel
On 22.12.2017 16:56, Craig Ringer wrote:
> On 22 December 2017 at 20:50, Maksim Milyutin <milyutinma@gmail.com> wrote:
>> On 19.12.2017 16:54, Pavel Stehule wrote:
>>> Sorry for the small off-topic question. Could this mechanism be used to log the executed plan
>>> or the full query?
>
> That's a really good idea. I'd love to be able to pg_explain_backend(...)
>
> I left the mechanism as a generic diagnostic signal exactly so that we could add other things we
> wanted to be able to ask backends. I think a follow-on patch that adds support for dumping
> explain-format plans would be great, if it's practical to do that while a query's already running.
Noticing the interest in calling some routines on a remote backend through signals: in a parallel thread [1] I have proposed the possibility to define user-defined signal handlers in extensions. There is a patch taken from the pg_query_state module.
1. https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com
-- Regards, Maksim Milyutin
On 22 December 2017 at 23:19, Maksim Milyutin <milyutinma@gmail.com> wrote:
> Noticing the interest in calling some routines on a remote backend through signals: in a parallel
> thread [1] I have proposed the possibility to define user-defined signal handlers in extensions.
> There is a patch taken from the pg_query_state module.
>
> 1. https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com
Haven't read the patch, but the idea seems useful if a bit hairy in practice. It'd be done on a "use at your own risk, and if it breaks you get to keep the pieces" basis, where no guarantees are made by Pg about how and when the function is called so it must be utterly defensive. The challenge there being that you can't always learn enough about the state of the system without doing things that might break it, so lots of things you could do would be fragile.
On 27.12.2017 10:44, Craig Ringer wrote:
> Haven't read the patch, but the idea seems useful if a bit hairy in practice. It'd be done on a
> "use at your own risk, and if it breaks you get to keep the pieces" basis, where no guarantees are
> made by Pg about how and when the function is called so it must be utterly defensive. The
> challenge there being that you can't always learn enough about the state of the system without
> doing things that might break it, so lots of things you could do would be fragile.
Yes, I agree with your thesis that the state of the system has to be checked and rechecked before starting to manipulate it in the signal handler. And the responsibility is on the extension developer to provide the necessary level of defensiveness. Perhaps it makes sense to add a try-catch block around the signal handler to intercept potential errors from it.
-- Regards, Maksim Milyutin
Did this go anywhere?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services