Re: Redacting information from logs - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Redacting information from logs |
Date | |
Msg-id | 20190804181757.jkkqfew2ucqgkvxe@alap3.anarazel.de Whole thread Raw |
In response to | Re: Redacting information from logs (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Redacting information from logs
|
List | pgsql-hackers |
Hi, On 2019-08-03 19:14:13 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I.e. something very roughly like > > > ereport(ERROR, > > errmsg_rich("string with %{named}s references to %{parameter}s"), > > errparam("named", somevar), > > errparam("parameter", othervar, .redact = CONTEXT)); > > I'm not terribly attracted by that specific approach, though. With > this, we'd get to maintain *two* variants of snprintf, and I think > the one with annotation knowledge would have significant performance > problems. (Something I'm sensitive to, on account of certain persons > beating me over the head about snprintf.c's performance.) I don't think that the performance issues of *printf in general - which we use in a lot of hot paths like output functions - are the same as when we log messages/throw errors. My gut feeling that at least >= DEBUG1 the difference would be hard to measure. And given how we e.g. build up the log_line_prefix, we could probably also recoup a lot potentially lost performance, around logging. We would probably just allow both positional references and named references. I.e. for smaller messages we can just use the current format, and just for the very long ones with many parameters we'd normally use a positional reference - but still have the name associated with the parameter. Even if we don't want to allow reference parameters by name in the format string, I think it's worthwhile. See below. It's really kind of a shame that we constantly re-parse the format strings. With something like modern C++ it'd be fairly easy to not do that (by using compile time evaluated constexprs when possible) - but I wonder if we couldn't also manage to do that within plain C, at the cost of a branch and some macro magic. I'm imagining something like #define pg_printf(fmt, ...) \ do { \ if ( __builtin_constant_p(fmt)) \ { \ static processed_fmt processed_fmt_ = {.format = fmt}; \ if (unlikely(!processed_fmt_.parsed)) \ preprocess_format_string(&processed_fmt_) \ pg_printf_processed(&processed_fmt_, __VA_ARGS__); \ } \ else \ pg_printf_unprocessed(fmt, __VA_ARGS__); \ } while (0) \ having to use a static variable for this purpose sucks somewhat, but I can't think of something in C that'd let us avoid that. > It'd be an amazing pain in the rear for translators, too, on account > of their tools not understanding this format-string language. The limitations of that world are starting to be really frustrating. > It seems to me that it'd be sufficient to do the annotation by > inserting wrapper functions, like the errparam() you suggest above. > If we just had errparam() choosing whether to return "..." instead of > its argument string, we'd have what we need, without messing with > the format language. I agree that if redaction were the only reason, that the expanded format strings wouldn't be worth it. And perhaps they still aren't. I think we've quite repeatedly had requests for a log format that can be parsed reasonably, and annotated with additional information that's too verbose for the main message. It's a pain to have to parse quite complex strings in production, just to extract the value of the actual information in the message. As an *easy* example of something I wanted to do in the past: Plot the total wait time of backends that had to wait for locks longer than deadlock_timeout. In that case, because we saw the system failing with too much waiting, but didn't really know when that started become problematic, and what the neormal amount of waiting is So we need to parse ereport(LOG, (errmsg("process %d acquired %s on %s after %ld.%03d ms", MyProcPid, modename, buf.data, msecs, usecs))); into something usable. First problem is of course that the log_line_prefix is variable, and that we'll likely need information from it to hunt down the source of locks. Parsing the above with regexes isn't too hard - but then we also need to deal with the fact that the message can be translated. Oh, and finding the associated statement requires looking forward in the log stream by a hard to predict amount, and requires knowledge about log verbosity was set to. In the past I had also wanted to parse the log to understand how much work manual and auto-vacuum an installation was doing, and whether that's perhaps correlated to problems. But for that one need a parser that can handle this mess: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/vacuumlazy.c;h=a3c4a1df3b4b28953ab1dfcd43f3d566eecac555;hb=HEAD#l396 which is obviously much harder. If we instead had the relevant variables separated out in a named fashion, we could - like many other tools these days - have a richer log target, that'd first have typical log message as a human readable string, but then also all the interpolated values separately. E.g. in theaforementioned simpler case it could be something like ts="2019-08-04 10:19:21.286 PDT" pid=32416 vxid=12/9 sev=LOG \ msg="Prozess 15027 erlangte AccessShareLock-Sperre auf Relation 1259 der Datenbank 13416 nach 29808,865 ms" \ fmt="process %d acquired %s on %s after %ld.%03d ms" \ p:modename="AccessShareLock-Sperre auf Relation 1259" \ p:msec=18343 p:usec=173 \ p:locktag=190345/134343/0/0/relation/default \ c:statement="LOCK TABLE pg_class;" \ l:func=ProcSleep l:file=proc.c l:line=1495 So, the prefix fields are just included as k=v, the evaluated message as one field, the *untranslated* format as another. All the parameters are included prefixed with p:, context information prefixed with c:, and location with l:. Note how e.g. p:modename is a pre-translated string, which is why p:locktag was added as an intentionall untranslated paramter that's not included in the message. This isn't perfect, but still a heck of a lot easier to parse than what we have now. All of the data is one block rather than in consecutive log lines that may or may not belong together, it's key=value in a way that can be parsed, the untranslated format string is available, so tools can actually analyze logs independent of the current locale. This isn't a proposal to adopt precisely the above log format - that's just something I came up with while writing this email - but to have enough information available to actually produce something like it when emitting logs. If we have enough information to pass to the logging hook, we don't even need to define how all of this is going to look like exactly (although I'd probably argue that a logfmt or json target ought to be in core). The cool part - and people from other projects/company projects might be laughing about me describing it as cool in this day and age - is that once you have a format that can readily be parsed it's pretty simple to write tools that process a logfile and extract just the information you want. Greetings, Andres Freund
pgsql-hackers by date: