Thread: Proposal: More structured logging
Hello,

The logging system already captures a lot of information in the ErrorData, but at present there is no way for log message authors to include more metadata about the log outside of the log message itself. For example, including the extension name can be useful for filtering / dispatching log messages. This can be done in the log message itself, but that requires specialized parsing of the log output.

Even though, among the available loggers in core, only the csv logger would be able to make sense of it, I feel it would be beneficial to add a tagging system to logs, by adding different (tag, value) combinations to a log entry, with an API similar to what we do for other ErrorData fields:

ereport(NOTICE,
    (errmsg("My log message")),
    (errtag("EMITTER", "MYEXTENSION")),
    (errtag("MSG-ID", "%d", error_message_id))
);

Please find attached a very small POC patch to better demonstrate what I propose.

Third-party logging hooks could then exploit those values to output them correctly. For example, the json logger written by Michael Paquier here: https://github.com/michaelpq/pg_plugins/tree/master/jsonlog, or the seemingly-abandoned journald hook here: https://github.com/intgr/pg_journal, could add that information in a structured way. I think the pgaudit extension (or something similar) could also make good use of such a feature instead of dumping a CSV entry in the log file. As for Postgres' own log messages, I'm sure we could find practical use cases for tagging messages like this.

On a related note, the only structured logger we have in-core is the CSV logger. Many users nowadays end up feeding the logs to journald, either by capturing the stderr output with systemd or by having syslog implemented by journald itself. Would there be any interest in having native journald support as a logging_destination?

Best regards,

-- 
Ronan Dunklau
Attachment
On 2021-Aug-13, Ronan Dunklau wrote:

> ereport(NOTICE,
>     (errmsg("My log message")),
>     (errtag("EMITTER", "MYEXTENSION")),
>     (errtag("MSG-ID", "%d", error_message_id))
> );

Interesting idea.  I agree this would be useful.

> Please find attached a very small POC patch to better demonstrate what I
> propose.

Seems like a good start.  I think a further step towards a committable patch would include a test module that uses the new tagging functionality.

> On a related note, the only structured logger we have in-core is the CSV
> logger. Many users nowadays end up feeding the logs to journald either by
> capturing the stderr output with systemd, or by having syslog implemented by
> journald itself. Would there be any interest in having native journald support
> as a logging_destination ?

I don't know anything about journald, but I agree that we sorely need some better way to emit structured log entries.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
                                                             (Paul Graham)
On Thu, Aug 19, 2021 at 3:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2021-Aug-13, Ronan Dunklau wrote:
>
> > ereport(NOTICE,
> >     (errmsg("My log message")),
> >     (errtag("EMITTER", "MYEXTENSION")),
> >     (errtag("MSG-ID", "%d", error_message_id))
> > );
>
> Interesting idea.  I agree this would be useful.
>
> > Please find attached a very small POC patch to better demonstrate what I
> > propose.
>
> Seems like a good start.  I think a further step towards a committable
> patch would include a test module that uses the new tagging
> functionality.
>
> > On a related note, the only structured logger we have in-core is the CSV
> > logger. Many users nowadays end up feeding the logs to journald either by
> > capturing the stderr output with systemd, or by having syslog implemented by
> > journald itself. Would there be any interest in having native journald support
> > as a logging_destination ?
>
> I don't know anything about journald, but I agree that we sorely need
> some better way to emit structured log entries.

+1. I think having the ability to natively send structured data to journald would be useful.

Another thing I've noticed in more and more other products is the ability to log as json, which is then later thrown into a central logging system somewhere. Basically like csv, but with the schema defined in each row. Yes, a lot more overhead, but some systems really do like to consume json.... So while we're on the topic of more structured logging...

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On 2021-Aug-19, Magnus Hagander wrote:

> Another thing I've noticed in more and more other products is to be
> able to log as json, which is then later thrown into a central logging
> system somewhere. Basically like csv, but with the schema defined in
> each row. Yes, a lot more overhead, but some systems really do like to
> consume json.... So when we're on the topic of more structured
> logging...

Yeah, I was thinking of json logging too -- specifically thinking about Fluent Bit and similar tools.

-- 
Álvaro Herrera
PostgreSQL Developer — https://www.EnterpriseDB.com/
Le jeudi 19 août 2021, 15:04:30 CEST Alvaro Herrera a écrit :
> On 2021-Aug-13, Ronan Dunklau wrote:
> > ereport(NOTICE,
> >     (errmsg("My log message")),
> >     (errtag("EMITTER", "MYEXTENSION")),
> >     (errtag("MSG-ID", "%d", error_message_id))
> > );
>
> Interesting idea.  I agree this would be useful.
>
> > Please find attached a very small POC patch to better demonstrate what I
> > propose.
>
> Seems like a good start.  I think a further step towards a committable
> patch would include a test module that uses the new tagging
> functionality.

Please find attached the original patch, plus a new one adding a test module. The test module exposes a function for generating logs with tags, and a log hook which formats the tags into the DETAIL field for easy regression testing.

The next step would be to emit those tags in the CSV logs. I'm not sure what representation they should have, though: a nested csv in its own column? A list of key => value pairs similar to hstore? A json object?

Also, we should probably make this available to the client too, but once again the format of the tag field needs to be defined. Any opinion?

-- 
Ronan Dunklau
Attachment
Le jeudi 19 août 2021, 16:50:10 CEST Alvaro Herrera a écrit :
> On 2021-Aug-19, Magnus Hagander wrote:
> > Another thing I've noticed in more and more other products is to be
> > able to log as json, which is then later thrown into a central logging
> > system somewhere. Basically like csv, but with the schema defined in
> > each row. Yes, a lot more overhead, but some systems really do like to
> > consume json.... So when we're on the topic of more structured
> > logging...
>
> Yeah, I was thinking of json logging too -- specifically thinking about
> Fluent Bit and similar tools.

Michael, your jsonlog module already fulfills this need. Is it something that should be merged into our tree?

-- 
Ronan Dunklau
Le vendredi 20 août 2021, 11:31:21 CEST Ronan Dunklau a écrit :
> Le jeudi 19 août 2021, 15:04:30 CEST Alvaro Herrera a écrit :
> > On 2021-Aug-13, Ronan Dunklau wrote:
> > > ereport(NOTICE,
> > >     (errmsg("My log message")),
> > >     (errtag("EMITTER", "MYEXTENSION")),
> > >     (errtag("MSG-ID", "%d", error_message_id))
> > > );
> >
> > Interesting idea.  I agree this would be useful.
> >
> > > Please find attached a very small POC patch to better demonstrate what I
> > > propose.
> >
> > Seems like a good start.  I think a further step towards a committable
> > patch would include a test module that uses the new tagging
> > functionality.
>
> Please find attached the original patch + a new one adding a test module.
> The test module exposes a function for generating logs with tags, and a log
> hook which formats the tags into the DETAIL field for easy regression testing.
>
> Next step would be to emit those tags in the CSV logs. I'm not sure what
> representation they should have though: a nested csv in its own column? A
> key => value pairs list similar to hstore? A json object?

I opted for a JSON representation in the CSV logs; please find attached v3, which is the same thing as v2 with another patch for CSV log output.

> Also we should probably make this available to the client too, but once
> again the format of the tag field needs to be defined. Any opinion?

-- 
Ronan Dunklau
Attachment
On Sat, Aug 21, 2021 at 2:37 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Aug 20, 2021 at 11:35:29AM +0200, Ronan Dunklau wrote:
> > Michael, your jsonlog module already fulfills this need. Is it something that
> > should be merged into our tree ?
>
> Yes, there is nothing technically preventing us from having this stuff in
> core, of course, and that would even take care of the issues in
> detecting if the piping protocol should be used or not.
>
> Now, the last time this was proposed, there was a lot of pushback
> because the presence of the JSON keys increases the size of the logs
> significantly compared to CSV, and usually users care a lot about the
> size of the log files.  I would support the presence of a JSON format
> for the logs in core, FWIW.

As long as it's optional, I don't think that drawback holds as an argument. The same argument could be made against the csv logs in the first place -- they add information to every row that a lot of people don't need. But it's optional. Leaving it up to the administrator to choose whether they prefer the verbose-and-easier-to-parse-maybe format or the more compact format seems like the right path for that.

I bet quite a few would actually choose json -- easier to integrate with other systems (because newer systems love json), and unless you're actually logging a lot of queries (which many people don't), the size of the logfile is often not a concern at all.

In short, I would also support the presence of a JSON log format in core. (But as a proper log_destination of course -- or, if it's time to actually split that into a separate thing, one parameter for log_destination and another for log_format.)

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Mon, Aug 23, 2021 at 11:33:09AM +0200, Magnus Hagander wrote:
> As long as it's optional, I don't think that drawback holds as an
> argument. The same argument could be made against the csv logs in the
> first place -- they add information to every row that a lot of people
> don't need. But it's optional. Leaving it up to the administrator to
> choose whether they prefer the verbose-and-easier-to-parse-maybe
> format or the more compact format seems like the right path for that.

From a code perspective, and while on it, we could split up elog.c a bit and move the log entries generated for each format into their own file.  That would be cleaner for CSV and JSON.  As a whole I don't have an objection to moving the JSON format into core.  If one proposes a patch, feel free to reuse the code from the module I have released.

> I bet quite a few would actually choose json -- easier to integrate
> with other systems (because newer systems love json), and unless
> you're actually logging a lot of queries (which many people don't),
> the size of the logfile is often not a concern at all.

The module I publish on github to do this work is the most popular thing in my plugin repo, FWIW.  It even gets bug reports, sometimes.

> In short, I would also support the presence of a JSON log format in
> core. (But as a proper log_destination of course -- or, if it's time to
> actually split that into a separate thing, one parameter for
> log_destination and another for log_format.)

What would you gain with a new parameter here?  I think that this would be rather confusing, and log_destination has been around for years.

-- 
Michael
Attachment
On Tue, Aug 24, 2021 at 7:22 PM Michael Paquier <michael@paquier.xyz> wrote:
> From a code perspective, and while on it, we could split up elog.c a bit
> and move the log entries generated for each format into their own
> file.  That would be cleaner for CSV and JSON.  As a whole I don't
> have an objection to moving the JSON format into core.  If one
> proposes a patch, feel free to reuse the code from the module I have
> released.
I had a patch that does exactly this from a few years ago. It started off copying a bunch out of your json logging module and adds it as a new "jsonlog" destination.
It needed some cleanup due to bit rot, but it now builds and works atop master. I'll post it in its own thread.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Tue, Aug 31, 2021 at 10:46:30AM -0400, Sehrope Sarkuni wrote:
> It needed some cleanup due to bit rot, but it now builds and works atop
> master. I'll post it in its own thread.

Thanks.

-- 
Michael
Attachment
On 23.08.21 11:33, Magnus Hagander wrote:
> In short, I would also support the presence of a JSON log format in
> core. (But as a proper log_destination of course -- or, if it's time to
> actually split that into a separate thing, one parameter for
> log_destination and another for log_format.)

It would be useful if there were some kind of standardized schema for JSON logging, meaning what the keys should be named etc.  Then we wouldn't need to make this all up from scratch, and there could be some integration with existing log analysis tools.
On 13.08.21 15:23, Ronan Dunklau wrote:
> The logging system already captures a lot of information in the ErrorData. But
> at present there is no way for log message authors to include more metadata
> about the log outside of the log message itself. For example, including the
> extension name which can be useful for filtering / dispatching log messages.
> This can be done in the log message itself, but that requires specialized
> parsing of the log output.
>
> Even though among the available loggers in core, only the csv logger would be
> able to make sense of it, I feel it would be beneficial to add a tagging system
> to logs, by adding different (tag, value) combinations to a log entry, with an
> API similar to what we do for other ErrorData fields:
>
> ereport(NOTICE,
>     (errmsg("My log message")),
>     (errtag("EMITTER", "MYEXTENSION")),
>     (errtag("MSG-ID", "%d", error_message_id))
> );

What are some more examples of where this could be used?  The extension name could be added more or less automatically to ereport() calls.  I don't know what the MSG-ID is supposed to be.  Are there other use cases?
Le mercredi 1 septembre 2021, 09:36:50 CEST Peter Eisentraut a écrit :
> On 13.08.21 15:23, Ronan Dunklau wrote:
> > The logging system already captures a lot of information in the ErrorData.
> > But at present there is no way for log message authors to include more
> > metadata about the log outside of the log message itself. For example,
> > including the extension name which can be useful for filtering /
> > dispatching log messages. This can be done in the log message itself, but
> > that requires specialized parsing of the log output.
> >
> > Even though among the available loggers in core, only the csv logger would
> > be able to make sense of it, I feel it would be beneficial to add a
> > tagging system to logs, by adding different (tag, value) combinations to
> > a log entry, with an API similar to what we do for other ErrorData
> > fields:
> >
> > ereport(NOTICE,
> >     (errmsg("My log message")),
> >     (errtag("EMITTER", "MYEXTENSION")),
> >     (errtag("MSG-ID", "%d", error_message_id))
> > );
>
> What are some more examples of where this could be used?  The extension
> name could be added more or less automatically to ereport() calls.  I
> don't know what the MSG-ID is supposed to be.  Are there other use cases?

Adding it automatically would be nice, but how would you go about that?

In-core, it would open up the possibility of splitting log messages into different fields, for example the different statistics reported in the logs by VACUUM / ANALYZE VERBOSE, making it easier to consume the output without having to parse the message. Parsing the message also means that any tool parsing it needs to either be aware of the locale, or force the user to use a specific one.

Out of core, the same thing could be done for extensions like pgaudit, which log structured information in the message itself, leaving the message format to be parsed at a later time.

-- 
Ronan Dunklau
On 01.09.21 10:00, Ronan Dunklau wrote:
> In-core it would open up the possibility to split log messages into different
> fields, for example the different statistics reported in the logs by VACUUM /
> ANALYZE VERBOSE and make it easier to consume the output without having to
> parse the message. Parsing the message also means that any tool parsing it
> needs to either be aware of the locale, or to force the user to use a specific
> one.

I think those messages would themselves have substructure.  For example, the current message

    "I/O timings: read: %.3f ms, write: %.3f ms\n"

might be

    {"I/O timings": {"read": ..., "write": ...}}

and that in turn is already part of a larger message.

So just having a single level of tags wouldn't really work for this.
Le mercredi 8 septembre 2021, 11:51:31 CEST Peter Eisentraut a écrit :
> On 01.09.21 10:00, Ronan Dunklau wrote:
> > In-core it would open up the possibility to split log messages into
> > different fields, for example the different statistics reported in the
> > logs by VACUUM / ANALYZE VERBOSE and make it easier to consume the output
> > without having to parse the message. Parsing the message also means that
> > any tool parsing it needs to either be aware of the locale, or to force
> > the user to use a specific one.
>
> I think those messages would themselves have substructure.  For example,
> the current message
>
>     "I/O timings: read: %.3f ms, write: %.3f ms\n"
>
> might be
>
>     {"I/O timings": {"read": ..., "write": ...}}
>
> and that in turn is already part of a larger message.
>
> So just having a single level of tags wouldn't really work for this.

I agree that having a nested structure may feel more natural, but I don't think it would matter too much if we standardized on ini-style property names (i.e., iotimings.read, iotimings.write, and so on...)

Regards,

-- 
Ronan Dunklau
Le mercredi 29 décembre 2021, 14:59:16 CET Justin Pryzby a écrit :
> > Subject: [PATCH v3 2/3] Add test module for the new tag functionality.
> ...
> > +test_logging(PG_FUNCTION_ARGS)
> > +{
> ...
> > +    (errmsg("%s", message),
> > +     ({
> > +         forboth(lk, keys, lv, values)
> > +         {
> > +             (errtag(lfirst(lk), "%s", (char *) lfirst(lv)));
> > +         }})
> > +    ));
>
> The windows build fails with that syntax.
> http://cfbot.cputube.org/ronan-dunklau.html
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157923

Thank you. I switched to an explicit sequence of errstart / errfinish to avoid putting too many things in nested macro calls. As I don't have any Windows knowledge, I am very grateful for the new cirrus-ci integration, which allowed me to build on Windows without hassle.

> > Subject: [PATCH v3 3/3] Output error tags in CSV logs
> > +++ b/doc/src/sgml/config.sgml
> > @@ -7370,6 +7371,7 @@ CREATE TABLE postgres_log
> >      backend_type text,
> >      leader_pid integer,
> >      query_id bigint,
> > +    tags jsonb
> >      PRIMARY KEY (session_id, session_line_num)
> >  );
> > </programlisting>
>
> That's invalid sql due to missing a trailing ",".

Thanks, fixed.

> You should also update file-fdw.sgml - which I only think of since we forgot
> to update it before e568ed0eb and 0830d21f5.  config.sgml should have a
> comment as a reminder to do that.

Done, and I added another commit per your suggestion to add this comment.

Thank you for this review.

Regards,

-- 
Ronan Dunklau
Attachment
Le samedi 15 janvier 2022, 07:09:59 CET Julien Rouhaud a écrit :
> Hi,
>
> On Tue, Jan 11, 2022 at 11:05:26AM +0100, Ronan Dunklau wrote:
> > Done, and I added another commit per your suggestion to add this comment.
>
> The cfbot reports that the patchset doesn't apply anymore:
>
> http://cfbot.cputube.org/patch_36_3293.log
> === Applying patches on top of PostgreSQL commit ID
> 43c2175121c829c8591fc5117b725f1f22bfb670 === [...]
> === applying patch ./v4-0003-Output-error-tags-in-CSV-logs.patch
> [...]
> patching file src/backend/utils/error/elog.c
> Hunk #2 FAILED at 3014.
> 1 out of 2 hunks FAILED -- saving rejects to file
> src/backend/utils/error/elog.c.rej
>
> Could you send a rebased version?  In the meantime I will switch the cf
> entry to Waiting on Author.

Thank you for the notification!

The csvlog code has been refactored since I wrote the patch, and the new jsonlog destination has been introduced. I rebased to fix the first patch, and added a new patch to add support for tags in the jsonlog output.

Best regards,

-- 
Ronan Dunklau
Attachment
Le lundi 17 janvier 2022, 09:18:04 CET Ronan Dunklau a écrit :
> Le samedi 15 janvier 2022, 07:09:59 CET Julien Rouhaud a écrit :
> > Hi,
> >
> > On Tue, Jan 11, 2022 at 11:05:26AM +0100, Ronan Dunklau wrote:
> > > Done, and I added another commit per your suggestion to add this comment.
> >
> > The cfbot reports that the patchset doesn't apply anymore:
> >
> > http://cfbot.cputube.org/patch_36_3293.log
> > === Applying patches on top of PostgreSQL commit ID
> > 43c2175121c829c8591fc5117b725f1f22bfb670 === [...]
> > === applying patch ./v4-0003-Output-error-tags-in-CSV-logs.patch
> > [...]
> > patching file src/backend/utils/error/elog.c
> > Hunk #2 FAILED at 3014.
> > 1 out of 2 hunks FAILED -- saving rejects to file
> > src/backend/utils/error/elog.c.rej
> >
> > Could you send a rebased version?  In the meantime I will switch the cf
> > entry to Waiting on Author.
>
> Thank you for the notification!
>
> The csvlog code has been refactored since I wrote the patch, and the new
> jsonlog destination has been introduced. I rebased to fix the first patch,
> and added a new patch to add support for tags in the jsonlog output.
>
> Best regards,

Hum, there was a missing import in csvlog.c from the fix above. Sorry about that.

-- 
Ronan Dunklau
Attachment
Le jeudi 27 janvier 2022, 08:15:01 CET Michael Paquier a écrit :
> On Tue, Jan 18, 2022 at 06:46:03AM +0100, Ronan Dunklau wrote:
> > Hum, there was a missing import in csvlog.c from the fix above. Sorry
> > about that.
>
> +<!-- Don't forget to also update file_fdw.sgml if you change this
> table definition -->
> <programlisting>
>
> You are also forgetting that the table listing all the jsonlog fields
> needs a refresh with this new key called "tags", and that it has a
> JSON object underneath.

Done.

> If you want to test all the formats supported by logging_destination,
> wouldn't this stuff be better if tested through TAP with
> logging_destination set with csvlog, jsonlog and stderr?  This is
> lacking coverage for csvlog.c and jsonlog.c, paths touched by your
> patch.

Done. I also added coverage for log_verbosity = verbose while I was at it, and fixed a bug I introduced in csvlog while rebasing...

-- 
Ronan Dunklau
Attachment
1) I would like an interface which more or less guarantees that *every* parameter of the log message is included in the structured data. Ideally there should be no actual need to generate the formatted messages for destinations like elastic search; just record the message id and the parameters. To that end I would think something like:

ereport(DEBUG2,
    (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
            vacrel->relname, (long long) index, vacuumed_pages)));

would end up like:

ereport(DEBUG2,
    (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
            errtag("relname", vacrel->relname),
            errtag("dead_item_pointers", (long long) index),
            errtag("vacuumed_pages", vacuumed_pages))));

And any untagged parameters could be included in the structured data with unknown tag names. (Or perhaps some macro magic involving # but I'm not sure that would be possible with va_args.)

2) I was kind of hoping we could have some sort of list of known tag labels that could be used. So instead of an arbitrary string for the label name, you would just reference the known labels. Then we could document that whenever "relid" appears as a tag id it's the oid of a relation, for example. This would help with things like "search elastic search for all log entries for relid x, block p" -- which wouldn't work if the block is sometimes "page", sometimes "blocknum", and sometimes "block"....

So I was imagining something like errtag(TAG_RELID, relid)

3) When it comes to things like queryid, it would be helpful to be able to inject tags from higher in the call stack, so that individual ereport sites don't need to include it and we can decide to add new things like it in the future. This will be useful for things like distributed tracing tools, which need to inject a span id into every log message. I imagine it could also be useful more generally.
Application developers might want to inject things like a pipeline identifier from their CI tools and then pull any logs with that identifier into their CI reports. Yes, they could be added as new columns in CSV and top-level json attributes, but we may well have new tags that are not always there or are even created on user request. Or we may have cases where there can be more than one active at a time.

4) As far as code goes... this is probably premature micro-optimization, but the tags all being individually palloc'd and all going through string formatting does seem like a lot of overhead. The actual number of errtag() calls in the ereport is always going to be the same, and the values are probably always going to be a single parameter, not really using the formatting functionality. I don't necessarily have a better alternative though. The only thing I can think of isn't really any better:

errtag_str(RELNAME, relname)
errtag_int(RELID, relid)

(possibly with some magic where there's an errtag() function that looks up which data type to expect for a given tag id).

It looks strange to me that the errorTag struct has a "const char *tagname" but a "char *tagvalue". I think this is a side effect of the code and not actually a sensible way to define the struct. Surely they should both be const?
On Mon, Jan 31, 2022 at 05:46:29PM -0500, Greg Stark wrote:
> It looks strange to me that the errorTag struct has a "const char
> *tagname" but a "char *tagvalue". I think this is a side effect of the
> code and not actually a sensible way to define the struct. Surely they
> should both be const?

This review has not been answered, and it has been around for two months now.  I am marking this patch as returned with feedback.

-- 
Michael