Thread: Add jsonlog log_destination for JSON server logs

Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
Hi,

This patch adds a new log_destination, "jsonlog", that writes log entries as lines of JSON. It was originally started by David Fetter using the jsonlog module by Michael Paquier (https://github.com/michaelpq/pg_plugins/blob/master/jsonlog/jsonlog.c) as a basis for how to serialize the log messages. Thanks to both of them because this wouldn't be possible without that starting point.

The first commit splits out the destination in log pipe messages into its own field. Previously it would piggyback on the "is_last" field. This adds an int to the message size but makes the rest of the code easier to follow.

The second commit adds a TAP test for log_destination "csvlog". This was done to both confirm that the previous change didn't break anything and as a skeleton for the test in the next commit.

The third commit adds the new log_destination "jsonlog". The output format is one line per entry with the top level output being a JSON object keyed with the log fields. Newlines in the output fields are escaped as \n so the output file has exactly one line per log entry. It also includes a new test for verifying the JSON output with some basic regex checks (similar to the csvlog test).

Here's a sample of what the log entries look like:

{"timestamp":"2021-08-31 10:15:25.129 EDT","user":"sehrope","dbname":"postgres","pid":12012,"remote_host":"[local]","session_id":"612e397d.2eec","line_num":1,"ps":"idle","session_start":"2021-08-31 10:15:25 EDT","vxid":"3/2","txid":"0","error_severity":"LOG","application_name":"006_jsonlog.pl","message":"statement: SELECT 1/0"}

It builds and passes "make check-world" on Linux. It also includes code to handle Windows as well but I have not actually tried building it there.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Tue, Aug 31, 2021 at 11:34:56AM -0400, Sehrope Sarkuni wrote:
> The second commit adds a TAP test for log_destination "csvlog". This was
> done to both confirm that the previous change didn't break anything and as
> a skeleton for the test in the next commit.

+note "Before sleep";
+usleep(100_000);
+note "Before rotate";
+$node->logrotate();
+note "After rotate";
+usleep(100_000);

Do you really need a rotation of the log files here?  Wouldn't it be
better to grab the position of the current log file with a fixed log
file name, and then slurp the file from this position with your
expected output?  That would make the test faster, as well.

> The third commit adds the new log_destination "jsonlog". The output format
> is one line per entry with the top level output being a JSON object keyed
> with the log fields. Newlines in the output fields are escaped as \n so the
> output file has exactly one line per log entry. It also includes a new test
> for verifying the JSON output with some basic regex checks (similar to the
> csvlog test).

+ * Write logs in json format.
+ */
+static void
+write_jsonlog(ErrorData *edata)
+{
Rather than making elog.c larger, I think that we should try to split
that into more files.  Why not refactoring out the CSV part first?
You could just call that csvlog.c, then create a new jsonlog.c for the
meat of the patch.

The list of fields is not up to date.  At quick glance, you are
missing:
- backend type.
- leader PID.
- query ID.
- Session start timestamp (?)
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
On Tue, Aug 31, 2021 at 8:43 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Aug 31, 2021 at 11:34:56AM -0400, Sehrope Sarkuni wrote:
> The second commit adds a TAP test for log_destination "csvlog". This was
> done to both confirm that the previous change didn't break anything and as
> a skeleton for the test in the next commit.

+note "Before sleep";
+usleep(100_000);
+note "Before rotate";
+$node->logrotate();
+note "After rotate";
+usleep(100_000);

Do you really need a rotation of the log files here?  Wouldn't it be
better to grab the position of the current log file with a fixed log
file name, and then slurp the file from this position with your
expected output?  That would make the test faster, as well.

Yes, that was intentional. I used the log rotation tap test as a base and kept that piece in there so it verifies that the csv log files are actually rotated. Same for the json logs.
 
Rather than making elog.c larger, I think that we should try to split
that into more files.  Why not refactoring out the CSV part first?
You could just call that csvlog.c, then create a new jsonlog.c for the
meat of the patch.

That's a good idea. I'll try that out.

The list of fields is not up to date.  At quick glance, you are
missing:
- backend type.
- leader PID.
- query ID.
- Session start timestamp (?)

Thanks. I'll take a look.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 

Re: Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
Updated patch set is attached.

This version splits out the existing csvlog code into its own file and centralizes the common helpers into a new elog-internal.h so that they're only included by the actual write_xyz sources.

That makes the elog.c changes in the JSON logging patch minimal as all it's really doing is invoking the new write_jsonlog(...) function.

It also adds those missing fields to the JSON logger output.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:
> That makes the elog.c changes in the JSON logging patch minimal as all it's
> really doing is invoking the new write_jsonlog(...) function.

Looking at 0001, to do things in order.

> @@ -46,8 +46,8 @@ typedef struct
>      char        nuls[2];        /* always \0\0 */
>      uint16        len;            /* size of this chunk (counts data only) */
>      int32        pid;            /* writer's pid */
> -    char        is_last;        /* last chunk of message? 't' or 'f' ('T' or
> -                                 * 'F' for CSV case) */
> +    int32        dest;            /* log destination */
> +    char        is_last;        /* last chunk of message? 't' or 'f'*/
>      char        data[FLEXIBLE_ARRAY_MEMBER];    /* data payload starts here */
>  } PipeProtoHeader;

Making PipeProtoHeader larger is not free, and that could penalize
workloads with a lot of short messages and many backends as the
syslogger relies on pipes with sync calls.  Why not switching is_last
to bits8 flags instead?  That should be enough for the addition of
JSON.  3 bits are enough at the end: one to know if it is the last
chunk of message, one for CSV and one for JSON.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Andrew Dunstan
Date:
On 9/8/21 2:58 AM, Michael Paquier wrote:
> On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:
>> That makes the elog.c changes in the JSON logging patch minimal as all it's
>> really doing is invoking the new write_jsonlog(...) function.
> Looking at 0001, to do things in order.
>
>> @@ -46,8 +46,8 @@ typedef struct
>>      char        nuls[2];        /* always \0\0 */
>>      uint16        len;            /* size of this chunk (counts data only) */
>>      int32        pid;            /* writer's pid */
>> -    char        is_last;        /* last chunk of message? 't' or 'f' ('T' or
>> -                                 * 'F' for CSV case) */
>> +    int32        dest;            /* log destination */
>> +    char        is_last;        /* last chunk of message? 't' or 'f'*/
>>      char        data[FLEXIBLE_ARRAY_MEMBER];    /* data payload starts here */
>>  } PipeProtoHeader;
> Making PipeProtoHeader larger is not free, and that could penalize
> workloads with a lot of short messages and many backends as the
> syslogger relies on pipes with sync calls.  Why not switching is_last
> to bits8 flags instead?  That should be enough for the addition of
> JSON.  3 bits are enough at the end: one to know if it is the last
> chunk of message, one for CSV and one for JSON.



Yeah. A very simple change would be to use two different values for json
(say 'y' and 'n'). A slightly more principled scheme might use the top
bit for the end marker and the bottom 3 bits for the dest type (so up to
8 types possible), with the rest available for future use.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Wed, Sep 08, 2021 at 08:46:44AM -0400, Andrew Dunstan wrote:
> Yeah. A very simple change would be to use two different values for json
> (say 'y' and 'n'). A slightly more principled scheme might use the top
> bit for the end marker and the bottom 3 bits for the dest type (so up to
> 8 types possible), with the rest available for future use.

I was thinking to track stderr as a case where no bits are set in the
flags for the area of the destinations, but that's a bit crazy if we
have a lot of margin in what can be stored.  I have looked at that and
finished with the attached which is an improvement IMO, especially
when it comes to the header validation.

FWIW, while looking at my own external module for the JSON logs, I
noticed that I used the chunk protocol when the log redirection is
enabled, but just enforced everything to be sent to stderr.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Greg Stark
Date:
Fwiw I was shocked when I saw the t/f T/F kluge when I went to work on
jsonlogging. That's the kind of dead-end short-sighted hack that just
lays traps and barriers for future hackers to have to clean up before
they can do the work they want to do.

Please just put a "format" field (or "channel" field -- the logging
daemon doesn't really care what format) with a list of defined formats
that can easily be extended in the future. If you want to steal the
high bit for "is last" and only allow 128 values instead of 256 so be
it.



Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Wed, Sep 08, 2021 at 10:58:51PM -0400, Greg Stark wrote:
> Please just put a "format" field (or "channel" field -- the logging
> daemon doesn't really care what format) with a list of defined formats
> that can easily be extended in the future. If you want to steal the
> high bit for "is last" and only allow 128 values instead of 256 so be
> it.

Which is what I just posted here:
https://www.postgresql.org/message-id/YTlunSciDRl1z7ik@paquier.xyz

Well, we could also do things so as we have two fields, as of
something like:
typedef struct
{
[...]
bits8    flags:4, format:4;
[...]
} PipeProtoHeader;

I am not sure if this is an improvement in readability though, and
that's less consistent with the recent practice we've been trying to
follow with bitmasks and flag-like options.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Thu, Sep 09, 2021 at 11:17:01AM +0900, Michael Paquier wrote:
> I was thinking to track stderr as a case where no bits are set in the
> flags for the area of the destinations, but that's a bit crazy if we
> have a lot of margin in what can be stored.  I have looked at that and
> finished with the attached which is an improvement IMO, especially
> when it comes to the header validation.

One part that was a bit flacky after more consideration is that the
header validation would consider as correct the case where both stderr
and csvlog are set in the set of flags.  I have finished by just using
pg_popcount() on one byte with a filter on the log destinations,
making the whole more robust.  If there are any objections, please let
me know.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:
> This version splits out the existing csvlog code into its own file and
> centralizes the common helpers into a new elog-internal.h so that they're
> only included by the actual write_xyz sources.
>
> That makes the elog.c changes in the JSON logging patch minimal as all it's
> really doing is invoking the new write_jsonlog(...) function.
>
> It also adds those missing fields to the JSON logger output.

Forking a bit this thread while looking at 0002 that adds new tests
for csvlog.  While I agree that it would be useful to have more
coverage with the syslogger message chunk protocol in this area, I
think that having a separate test is a waste of resources.  Creating a
new node is not cheap either, and this adds more wait phases, making
the tests take longer.  It would be much better to extend
004_logrotate.pl and update it to use log_destination = 'stderr,
csvlog', to minimize the number of nodes we create as well as the
additional amount of time we'd spend for those tests.  Plugging in
JSON into that would not be complicated either once we have in place a
set of small routines that limit the code duplication between the
checks for each log destination type.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Fri, Sep 10, 2021 at 01:07:00PM +0900, Michael Paquier wrote:
> Forking a bit this thread while looking at 0002 that adds new tests
> for csvlog.  While I agree that it would be useful to have more
> coverage with the syslogger message chunk protocol in this area, I
> think that having a separate test is a waste of resources.  Creating a
> new node is not cheap either, and this adds more wait phases, making
> the tests take longer.  It would be much better to extend
> 004_logrotate.pl and update it to use log_destination = 'stderr,
> csvlog', to minimize the number of nodes we create as well as the
> additional amount of time we'd spend for those tests.  Plugging in
> JSON into that would not be complicated either once we have in place a
> set of small routines that limit the code duplication between the
> checks for each log destination type.

And this part leads me to the attached, where the addition of the JSON
format would result in the addition of a couple of lines.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Fri, Sep 10, 2021 at 03:56:18PM +0900, Michael Paquier wrote:
> And this part leads me to the attached, where the addition of the JSON
> format would result in the addition of a couple of lines.

Okay, I have worked through the first half of the patch set, and
applied the improved versions of 0001 (refactoring of the chunk
protocol) and 0002 (addition of the tests for csvlog).

I have not looked in details at 0003 and 0004 yet.  Still, here are
some comments after a quick scan.

+ * elog-internal.h
I'd rather avoid the hyphen, and use elog_internal.h.

+#define FORMATTED_TS_LEN 128
+extern char formatted_start_time[FORMATTED_TS_LEN];
+extern char formatted_log_time[FORMATTED_TS_LEN];
+
+void setup_formatted_log_time(void);
+void setup_formatted_start_time(void);
We could just use a static buffer in each one of those routines, and
return back a pointer to the caller.

+   else if ((Log_destination & LOG_DESTINATION_JSONLOG) &&
+       (jsonlogFile == NULL ||
+        time_based_rotation || (size_rotation_for & LOG_DESTINATION_JSONLOG)))
[...]
+   /* Write to JSON log if enabled */
+   else if (Log_destination & LOG_DESTINATION_JSONLOG)
+   {
Those bits in 0004 are wrong.  They should be two "if" clauses.  This
is leading to issues when setting log_destination to multiple values
with jsonlog in the set of values and logging_connector = on, and the
logs are not getting redirected properly to the .json file.  We would
get the data for the .log and .csv files though.  This issue also
happens with the original patch set applied on top of e757080.   I
think that we should also be more careful with the way we free
StringInfoData in send_message_to_server_log(), or we are going to
finish with unwelcome leaks.

The same coding pattern is now repeated three times in logfile_rotate():
- Check if a logging type is enabled.
- Optionally open new file, with logfile_open().
- Business with ENFILE and EMFILE.
- pfree() and save of the static FILE* ane file name for each type.
I think that we have enough material for a wrapper routine that does
this work, where we pass down LOG_DESTINATION_* and pointers to the
static FILE* and the static last file names.  That would have avoided
the bug I found above.

The rebased patch set, that has reworked the tests to be in line with
HEAD, also fails.  They compile.

Sehrope, could you adjust that?
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
On Sun, Sep 12, 2021 at 10:22 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 10, 2021 at 03:56:18PM +0900, Michael Paquier wrote:
> And this part leads me to the attached, where the addition of the JSON
> format would result in the addition of a couple of lines.

Okay, I have worked through the first half of the patch set, and
applied the improved versions of 0001 (refactoring of the chunk
protocol) and 0002 (addition of the tests for csvlog).

Thanks. I finally got a chance to look through those changes. I like it. The popcount and pulling out the flags are much easier to follow than the old hard coded letters.
 
I have not looked in details at 0003 and 0004 yet.  Still, here are
some comments after a quick scan.

+ * elog-internal.h
I'd rather avoid the hyphen, and use elog_internal.h.

+#define FORMATTED_TS_LEN 128
+extern char formatted_start_time[FORMATTED_TS_LEN];
+extern char formatted_log_time[FORMATTED_TS_LEN];
+
+void setup_formatted_log_time(void);
+void setup_formatted_start_time(void);
We could just use a static buffer in each one of those routines, and
return back a pointer to the caller.

+1
 
+   else if ((Log_destination & LOG_DESTINATION_JSONLOG) &&
+       (jsonlogFile == NULL ||
+        time_based_rotation || (size_rotation_for & LOG_DESTINATION_JSONLOG)))
[...]
+   /* Write to JSON log if enabled */
+   else if (Log_destination & LOG_DESTINATION_JSONLOG)
+   {
Those bits in 0004 are wrong.  They should be two "if" clauses.  This
is leading to issues when setting log_destination to multiple values
with jsonlog in the set of values and logging_connector = on, and the
logs are not getting redirected properly to the .json file.  We would
get the data for the .log and .csv files though.  This issue also
happens with the original patch set applied on top of e757080.   I
think that we should also be more careful with the way we free
StringInfoData in send_message_to_server_log(), or we are going to
finish with unwelcome leaks.

Good catch. Staring at that piece again, that's tricky as it tries to aggressively free the buffer before calling write_cvslog(...). Which can't just be duplicated for additional destinations.

I think we need to pull up the negative case (i.e. syslogger not available) before the other destinations and if it matches, free and exit early. Otherwise, free the buffer and call whatever destination routines are enabled.
 
The same coding pattern is now repeated three times in logfile_rotate():
- Check if a logging type is enabled.
- Optionally open new file, with logfile_open().
- Business with ENFILE and EMFILE.
- pfree() and save of the static FILE* ane file name for each type.
I think that we have enough material for a wrapper routine that does
this work, where we pass down LOG_DESTINATION_* and pointers to the
static FILE* and the static last file names.  That would have avoided
the bug I found above.

I started on a bit of this as well. There's so much overlap already between the syslog_ and csvlog code that I'm going to put that together first. Then the json addition should just fall into a couple of call sites.

I'm thinking of adding an internal struct to house the FILE* and file names. Then all the opening, closing, and log rotation code can pass pointers to those and centralize the pfree() and NULL checks. 
 
The rebased patch set, that has reworked the tests to be in line with
HEAD, also fails.  They compile.

Sehrope, could you adjust that?

Yes I'm looking to sync those up and address the above later this week.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Mon, Sep 13, 2021 at 11:56:52PM -0400, Sehrope Sarkuni wrote:
> Good catch. Staring at that piece again, that's tricky as it tries to
> aggressively free the buffer before calling write_cvslog(...). Which can't
> just be duplicated for additional destinations.
>
> I think we need to pull up the negative case (i.e. syslogger not available)
> before the other destinations and if it matches, free and exit early.
> Otherwise, free the buffer and call whatever destination routines are
> enabled.

Yes, I got a similar impression.

> I started on a bit of this as well. There's so much overlap already between
> the syslog_ and csvlog code that I'm going to put that together first. Then
> the json addition should just fall into a couple of call sites.
>
> I'm thinking of adding an internal struct to house the FILE* and file
> names. Then all the opening, closing, and log rotation code can pass
> pointers to those and centralize the pfree() and NULL checks.

Agreed on both points (using a structure and doing the refactoring as
a first patch).
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
Attached three patches refactor the syslogger handling of file based destinations a bit ... and then a lot.

First patch adds a new constant to represent both file based destinations. This should make it easier to ensure additional destinations are handled in "For all file destinations..." situations (e.g. when we add the jsonlog destination).

Second patch refactors the file descriptor serialization and re-opening in EXEC_BACKEND forking. Previously the code was duplicated for both stderr and csvlog. Again, this should simplify adding new destinations as they'd just invoke the same helper. There's an existing comment about not handling failed opens in syslogger_parseArgs(...) and this patch doesn't fix that, but it does provide a single location to do so in the future.

The third patch adds a new internal (to syslogger.c) structure, FileLogDestination, for file based log destinations and modifies the existing handling for syslogFile and csvlogFile to defer to a bunch of helper functions. It also renames "syslogFile" to "stderr_file_info". Working through this patch, it was initially confusing that the stderr log file was named "syslogFile", the C file is named syslogger.c, and there's an entirely separate syslog (the message logging API) destination. I think this clears that up a bit.

The patches pass check-world on Linux. I haven't tested it on Windows but it does pass check-world with EXEC_BACKEND defined to try out the forking code paths. Definitely needs some review to ensure it's functionally correct for the different log files.

I haven't tried splitting the csvlog code out or adding the new jsonlog atop this yet. There's enough changes here that I'd like to get this settled first.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
The previous patches failed on the cfbot Appveyor Windows build. That also makes me question whether I did the EXEC_BACKEND testing correctly as it should have caught that compile error. I'm going to look into that.

In the meantime the attached should correct that part of the build.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Thu, Sep 16, 2021 at 05:27:20PM -0400, Sehrope Sarkuni wrote:
> Attached three patches refactor the syslogger handling of file based
> destinations a bit ... and then a lot.
>
> First patch adds a new constant to represent both file based destinations.
> This should make it easier to ensure additional destinations are handled in
> "For all file destinations..." situations (e.g. when we add the jsonlog
> destination).
>
> Second patch refactors the file descriptor serialization and re-opening in
> EXEC_BACKEND forking. Previously the code was duplicated for both stderr
> and csvlog. Again, this should simplify adding new destinations as they'd
> just invoke the same helper. There's an existing comment about not handling
> failed opens in syslogger_parseArgs(...) and this patch doesn't fix that,
> but it does provide a single location to do so in the future.
>
> The third patch adds a new internal (to syslogger.c) structure,
> FileLogDestination, for file based log destinations and modifies the
> existing handling for syslogFile and csvlogFile to defer to a bunch of
> helper functions. It also renames "syslogFile" to "stderr_file_info".
> Working through this patch, it was initially confusing that the stderr log
> file was named "syslogFile", the C file is named syslogger.c, and there's
> an entirely separate syslog (the message logging API) destination. I think
> this clears that up a bit.
>
> The patches pass check-world on Linux. I haven't tested it on Windows but
> it does pass check-world with EXEC_BACKEND defined to try out the forking
> code paths. Definitely needs some review to ensure it's functionally
> correct for the different log files.

Compilation with EXEC_BACKEND on Linux should work, so no need to
bother with Windows when it comes to that.  There is a buildfarm
machine testing this configuration, for example.

> I haven't tried splitting the csvlog code out or adding the new jsonlog
> atop this yet. There's enough changes here that I'd like to get this
> settled first.

Makes sense.

I am not really impressed by 0001 and the introduction of
LOG_DESTINATIONS_WITH_FILES.  So I would leave that out and keep the
list of destinations listed instead.  And we are talking about two
places here, only within syslogger.c.

0002 looks like an improvement, but I think that 0003 makes the
readability of the code worse with the introduction of eight static
routines to handle all those cases.

file_log_dest_should_rotate_for_size(), file_log_dest_close(),
file_log_dest_check_rotate_for_size(), get_syslogger_file() don't
bring major improvements.  On the contrary,
file_log_dest_write_metadata and file_log_dest_rotate seem to reduce a
bit the noise.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Sehrope Sarkuni
Date:
On Thu, Sep 16, 2021 at 9:36 PM Michael Paquier <michael@paquier.xyz> wrote:
I am not really impressed by 0001 and the introduction of
LOG_DESTINATIONS_WITH_FILES.  So I would leave that out and keep the
list of destinations listed instead.  And we are talking about two
places here, only within syslogger.c.

I've taken that out for now. The idea was to simplify the additions when jsonlog is added but can circle back to that later if it makes sense.
  
0002 looks like an improvement,

Nice. That's left unchanged (renamed to 0001 now).
 
but I think that 0003 makes the
readability of the code worse with the introduction of eight static
routines to handle all those cases.

file_log_dest_should_rotate_for_size(), file_log_dest_close(),
file_log_dest_check_rotate_for_size(), get_syslogger_file() don't
bring major improvements.  On the contrary,
file_log_dest_write_metadata and file_log_dest_rotate seem to reduce a
bit the noise.

It was helpful to split them out while working on the patch but I see your point upon re-reading through the result.

Attached version (renamed to 002) adds only three static functions for checking rotation size, performing the actual rotation, and closing. Also one other to handle generating the logfile names with a suffix to handle the pfree-ing (wrapping logfile_open(...)).

The rest of the changes happen in place using the new structures.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Fri, Sep 17, 2021 at 04:36:57PM -0400, Sehrope Sarkuni wrote:
> It was helpful to split them out while working on the patch but I see your
> point upon re-reading through the result.
>
> Attached version (renamed to 002) adds only three static functions for
> checking rotation size, performing the actual rotation, and closing. Also
> one other to handle generating the logfile names with a suffix to handle
> the pfree-ing (wrapping logfile_open(...)).
>
> The rest of the changes happen in place using the new structures.

I have looked at that in details, and found that the introduction of
FileLogDestination makes the code harder to follow, and that the
introduction of the file extension, the destination name and the
expected target destination LOG_DESTINATION_* had a limited impact
because they are used in few places.  The last two useful pieces are
the FILE* handle and the last file name for current_logfiles.

Attached are updated patches.  The logic of 0001 to refactor the fd
fetch/save logic when forking the syslogger in EXEC_BACKEND builds is
unchanged.  I have tweaked the patch with more comments and different
routine names though.  Patch 0002 refactors the main point that
introduced FileLogDestination by refactoring the per-destination file
rotation, not forgetting the fact that the file last name and handle
for stderr can never be cleaned up even if LOG_DESTINATION_STDERR is
disabled.  Grepping after LOG_DESTINATION_CSVLOG in the code tree, I'd
be fine to live with this level of abstraction as each per-destination
change are grouped with each other so they are hard to miss.

0001 is in a rather commitable shape, and I have made the code
consistent with HEAD.  However, I think that its handling of
_get_osfhandle() is clunky for 64-bit compilations as long is 32b in
WIN32 but intptr_t is platform-dependent as it could be 32b or 64b, so
atoi() would overflow if the handle is larger than INT_MAX for 64b
builds:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/standard-types
This problem deserves a different thread.

It would be good for 0002 if an extra pair of eyes looks at it.  While
on it, I have renamed the existing last_file_name to
last_sys_file_name in 0002 to make the naming more consistent with
syslogFile.  It is independent of 0001, so it could be done first as
well.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Tue, Sep 28, 2021 at 12:30:10PM +0900, Michael Paquier wrote:
> 0001 is in a rather commitable shape, and I have made the code
> consistent with HEAD.  However, I think that its handling of
> _get_osfhandle() is clunky for 64-bit compilations as long is 32b in
> WIN32 but intptr_t is platform-dependent as it could be 32b or 64b, so
> atoi() would overflow if the handle is larger than INT_MAX for 64b
> builds:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/standard-types
> This problem deserves a different thread.

This happens to not be a problem as only 32 bits are significant for
handles for both Win32 and Win64.  This also means that we should be
able to remove the use for "long" in this code, making the routines
more symmetric.  I have done more tests with Win32 and Win64, and
applied it.  I don't have MinGW environments at hand, but looking at
the upstream code that should not matter.  The buildfarm will let
us know soon enough if there is a problem thanks to the TAP tests of
pg_ctl.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Wed, Sep 29, 2021 at 11:02:10AM +0900, Michael Paquier wrote:
> This happens to not be a problem as only 32 bits are significant for
> handles for both Win32 and Win64.  This also means that we should be
> able to remove the use for "long" in this code, making the routines
> more symmetric.  I have done more tests with Win32 and Win64, and
> applied it.  I don't have MinGW environments at hand, but looking at
> the upstream code that should not matter.  The buildfarm will let
> us know soon enough if there is a problem thanks to the TAP tests of
> pg_ctl.

So, I have been looking at the rest of the patch set for the last
couple of days, and I think that I have spotted all the code paths
that need to be smarter when it comes to multiple file-based log
destinations.  Attached is a new patch set:
- 0001 does some refactoring of the file rotation in syslogger.c,
that's the same patch as previously posted.
- 0002 is more refactoring of elog.c, adding routines for the start
timestamp, log timestamp, the backend type and an extra one to check
if a query can be logged or not.
- 0003 is a change to send_message_to_server_log() to be smarter
regarding the fallback to stderr if a csvlog (or a jsonlog!) entry
cannot be logged because the redirection is not ready yet.  The code
of HEAD processes first stderr, then csvlog, with csvlog moving back
to stderr if not done yet.  That's a bit strange, because for example
on WIN32 we would lose any csvlog entry for a service.  I propose here
to do csvlog first, and fallback to stderr so as it gets done in one
code path instead of two.  I have spent quite a bit of time thinking
about the best way to handle the case of multiple file log
destinations here because we don't want to log multiple times to
stderr if csvlog and jsonlog are both enabled.  And I think that this
is the simplest thing we could do.
- 0004 moves the CSV-specific code into its own file.  This include
some refactoring of elog.c that should be moved to 0002, as this
requires more routines of elog.c to be published:
-- write_pipe_chunks()
-- error_severity()
- 0005 is the main meat, that introduces JSON as log_destination.
This compiles and passes all my tests, but I have not really done an
in-depth review of this code yet.

0002 and 0004 could be more polished and most of their pieces had
better be squashed together.  0003, though, would improve the case of
WIN32 where only csvlog is enabled so as log entries are properly
redirected to the event logs if the redirection is not done yet.  I'd
like to move on with 0001 and 0003 as independent pieces.

Sehrope, any thoughts?
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
"Bossart, Nathan"
Date:
On 10/5/21, 12:22 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> - 0001 does some refactoring of the file rotation in syslogger.c,
> that's the same patch as previously posted.

My compiler is unhappy with 5c6e33f:

        syslogger.c: In function ‘logfile_rotate_dest’:
        syslogger.c:1302:11: warning: ‘logFileExt’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        filename = logfile_getname(fntime, logFileExt);
        ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The attached patch seems to fix it.

Nathan


Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Thu, Oct 07, 2021 at 06:00:08AM +0000, Bossart, Nathan wrote:
> The attached patch seems to fix it.

Thanks, sorry about that.  I was able to see that once I compiled
without assertions.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Tue, Oct 05, 2021 at 04:18:17PM +0900, Michael Paquier wrote:
> 0002 and 0004 could be more polished and most of their pieces had
> better be squashed together.  0003, though, would improve the case of
> WIN32 where only csvlog is enabled so as log entries are properly
> redirected to the event logs if the redirection is not done yet.  I'd
> like to move on with 0001 and 0003 as independent pieces.

0001 and 0003 have been applied independently, attached is a rebased
version.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Fri, Oct 08, 2021 at 12:28:58PM +0900, Michael Paquier wrote:
> 0001 and 0003 have been applied independently, attached is a rebased
> version.

Attached are rebased versions of the patch set, where I have done a
cleaner split:
- 0001 includes all the refactoring of the routines from elog.c.
- 0002 moves csv logging into its own file.
- 0003 introduces the JSON log.

0001 and 0002, the refactoring bits, are in a rather committable
shape, so I'd like to apply that as the last refactoring pieces I know
of for this thread.  0003 still needs a closer lookup, and one part I
do not like much in it is the split for [u]int and long values when it
comes to key and values.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Tue, Oct 19, 2021 at 08:02:02PM +0900, Michael Paquier wrote:
> 0001 and 0002, the refactoring bits, are in a rather committable
> shape, so I'd like to apply that as the last refactoring pieces I know
> of for this thread.  0003 still needs a closer lookup, and one part I
> do not like much in it is the split for [u]int and long values when it
> comes to key and values.

I have finally come around 0003 and reviewed it.  There were a couple
of issues within it, from complications in the code that did not feel
necessary to incorrect handling of the values logged, mostly around
when values should be escaped or not.  jsonlog.c has been reorganized
so as its fields match with csvlog.c, and I have simplified the APIs
in charge of saving the integers into a single one with an argument
list and an option to control if the value should be escaped or not.

postgresql.conf.sample also needed a refresh.

I have also spent some time on the documentation, where the list of
JSON keys with their descriptions and types has been changed to a
table, for clarity.  The list was a bit incorrect (incorrect fields
and missing entries), so that should hopefully be clean now.

Patch 0003 has been heavily reworked, and it would be good to have an
extra pair of eyes on it.  So I have switched the CF entry as "Needs
Review" and added my name to the list of authors (originally this
stuff took code portions of own module, as well).
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Andres Freund
Date:
Hi,

On 2021-11-10 22:44:49 +0900, Michael Paquier wrote:
> Patch 0003 has been heavily reworked, and it would be good to have an
> extra pair of eyes on it.  So I have switched the CF entry as "Needs
> Review" and added my name to the list of authors (originally this
> stuff took code portions of own module, as well).

The tests don't seem to pass on windows:
https://cirrus-ci.com/task/5412456754315264?logs=test_bin#L47
https://api.cirrus-ci.com/v1/artifact/task/5412456754315264/tap/src/bin/pg_ctl/tmp_check/log/regress_log_004_logrotate

psql:<stdin>:1: ERROR:  division by zero
could not open "c:/cirrus/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles": The system
cannotfind the file specified at t/004_logrotate.pl line 87.
 

Greetings,

Andres Freund



Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Sun, Jan 02, 2022 at 01:34:45PM -0800, Andres Freund wrote:
> The tests don't seem to pass on windows:
> https://cirrus-ci.com/task/5412456754315264?logs=test_bin#L47
>
https://api.cirrus-ci.com/v1/artifact/task/5412456754315264/tap/src/bin/pg_ctl/tmp_check/log/regress_log_004_logrotate
>
> psql:<stdin>:1: ERROR:  division by zero
> could not open "c:/cirrus/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles": The system
cannotfind the file specified at t/004_logrotate.pl line 87. 

This seems to point out that the syslogger is too slow to capture the
logrotate signal, and the patch set is introducing nothing new in
terms of infrastructure, just an extra value for log_destination.
This stuff passes here, and I am not spotting something amiss after an
extra close read.

Attached is an updated patch set that increases the test timeout (5min
-> 10min).  That should help, I assume.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Andrew Dunstan
Date:
On 1/5/22 02:32, Michael Paquier wrote:
> On Sun, Jan 02, 2022 at 01:34:45PM -0800, Andres Freund wrote:
>> The tests don't seem to pass on windows:
>> https://cirrus-ci.com/task/5412456754315264?logs=test_bin#L47
>>
https://api.cirrus-ci.com/v1/artifact/task/5412456754315264/tap/src/bin/pg_ctl/tmp_check/log/regress_log_004_logrotate
>>
>> psql:<stdin>:1: ERROR:  division by zero
>> could not open "c:/cirrus/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles": The system
cannotfind the file specified at t/004_logrotate.pl line 87.
 
> This seems to point out that the syslogger is too slow to capture the
> logrotate signal, and the patch set is introducing nothing new in
> terms of infrastructure, just an extra value for log_destination.
> This stuff passes here, and I am not spotting something amiss after an
> extra close read.
>
> Attached is an updated patch set that increases the test timeout (5min
> -> 10min).  That should help, I assume.


ITYM 3 min -> 6  min. But in any case, is that really going to solve
this? The file should exist, even if its contents are not up to date, AIUI.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Add jsonlog log_destination for JSON server logs

From
Andrew Dunstan
Date:
On 1/6/22 13:06, Andrew Dunstan wrote:
> On 1/5/22 02:32, Michael Paquier wrote:
>> On Sun, Jan 02, 2022 at 01:34:45PM -0800, Andres Freund wrote:
>>> The tests don't seem to pass on windows:
>>> https://cirrus-ci.com/task/5412456754315264?logs=test_bin#L47
>>>
https://api.cirrus-ci.com/v1/artifact/task/5412456754315264/tap/src/bin/pg_ctl/tmp_check/log/regress_log_004_logrotate
>>>
>>> psql:<stdin>:1: ERROR:  division by zero
>>> could not open "c:/cirrus/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles": The
systemcannot find the file specified at t/004_logrotate.pl line 87.
 
>> This seems to point out that the syslogger is too slow to capture the
>> logrotate signal, and the patch set is introducing nothing new in
>> terms of infrastructure, just an extra value for log_destination.
>> This stuff passes here, and I am not spotting something amiss after an
>> extra close read.
>>
>> Attached is an updated patch set that increases the test timeout (5min
>> -> 10min).  That should help, I assume.
>
> ITYM 3 min -> 6  min. But in any case, is that really going to solve
> this? The file should exist, even if its contents are not up to date, AIUI.



I have tested on an msys2 setup with your v8 patches and I am getting this:


#   Failed test 'current_logfiles is sane'
#   at t/004_logrotate.pl line 96.
#                   'stderr log/postgresql-2022-01-06_222419.log
# csvlog log/postgresql-2022-01-06_222419.csv
# '
#     doesn't match '(?^:^stderr log/postgresql-.*log
# csvlog log/postgresql-.*csv
# jsonlog log/postgresql-.*json$)'

#   Failed test 'found expected log file content for stderr'
#   at t/004_logrotate.pl line 103.
#                   ''
#     doesn't match '(?^:division by zero)'

#   Failed test 'found expected log file content for jsonlog'
#   at t/004_logrotate.pl line 105.
#                   undef
#     doesn't match '(?^:division by zero)'

#   Failed test 'pg_current_logfile() gives correct answer with jsonlog'
#   at t/004_logrotate.pl line 105.
#          got: ''
#     expected: undef
# Looks like you failed 4 tests of 14.
[22:37:31] t/004_logrotate.pl ...
Dubious, test returned 4 (wstat 1024, 0x400)
Failed 4/14 subtests
[22:37:31]


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Thu, Jan 06, 2022 at 06:28:26PM -0500, Andrew Dunstan wrote:
> I have tested on an msys2 setup with your v8 patches and I am getting this:
>
> #   Failed test 'current_logfiles is sane'
> #   at t/004_logrotate.pl line 96.
> #                   'stderr log/postgresql-2022-01-06_222419.log
> # csvlog log/postgresql-2022-01-06_222419.csv

Yes, I was waiting for the latest results, but that did not help at
all.  Something is wrong with the patch, I am not sure what yet, but
that seems like a mistake in the backend part of it rather than the
tests.  I have switched the CF entry as waiting on author until this
is addressed.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Fri, Jan 07, 2022 at 03:49:47PM +0900, Michael Paquier wrote:
> Yes, I was waiting for the latest results, but that did not help at
> all.  Something is wrong with the patch, I am not sure what yet, but
> that seems like a mistake in the backend part of it rather than the
> tests.  I have switched the CF entry as waiting on author until this
> is addressed.

The issue comes from an incorrect change in syslogger_parseArgs()
where I missed that the incrementation of argv by 3 has no need to be
changed.  A build with -DEXEC_BACKEND is enough to show the failure,
which caused a crash when starting up the syslogger because of a NULL
pointer dereference.  The attached v9 should be enough to switch the
CF bot to green.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
"Bossart, Nathan"
Date:
On 1/10/22, 4:51 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> The issue comes from an incorrect change in syslogger_parseArgs()
> where I missed that the incrementation of argv by 3 has no need to be
> changed.  A build with -DEXEC_BACKEND is enough to show the failure,
> which caused a crash when starting up the syslogger because of a NULL
> pointer dereference.  The attached v9 should be enough to switch the
> CF bot to green.

I've been looking at the latest patch set intermittently and playing
around with jsonlog a little.  It seems to work well, and I don't have
any significant comments about the code.  0001 and 0002 seem
straightforward and uncontroversial.  IIUC 0003 simply introduces
jsonlog using the existing framework.

I wonder if we should consider tracking each log destination as a set
of function pointers.  The main logging code would just loop through
the enabled log destinations and use these functions, and it otherwise
would be completely detached (i.e., no "if jsonlog" blocks).  This
might open up the ability to define custom log destinations via
modules, too.  However, I don't know if there's any real demand for
something like this, and it should probably be done separately from
introducing jsonlog, anyway.

Nathan


Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Tue, Jan 11, 2022 at 08:34:26PM +0000, Bossart, Nathan wrote:
> I've been looking at the latest patch set intermittently and playing
> around with jsonlog a little.  It seems to work well, and I don't have
> any significant comments about the code.  0001 and 0002 seem
> straightforward and uncontroversial.

Thanks.  I have looked again at 0001 and 0002 today and applied both,
so it means that we are done with all the refactoring pieces proposed
up to now.

> IIUC 0003 simply introduces jsonlog using the existing framework.

This part will have to wait a bit more, but yes, this piece should be
straight-forward.

> I wonder if we should consider tracking each log destination as a set
> of function pointers.  The main logging code would just loop through
> the enabled log destinations and use these functions, and it otherwise
> would be completely detached (i.e., no "if jsonlog" blocks).  This
> might open up the ability to define custom log destinations via
> modules, too.  However, I don't know if there's any real demand for
> something like this, and it should probably be done separately from
> introducing jsonlog, anyway.

I am not sure that this is worth the complications, either.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Wed, Jan 12, 2022 at 03:27:19PM +0900, Michael Paquier wrote:
> This part will have to wait a bit more, but yes, this piece should be
> straight-forward.

Okay, this last piece has been applied this morning, after more review
and a couple of adjustments, mainly cosmetic (pg_current_logfile
missed a refresh, incorrect copyright in jsonlog.c, etc.).  Let's see
what the buildfarm thinks.
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Mon, Jan 17, 2022 at 10:48:06AM +0900, Michael Paquier wrote:
> Okay, this last piece has been applied this morning, after more review
> and a couple of adjustments, mainly cosmetic (pg_current_logfile
> missed a refresh, incorrect copyright in jsonlog.c, etc.).  Let's see
> what the buildfarm thinks.

By the way, while on it, using directly COPY to load the logs from a
generated .json file can be trickier than it looks, as backslashes
require an extra escap when loading the data.  One idea, while not the
best performance-wise, is to rely on COPY FROM PROGRAM with commands
like that:
CREATE TABLE logs (data jsonb);
COPY logs FROM PROGRAM 'cat logs.json | sed ''s/\\/\\\\/g''';
--
Michael

Attachment

Re: Add jsonlog log_destination for JSON server logs

From
Alvaro Herrera
Date:
So, thinking about this, there is one important piece that is missing
here, which is the ability to change the default format for what we
write to stderr.  Right now, if you have stderr output, it is always in
the "plain multiline" format, with no option to change it.  If you want
a JSON log, you have to read a file.  But ISTM it would be pretty useful
if you could say "log_default_format=json" and get the log that we get
in stderr in the JSON format instead.

From what I hear in the container world, what they would *prefer* (but
they don't often get) is to receive the JSON-format logs directly in
stderr from the daemons they run; they capture stderr and they have the
logs just in the format they need, without having to open the log files,
parsing the lines to rewrite in a different format as is done currently.

I think this would be a relatively easy patch to do.  Opinions?

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: Add jsonlog log_destination for JSON server logs

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> So, thinking about this, there is one important piece that is missing
> here, which is the ability to change the default format for what we
> write to stderr.  Right now, if you have stderr output, it is always in
> the "plain multiline" format, with no option to change it.  If you want
> a JSON log, you have to read a file.  But ISTM it would be pretty useful
> if you could say "log_default_format=json" and get the log that we get
> in stderr in the JSON format instead.

>> From what I hear in the container world, what they would *prefer* (but
> they don't often get) is to receive the JSON-format logs directly in
> stderr from the daemons they run; they capture stderr and they have the
> logs just in the format they need, without having to open the log files,
> parsing the lines to rewrite in a different format as is done currently.

> I think this would be a relatively easy patch to do.  Opinions?

I think assuming that everything that comes out on the postmaster's stderr
is generated by our code is hopelessly naive.  See for example glibc's
bleats when it detects malloc corruption, or when loading a shlib fails.
So I don't believe something like this can be made to work reliably.

The existing syslogger logic does have the ability to cope with
such out-of-protocol data.  So maybe, if you are using syslogger,
you could have it transform such messages into some
lowest-common-denominator jsonlog format.  But it's not going to
work to expect that to happen with raw stderr.

            regards, tom lane



Re: Add jsonlog log_destination for JSON server logs

From
Michael Paquier
Date:
On Thu, Feb 10, 2022 at 07:45:17PM -0300, Alvaro Herrera wrote:
> From what I hear in the container world, what they would *prefer* (but
> they don't often get) is to receive the JSON-format logs directly in
> stderr from the daemons they run; they capture stderr and they have the
> logs just in the format they need, without having to open the log files,
> parsing the lines to rewrite in a different format as is done currently.

Yes, I have been pinged about that, which is why there are still cases
for my out-of-core extension jsonlog that uses the elog hook.

> I think this would be a relatively easy patch to do.  Opinions?

The postmaster goes through a couple of loops with the fd to open for
the default format, that the syslogger inherits from the postmaster,
and I am pretty sure that there are a couple of code paths around the
postmaster startup that can be tricky to reason about.

Making the new parameter PGC_POSTMASTER makes things easier to handle,
still the postmaster generates a couple of LOG entries and redirects
them to stderr before loading any GUC values, which would mean that we
cannot make sure that all the logs are valid JSON objects.  If we want
to be 100% waterproof here, we may want to track down the format to
use by default with a mean different than a GUC for the postmaster
startup?  A file holding this information in the root of the data
folder would be one way.
--
Michael

Attachment