Thread: Refactoring syslogger piping to simplify adding new log destinations
Hi,

While working on adding a new log_destination I noticed that the
syslogger piping would need to be updated. At the moment both ends only
handle stderr/csvlog, as the pipe message header has a char "is_last"
that is either t/f (stderr last, stderr partial) or T/F (csvlog last,
csvlog partial).

A couple of approaches came to mind:

1. Use an additional pair of chars for each additional destination
   (e.g. x/X, y/Y, ...) and mimic the csvlog logic.

2. Repurpose the char "is_last" as a bitmap of the log destination,
   with the highest-order bit indicating whether it's the last chunk.

3. Add a separate field "dest" for the log destination and leave
   "is_last" as a t/f indicating whether it's the last chunk.

Attached are patches for each approach (fun exercise!). Also attached is
a basic TAP test that invokes the csvlog destination. It's a clone of
the pg_ctl log rotation test that looks for .csv logs. If there's
interest in the test, I was thinking of expanding it a bit to include
"big" output that spans multiple messages, to exercise the
partial-chunk combining path.

My thoughts on the approaches:

#1 doesn't change the header's types or size, but it seems ugly as it
leads to new pairs of constants and logic in multiple places. In
particular, both the send and receive ends have to encode and decode
the destination.

#2 is cleaner, as there's a logical separation of the dest bits and no
need for new constant pairs when adding new destinations. We'd also
need to ensure that new LOG_DESTINATION_xyz constants do not use that
last bit (there are already four now, so room for three more).

#3 leads to the cleanest code, though you lose 4 bytes of max data size
per chunk.

Which would be preferable? I'd like to validate the approach, as the
new log destination would be built atop it. I'm leaning toward #3,
though if the 4-byte loss is a deal breaker then #2.

Regards,
-- 
Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
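[For readers following along: approach #2 can be sketched in a few lines. This is a hedged illustration, not the attached patch; the macro and function names (encode_flags, flags_dest, flags_is_last, PIPE_PROTO_*) are hypothetical. The LOG_DESTINATION_* values match PostgreSQL's elog.h; the idea is that the low 7 bits of the former "is_last" byte carry the destination bitmap and the high bit flags the final chunk, which is why new destination constants must stay below 0x80.]

```c
#include <assert.h>
#include <stdint.h>

/* Log destination bits as defined in PostgreSQL's elog.h */
#define LOG_DESTINATION_STDERR   1
#define LOG_DESTINATION_SYSLOG   2
#define LOG_DESTINATION_EVENTLOG 4
#define LOG_DESTINATION_CSVLOG   8

/* Approach #2 (names hypothetical): reuse the one-byte "is_last" slot.
 * High bit = last chunk; low 7 bits = destination bitmap. */
#define PIPE_PROTO_IS_LAST   0x80
#define PIPE_PROTO_DEST_MASK 0x7F

static inline uint8_t
encode_flags(int dest, int is_last)
{
    /* Sender side: pack destination and last-chunk flag into one byte. */
    return (uint8_t) ((dest & PIPE_PROTO_DEST_MASK) |
                      (is_last ? PIPE_PROTO_IS_LAST : 0));
}

static inline int
flags_dest(uint8_t flags)
{
    /* Receiver side: recover the destination bitmap. */
    return flags & PIPE_PROTO_DEST_MASK;
}

static inline int
flags_is_last(uint8_t flags)
{
    /* Receiver side: was this the final chunk of the message? */
    return (flags & PIPE_PROTO_IS_LAST) != 0;
}

/* Approach #3, for comparison (field layout hypothetical): an explicit
 * dest field keeps is_last as a plain boolean, but grows the header by
 * sizeof(int32), shrinking the max payload per chunk by 4 bytes. */
typedef struct SketchPipeProtoHeader
{
    uint16_t len;      /* size of this chunk's data */
    int32_t  pid;      /* writing process's pid */
    int32_t  dest;     /* LOG_DESTINATION_* bitmap */
    char     is_last;  /* 't' if last chunk of message, else 'f' */
} SketchPipeProtoHeader;
```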
Hi Sehrope,

On 2019-Jul-10, Sehrope Sarkuni wrote:

> While working on adding a new log_destination I noticed that the
> syslogger piping would need to be updated. At the moment both ends
> only handle stderr/csvlog as the pipe message header has a char
> "is_last" that is either t/f (stderr last, stderr partial) or T/F
> (csvlog last, csvlog partial).

I spent some time a couple of weeks ago looking at profiles of the
syslogger code, and my impression is that the current design of using a
pipe to move data from one process to another may benefit from a
complete rewrite; it seems that the kernel overhead of going over the
pipe is significant. (The test case was a couple dozen processes, each
generating a thousand couple-hundred-KB log lines. In perf, the pipe
read/write takes up 99% of the CPU time.)

Maybe we can use something like a shared memory queue, working in a
similar way to wal_buffers -- where backends send over the shm queue to
syslogger, and syslogger writes in order to the actual log file. Or
maybe something completely different; I didn't actually prototype
anything, just observed the disaster.

I'm not opposed to your patches, just trying to whet your appetite for
something bigger in the vicinity.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
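[The shared-memory queue above is only a concept in this thread; nothing was prototyped. As a toy illustration of the ordering idea, here is a minimal single-buffer sketch: backends append variable-length log records and the syslogger drains them in arrival order. All names (LogQueue, queue_push, queue_drain) are hypothetical; a real implementation would live in shared memory with locking and wraparound handling, closer in spirit to wal_buffers, and would need a fallback for when the queue is full.]

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_SIZE 1024

typedef struct LogQueue
{
    char   buf[QUEUE_SIZE];
    size_t head;                /* next write offset (producer side) */
    size_t tail;                /* next read offset (consumer side) */
} LogQueue;

/* Producer (backend): append one log record; 0 means "queue full",
 * where a real system might block or fall back to the pipe. */
static int
queue_push(LogQueue *q, const char *msg, size_t len)
{
    if (q->head + len > QUEUE_SIZE)
        return 0;
    memcpy(q->buf + q->head, msg, len);
    q->head += len;
    return 1;
}

/* Consumer (syslogger): drain pending bytes in arrival order, so the
 * log file sees records in the order they were enqueued. */
static size_t
queue_drain(LogQueue *q, char *out, size_t outsz)
{
    size_t n = q->head - q->tail;

    if (n > outsz)
        n = outsz;
    memcpy(out, q->buf + q->tail, n);
    q->tail += n;
    return n;
}
```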
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Maybe we can use something like a shared memory queue, working in a
> similar way to wal_buffers -- where backends send over the shm queue to
> syslogger, and syslogger writes in order to the actual log file.

No way that's going to be acceptable for postmaster output.

			regards, tom lane
On 2019-Jul-10, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Maybe we can use something like a shared memory queue, working in a
> > similar way to wal_buffers -- where backends send over the shm queue to
> > syslogger, and syslogger writes in order to the actual log file.
>
> No way that's going to be acceptable for postmaster output.

Well, we can use both mechanisms simultaneously. Postmaster doesn't
emit all that much output anyway, so I don't think that's a concern.
And actually, we still need the pipes from the backends for the odd
cases where third-party code writes to stderr, no?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Jul-10, Tom Lane wrote:
>> No way that's going to be acceptable for postmaster output.

> Well, we can use both mechanisms simultaneously. Postmaster doesn't emit
> all that much output anyway, so I don't think that's a concern. And
> actually, we still need the pipes from the backend for the odd cases
> where third party code writes to stderr, no?

Yeah, if you don't want to give up capturing random stderr output (and
you shouldn't), that's another issue. But as you say, maybe we could
have both mechanisms. There'd be a synchronization problem for pipe vs.
queue output from the same process, but maybe that will be tolerable.

			regards, tom lane