Thread: Refactoring syslogger piping to simplify adding new log destinations

From: Sehrope Sarkuni
Hi,

While working on adding a new log_destination I noticed that the
syslogger piping would need to be updated. At the moment both ends
only handle stderr/csvlog as the pipe message header has a char
"is_last" that is either t/f (stderr last, stderr partial) or T/F
(csvlog last, csvlog partial). A few approaches came to mind:

1. Use additional pairs of chars for each additional destination (e.g.
x/X, y/Y, ...) and mimic the logic of csvlog.
2. Repurpose the char "is_last" as a bitmap of the log destination
with the highest order bit indicating whether it's the last chunk.
3. Add a separate field "dest" for the log destination and leave
"is_last" as a t/f indicating whether it's the last chunk.

Attached are patches for each approach (fun exercise!). Also attached
is a basic TAP test to invoke the csvlog destination. It's a clone of
pg_ctl log rotation test that looks for .csv logs. If there's interest
in the test I was thinking of expanding it a bit to include "big"
output that would span multiple messages to test the partial/combining
path. My thoughts on the approaches:

#1 doesn't change the header types or size but seems ugly as it leads
to new pairs of constants and logic in multiple places. In particular,
both send and receive ends have to encode and decode the destination.
#2 is cleaner as there's a logical separation of the dest fields and
no need for new constant pairs when adding new destinations. Would
also need to ensure new LOG_DESTINATION_xyz constants do not use that
last bit (there are already four, so there's room for three more).
#3 leads to the cleanest code, though you lose 4 bytes of max data size
per chunk.

Which would be preferable? I'd like to validate the approach as the
new log destination would be built atop it. I'm leaning toward #3,
though if the 4-byte loss is a deal breaker then #2.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


Re: Refactoring syslogger piping to simplify adding new log destinations

From: Alvaro Herrera
Hi Sehrope,

On 2019-Jul-10, Sehrope Sarkuni wrote:

> While working on adding a new log_destination I noticed that the
> syslogger piping would need to be updated. At the moment both ends
> only handle stderr/csvlog as the pipe message header has a char
> "is_last" that is either t/f (stderr last, stderr partial) or T/F
> (csvlog last, csvlog partial).

I spent some time a couple of weeks ago looking at profiles of the
syslogger code, and my impression is that the current design of using a
pipe to move data from one process to another may benefit from a
complete rewrite; it seems that the kernel overhead of going over the
pipe is significant.  (The test case was a couple dozen processes, each
generating a thousand couple-hundred-KB log lines.  In perf, the pipe
read/write takes up 99% of the CPU time.)

Maybe we can use something like a shared memory queue, working in a
similar way to wal_buffers -- where backends send over the shm queue to
syslogger, and syslogger writes in order to the actual log file.  Or
maybe something completely different; I didn't actually prototype
anything, just observed the disaster.

I'm not opposed to your patches, just trying to whet your appetite for
something bigger in the vicinity.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Maybe we can use something like a shared memory queue, working in a
> similar way to wal_buffers -- where backends send over the shm queue to
> syslogger, and syslogger writes in order to the actual log file.

No way that's going to be acceptable for postmaster output.

            regards, tom lane



Re: Refactoring syslogger piping to simplify adding new log destinations

From: Alvaro Herrera
On 2019-Jul-10, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Maybe we can use something like a shared memory queue, working in a
> > similar way to wal_buffers -- where backends send over the shm queue to
> > syslogger, and syslogger writes in order to the actual log file.
> 
> No way that's going to be acceptable for postmaster output.

Well, we can use both mechanisms simultaneously. Postmaster doesn't emit
all that much output anyway, so I don't think that's a concern.  And
actually, we still need the pipes from the backend for the odd cases
where third party code writes to stderr, no?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Jul-10, Tom Lane wrote:
>> No way that's going to be acceptable for postmaster output.

> Well, we can use both mechanisms simultaneously. Postmaster doesn't emit
> all that much output anyway, so I don't think that's a concern.  And
> actually, we still need the pipes from the backend for the odd cases
> where third party code writes to stderr, no?

Yeah, if you don't want to give up capturing random stderr output (and you
shouldn't), that's another issue.  But as you say, maybe we could have both
mechanisms.  There'd be a synchronization problem for pipe vs queue output
from the same process, but maybe that will be tolerable.

            regards, tom lane